Image Recognition

Image recognition is the process of extracting meaningful information, such as what an image contains, from a given image. Its essential task is to classify the main content of the image; it does not involve determining the position or pose of the recognized content.

The term "image recognition" refers to computer technologies that recognize particular animals, objects, people, or other target subjects with the help of algorithms and machine learning. Image recognition is part of computer vision, a broad label for training computers to "see" like humans and for the image processing that supports it. Computer vision is a catch-all term for computers doing intensive work on visual data.

There are several ways to perform image recognition. The convolutional neural network (CNN) sits at the top of many recognition techniques; it filters images through a sequence of layers of artificial neurons. CNNs were designed specifically for image recognition and similar image-processing tasks. Combined with techniques such as max pooling, padding, and stride configuration, CNN filters process images in a way that helps machine learning programs get better at identifying the subject of a picture.
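To make padding, stride, and max pooling concrete, here is a plain-Python sketch (no PyTorch) of the usual formula for the spatial size of a convolution or pooling output, assuming a dilation of 1, together with a naive max pooling over a small grid. The function names conv_output_size and max_pool2d are illustrative and not part of any library.

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output width/height when a kernel slides over an input of size in_size.

    Formula (dilation 1): floor((in_size + 2*padding - kernel) / stride) + 1
    """
    return (in_size + 2 * padding - kernel) // stride + 1


def max_pool2d(image: list[list[float]], kernel: int = 2, stride: int = 2) -> list[list[float]]:
    """Naive 2D max pooling over a list-of-lists image, without padding."""
    out_h = conv_output_size(len(image), kernel, stride)
    out_w = conv_output_size(len(image[0]), kernel, stride)
    return [
        [
            # Keep only the largest value inside each kernel x kernel window.
            max(
                image[i * stride + di][j * stride + dj]
                for di in range(kernel)
                for dj in range(kernel)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 224x224 input through an 11x11 kernel with stride 4 and padding 2 gives 55x55.
print(conv_output_size(224, kernel=11, stride=4, padding=2))  # 55
# Padding 1 keeps the output of a 3x3, stride-1 convolution the same size as the input.
print(conv_output_size(32, kernel=3, stride=1, padding=1))    # 32
print(max_pool2d([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12],
                  [13, 14, 15, 16]]))                          # [[6, 8], [14, 16]]
```

Larger strides and pooling shrink the feature maps, while padding preserves border pixels, which is why these settings are tuned together when designing a CNN.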

Challenges of Image Recognition

Image recognition is one of the most widely used techniques today. Its widespread, continuous use exposes it to many challenging problems, including the following:

1) Distortion

An object remains the same object even when it appears distorted. A system that learns only from the original images forms the perception that the object can take only a specific shape. In the real world, shapes change, so inaccuracies occur when the system encounters a distorted image of the object.

2) Intra-class variation

Objects within the same class can vary: they may differ in size and shape yet still belong to the same class. For example, bottles, buttons, bags, and chairs come in many different sizes and appearances.

3) Viewpoint variation

When images in which the entities are oriented in a different direction are fed to the system, it may predict inaccurate values. The system does not understand that changing the orientation of an object (left, right, top, or bottom) does not change what it is, and this creates challenges for image recognition.

4) Scale variation

Variation in the apparent size of an object affects its classification: the closer we view an object, the bigger it looks, and vice versa.

5) Occlusion

Some objects block the full view of other objects in an image, so the system receives incomplete information. It is necessary to develop algorithms that are robust to these variations and to train them on a wide range of data samples.

Image classification in PyTorch

PyTorch is one of the most popular deep learning frameworks. Image classification is a supervised learning problem. In this section, image classification is done with the help of a pre-trained model.

1) Pre-trained model

Pre-trained models are neural network models that have been trained on large benchmark datasets such as ImageNet. There are various pre-trained models, such as AlexNet and ResNet101; both have been trained on the ImageNet dataset. "Pre-trained" means that deep learning architectures such as ResNet101 and AlexNet have already been trained on some dataset and carry the resulting weights and biases with them. TorchVision provides both the architectures and the pre-trained models.

a) Model Inference process

Using a pre-trained model to predict the class of an input is a process referred to as model inference. This process has the following steps:

Reading the input image.
Performing transformations on the image.
Performing a forward pass through the model.
Displaying the predictions based on the obtained scores.
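The last step, turning raw scores into readable predictions, is commonly done with a softmax followed by ranking. Below is a plain-Python sketch of that step alone; the function name top_k_predictions and the scores and labels in the example are made up for illustration.

```python
import math


def top_k_predictions(scores: list[float], labels: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Turn raw class scores (logits) into the k most likely (label, probability) pairs."""
    # Softmax; subtracting the maximum score first avoids overflow in exp().
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank classes by probability, highest first, and keep the top k.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores for three illustrative classes.
for label, p in top_k_predictions([2.0, 1.0, 0.1], ["cat", "dog", "car"], k=2):
    print(f"{label}: {p:.3f}")
# cat: 0.659
# dog: 0.242
```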

b) Loading a pre-trained network using TorchVision

We can easily use pre-trained models through the TorchVision module. First install torchvision (the first line below is a shell command); then, in Python, import models from the torchvision module and call dir(models) to see the models and architectures available to us.

pip install torchvision

from torchvision import models
print(dir(models))  # names of the model builders, classes and submodules in torchvision.models

c) Using AlexNet for image classification

Image classification using AlexNet involves the following steps:

Step 1: Load the pre-trained model.
Step 2: Specify the image transformation.
Step 3: Load the input image and pre-process it.
Step 4: Perform model inference.

d) Using ResNet for image classification

Image classification using ResNet involves the following steps:

Step 1: Load the pre-trained model.
Step 2: Put the model in eval mode.
Step 3: Carry out model inference.
Step 4: Print the top 5 classes predicted by the model.

In the next topic, we will discuss the MNIST dataset and how to use a deep neural network to fit a model to image data. We will also cover the validation set, which is used to check how well a neural network generalizes to new data. After training an optimal network, we will use it to classify a new image from the web.
