MNIST Dataset of Image Recognition in PyTorch

In this topic, we introduce a new dataset that we will use for image recognition: the MNIST dataset. MNIST is freely available online and is essentially a database of handwritten digits. It contains 70,000 labeled images (60,000 for training and 10,000 for testing) and is commonly used to demonstrate the power of deep neural networks.

Suppose we have the following figure:

[Figure: a handwritten digit eight]

When we look at the image, our eyes and brain work together to recognize it as the number eight. The brain is a very powerful tool and can categorize this image as an eight almost instantly. A digit can be written in many different shapes, and our mind easily recognizes them and determines which number each one is, but this task is not so simple for a computer. An effective way to solve it is to use a deep neural network, which allows us to train a computer to classify handwritten digits.

So far, we have only dealt with data made up of simple points on a Cartesian coordinate system, and all of our datasets have had just two classes. Now we will work with a multiclass dataset, and for multiclass datasets we use the Softmax activation function in the output layer rather than the sigmoid function. The sigmoid activation function is well suited to binary classification because it maps a score to a single probability between 0 and 1. It does not extend naturally to multiclass datasets, where each input belongs to exactly one of several classes. Softmax handles this case by converting the output scores into a set of probabilities, one per class, that sum to 1.
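The difference between the two activation functions can be seen in a minimal sketch written in plain Python. The three raw scores below are made-up values for illustration, and the function names are our own rather than part of any library.

```python
import math

def sigmoid(z):
    """Squash one raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtracting the largest score avoids overflow in exp()
    # and does not change the resulting probabilities.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw output scores for three classes (illustrative values only).
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)                      # one probability per class
independent = [sigmoid(s) for s in scores]   # each score squashed on its own

print("softmax:", probs, "sum =", sum(probs))
print("sigmoid:", independent, "sum =", sum(independent))
```

The softmax values add up to 1, so they can be read as the probability of each class. The sigmoid values are computed independently and do not form a distribution over the classes.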

The MNIST dataset is a multiclass dataset with 10 classes, one for each digit from 0 to 9. The major difference between the datasets we have used previously and the MNIST dataset is the way in which the MNIST data is fed into the neural network.

In the perceptron model and the linear regression model, each data point was defined by a simple X and Y coordinate. This means that the input layer needed only two nodes to take in a single data point.

In the MNIST dataset, a single data point comes in the form of an image. The images in MNIST are 28*28 pixels: 28 pixels along the horizontal axis and 28 pixels along the vertical axis. This means that a single image from the MNIST database has a total of 784 pixels that must be analyzed, so the input layer of our neural network needs 784 nodes to take in one of these images.
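The arithmetic above can be checked with a short sketch in plain Python. The image here is a placeholder grid of zeros with one bright pixel, not real MNIST data, and the helper name flatten is our own.

```python
# Placeholder 28*28 grayscale image: 28 rows, each holding 28 pixel values.
# Raw MNIST pixels are intensities from 0 to 255; zeros stand in for them here.
HEIGHT, WIDTH = 28, 28
image = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
image[14][10] = 255  # one bright pixel at row 14, column 10

def flatten(img):
    """Lay the rows end to end so each pixel becomes one input value."""
    return [pixel for row in img for pixel in row]

inputs = flatten(image)
print(len(inputs))  # 784, one value per input node

# Row 14, column 10 lands at position 14 * 28 + 10 in the flat list.
print(inputs[14 * WIDTH + 10])  # 255
```

Each of the 784 values would be passed to its own node in the input layer.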

Because of the additional input nodes and the larger number of classes (the ten digits from 0 to 9), our dataset is clearly more complex than any of the datasets we have analyzed before. Classifying it requires a deep neural network with several hidden layers.

Our deep neural network has 784 nodes in the input layer, a few hidden layers that feed the input values forward, and finally ten nodes in the output layer, one for each of the handwritten digits. The values are fed through the network, and the node with the highest activation value in the output layer identifies the digit.
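The architecture described above might look like the following in PyTorch. This is only a sketch under stated assumptions: the hidden layer sizes (125 and 65), the ReLU activations, and the random stand-in images are our own choices for illustration, and the network is untrained, so its predictions are meaningless.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the random weights and inputs repeatable

# 784 input nodes -> two hidden layers -> 10 output nodes (one per digit).
# The hidden sizes 125 and 65 are arbitrary assumptions, not fixed by MNIST.
model = nn.Sequential(
    nn.Linear(784, 125),
    nn.ReLU(),
    nn.Linear(125, 65),
    nn.ReLU(),
    nn.Linear(65, 10),
)

# Four random stand-in "images" shaped like MNIST: (batch, channel, 28, 28).
batch = torch.rand(4, 1, 28, 28)

# Flatten every image into a row of 784 values for the input layer.
flat = batch.view(batch.shape[0], -1)

scores = model(flat)                      # raw output scores, shape (4, 10)
probs = torch.softmax(scores, dim=1)      # per-class probabilities, rows sum to 1
predicted = torch.argmax(scores, dim=1)   # index of the highest output node

print(flat.shape, scores.shape)
print(predicted)
```

The model returns raw scores rather than applying Softmax itself. This is a common arrangement in PyTorch because nn.CrossEntropyLoss applies the log-softmax step internally during training, while torch.softmax can still be applied afterwards whenever probabilities are needed.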
