CIFAR-10 and CIFAR-100 Dataset in PyTorch

In the previous topic, we learned how to use the MNIST dataset to recognize images of handwritten digits. MNIST is an introductory dataset for deep learning because of its simplicity; it is the "hello world" of deep learning.

The CIFAR-10 (Canadian Institute for Advanced Research) dataset is harder to classify and brings new obstacles that we will need to overcome. It is a collection of images commonly used to train machine learning and computer vision algorithms. The CIFAR-10 dataset contains 50000 training images and 10000 test images, and each image belongs to one of 10 classes.

The CIFAR-10 dataset consists of 60000 color images of 32 by 32 pixels in 10 classes, with 6000 images per class. The dataset is divided into five training batches and one test batch, each containing 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order. Some training batches may contain more images from one class than another, but taken together the training batches contain exactly 5000 images from each class.
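The batch layout described above can be read directly from the "python version" of CIFAR-10. The sketch below assumes the archive has already been downloaded and extracted to a directory containing data_batch_1 through data_batch_5 and test_batch, each a pickled dictionary whose b"data" entry is a 10000×3072 uint8 array and whose b"labels" entry is a list of integers from 0 to 9. The helper names unpickle and load_cifar10 are illustrative, not part of any library; in practice, torchvision.datasets.CIFAR10 handles downloading and loading for you.

```python
import pickle
from pathlib import Path

import numpy as np


def unpickle(path):
    """Load one CIFAR pickle file; its dictionary keys are bytes, e.g. b"data"."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")


def load_cifar10(root):
    """Return (x_train, y_train, x_test, y_test) from the extracted batch files.

    Assumes `root` is the extracted `cifar-10-batches-py` directory.
    """
    root = Path(root)
    xs, ys = [], []
    for i in range(1, 6):  # the five training batches
        batch = unpickle(root / f"data_batch_{i}")
        xs.append(batch[b"data"])
        ys.extend(batch[b"labels"])
    # Each row holds 3072 bytes: 1024 red, then 1024 green, then 1024 blue
    # values, so reshaping gives channels-first (N, 3, 32, 32) images.
    x_train = np.concatenate(xs).reshape(-1, 3, 32, 32)
    y_train = np.array(ys)

    test = unpickle(root / "test_batch")
    x_test = test[b"data"].reshape(-1, 3, 32, 32)
    y_test = np.array(test[b"labels"])
    return x_train, y_train, x_test, y_test
```

On the full dataset, x_train should have shape (50000, 3, 32, 32) and x_test shape (10000, 3, 32, 32). The channels-first layout is the one PyTorch convolution layers expect.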


The classes are completely mutually exclusive. There is no overlap between automobiles and trucks: the automobile class includes things similar to sedans and SUVs, while the truck class includes only big trucks and does not include pickup trucks. Unlike the digits in the MNIST dataset, the objects within these classes are much more complex and extremely varied. If we look through the CIFAR dataset, we see that there is not just one type of bird or cat. The bird and cat classes contain many different types of birds and cats, varying in size, color, magnification, angle, and pose.
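Because every image carries exactly one integer label, the classes cannot overlap, and checking class balance is simply a matter of counting labels. The sketch below assumes the label order used by CIFAR-10's batches.meta file (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck); CIFAR10_CLASSES and class_counts are illustrative names.

```python
from collections import Counter

# Label index -> class name, in the order assumed from CIFAR-10's batches.meta.
CIFAR10_CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)


def class_counts(labels):
    """Count images per class name.

    Each image has exactly one label in 0-9, so no image is counted twice.
    """
    counts = Counter(CIFAR10_CLASSES[i] for i in labels)
    # Include classes with zero images so the result always has 10 entries.
    return {name: counts.get(name, 0) for name in CIFAR10_CLASSES}
```

Applied to the full set of training labels, every count should come out to 5000, even though an individual training batch may be unbalanced.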

In the MNIST dataset, although there are many ways to write the number one or the number two, the images simply are not as varied, and on top of that, MNIST images are grayscale. The CIFAR dataset contains larger 32 by 32 color images, and each image has three color channels. Our biggest question now is whether the LeNet model, which performed so well on MNIST, will be enough to classify the CIFAR dataset.
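To feed CIFAR images into a LeNet-style model, two things must change compared with a grayscale 28 by 28 input: the first convolution must accept three input channels, and the first fully connected layer must match the larger feature map. The sketch below assumes a LeNet variant with 20 and 50 filters of size 5×5, 2×2 max pooling, and a 500-unit hidden layer; the exact layer sizes of the model from the previous topic may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):
    """LeNet-style CNN sized for 3x32x32 images (layer widths are assumptions)."""

    def __init__(self, in_channels=3, num_classes=10):
        super().__init__()
        # in_channels=3 for RGB input; a grayscale input would use 1.
        self.conv1 = nn.Conv2d(in_channels, 20, kernel_size=5)
        self.conv2 = nn.Conv2d(20, 50, kernel_size=5)
        # 32 -> conv 28 -> pool 14 -> conv 10 -> pool 5: 50 feature maps of 5x5.
        self.fc1 = nn.Linear(5 * 5 * 50, 500)
        self.fc2 = nn.Linear(500, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten everything except the batch dimension
        x = F.relu(self.fc1(x))
        return self.fc2(x)  # raw class scores (logits)
```

Passing num_classes=100 reuses the same architecture for CIFAR-100's fine labels. This sketch only shows that the shapes fit; whether such a small network is accurate enough on CIFAR is exactly the open question raised above.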

CIFAR-100 Dataset

It is just like the CIFAR-10 dataset, except that it has 100 classes containing 600 images each: 500 training images and 100 test images per class. These 100 classes are grouped into 20 superclasses, and each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
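In the python version of CIFAR-100, the train and test files are pickled dictionaries that store both label types side by side, under the keys b"fine_labels" and b"coarse_labels". The sketch below loads one such file and checks that every fine class maps to a single superclass. load_cifar100_split and fine_to_coarse are hypothetical helper names, and the file is assumed to be already downloaded and extracted.

```python
import pickle

import numpy as np


def load_cifar100_split(path):
    """Load one CIFAR-100 python-version file (assumed `train` or `test`).

    Returns images as (N, 3, 32, 32) plus both label arrays: fine labels
    in 0-99 (the class) and coarse labels in 0-19 (the superclass).
    """
    with open(path, "rb") as f:
        d = pickle.load(f, encoding="bytes")
    x = d[b"data"].reshape(-1, 3, 32, 32)
    fine = np.array(d[b"fine_labels"])
    coarse = np.array(d[b"coarse_labels"])
    return x, fine, coarse


def fine_to_coarse(fine, coarse):
    """Build the fine -> coarse lookup observed in the labels."""
    mapping = {}
    for f_lbl, c_lbl in zip(fine.tolist(), coarse.tolist()):
        # Each class belongs to exactly one superclass; flag any contradiction.
        if mapping.setdefault(f_lbl, c_lbl) != c_lbl:
            raise ValueError(f"class {f_lbl} appears under two superclasses")
    return mapping
```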

The CIFAR-100 dataset contains the following superclasses and classes:

S. No	Superclass	Classes
1.	aquatic mammals	beaver, dolphin, otter, seal, whale
2.	flowers	orchids, poppies, roses, sunflowers, tulips
3.	fish	aquarium fish, flatfish, ray, shark, trout
4.	food containers	bottles, bowls, cans, cups, plates
5.	household electrical devices	clock, computer keyboard, lamp, telephone, television
6.	fruit and vegetables	apples, mushrooms, oranges, pears, sweet peppers
7.	household furniture	bed, chair, couch, table, wardrobe
8.	large carnivores	bear, leopard, lion, tiger, wolf
9.	insects	bee, beetle, butterfly, caterpillar, cockroach
10.	large man-made outdoor things	bridge, castle, house, road, skyscraper
11.	large natural outdoor scenes	cloud, forest, mountain, plain, sea
12.	medium-sized mammals	fox, porcupine, possum, raccoon, skunk
13.	large omnivores and herbivores	camel, cattle, chimpanzee, elephant, kangaroo
14.	non-insect invertebrates	crab, lobster, snail, spider, worm
15.	reptiles	crocodile, dinosaur, lizard, snake, turtle
16.	people	baby, boy, girl, man, woman
17.	trees	maple, oak, palm, pine, willow
18.	small mammals	hamster, mouse, rabbit, shrew, squirrel
19.	vehicles 1	bicycle, bus, motorcycle, pickup truck, train
20.	vehicles 2	lawn-mower, rocket, streetcar, tank, tractor
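The superclass table above can be turned into a lookup structure for reporting results at either granularity. The dictionary below simply transcribes the table; the label names stored in the dataset's own meta file may be spelled differently (for example in singular form or with underscores), so treat these strings as display names rather than dataset keys. CIFAR100_SUPERCLASSES and CLASS_TO_SUPERCLASS are illustrative names.

```python
# Superclass -> classes, transcribed from the table.
# Note: the dataset's meta file may spell these names differently.
CIFAR100_SUPERCLASSES = {
    "aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "flowers": ["orchids", "poppies", "roses", "sunflowers", "tulips"],
    "fish": ["aquarium fish", "flatfish", "ray", "shark", "trout"],
    "food containers": ["bottles", "bowls", "cans", "cups", "plates"],
    "household electrical devices": ["clock", "computer keyboard", "lamp", "telephone", "television"],
    "fruit and vegetables": ["apples", "mushrooms", "oranges", "pears", "sweet peppers"],
    "household furniture": ["bed", "chair", "couch", "table", "wardrobe"],
    "large carnivores": ["bear", "leopard", "lion", "tiger", "wolf"],
    "insects": ["bee", "beetle", "butterfly", "caterpillar", "cockroach"],
    "large man-made outdoor things": ["bridge", "castle", "house", "road", "skyscraper"],
    "large natural outdoor scenes": ["cloud", "forest", "mountain", "plain", "sea"],
    "medium-sized mammals": ["fox", "porcupine", "possum", "raccoon", "skunk"],
    "large omnivores and herbivores": ["camel", "cattle", "chimpanzee", "elephant", "kangaroo"],
    "non-insect invertebrates": ["crab", "lobster", "snail", "spider", "worm"],
    "reptiles": ["crocodile", "dinosaur", "lizard", "snake", "turtle"],
    "people": ["baby", "boy", "girl", "man", "woman"],
    "trees": ["maple", "oak", "palm", "pine", "willow"],
    "small mammals": ["hamster", "mouse", "rabbit", "shrew", "squirrel"],
    "vehicles 1": ["bicycle", "bus", "motorcycle", "pickup truck", "train"],
    "vehicles 2": ["lawn-mower", "rocket", "streetcar", "tank", "tractor"],
}

# Reverse lookup: class ("fine" label) -> superclass ("coarse" label).
CLASS_TO_SUPERCLASS = {
    cls: sup for sup, classes in CIFAR100_SUPERCLASSES.items() for cls in classes
}
```

Every class appears under exactly one superclass, so the reverse dictionary has 100 entries, and a lookup such as CLASS_TO_SUPERCLASS["tractor"] returns "vehicles 2".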