Data Augmentation Process

Previously, we saw a significant improvement in model accuracy. Our model was effectively trained to classify the training data, but it did not generalize well to the validation data, which is the overfitting issue we still have to fix. Now, let's discuss one more technique to improve the model training process. This technique is known as data augmentation. It is the process by which we create new data for our model to use during the training process.

This is done by taking our existing dataset and transforming or altering its images in useful ways to create new images.

After applying the transformation, the newly created images are known as augmented images because they essentially allow us to augment our dataset by adding new data to it. The data augmentation technique is useful because it allows our model to look at each image in our dataset from a variety of different perspectives. This allows our model to extract relevant features more accurately and to obtain more feature-related data from each training image.
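To make the idea concrete, here is a minimal pure-Python sketch (not the torchvision implementation) that treats a tiny grayscale image as a nested list and derives two augmented copies from it. The helper names `horizontal_flip` and `darken`, as well as the pixel values, are hypothetical and exist only for illustration.

```python
# A tiny 2x3 grayscale "image"; each value is a pixel intensity in [0, 1].
image = [
    [0.1, 0.5, 0.9],
    [0.2, 0.6, 1.0],
]

def horizontal_flip(img):
    """Mirror the image left to right by reversing every row."""
    return [list(reversed(row)) for row in img]

def darken(img, factor=0.5):
    """Scale every pixel towards black; the factor is an illustrative assumption."""
    return [[pixel * factor for pixel in row] for row in img]

flipped = horizontal_flip(image)
darker = darken(image)

# The label stays the same, but the pixel grids differ, so the model
# effectively sees three distinct training examples.
print(flipped)
print(darker)
```

Flipping twice returns the original image, which is a quick way to check that the transformation only rearranges pixels and does not lose information.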

Now our biggest question is how we will use augmentation to reduce overfitting. Overfitting occurs when our model fits the training set too closely and therefore performs poorly on data it has not seen.

There is no need to start collecting new images and adding them to our datasets. We can use data augmentation, which introduces minor alterations to our existing datasets such as darker shading, flips, zooming, rotations or translations. Our model will interpret them as separate, distinct images. This not only reduces overfitting but also prevents our network from learning irrelevant patterns and boosts overall performance. We have the following steps to perform data augmentation:

Step 1:

To perform data augmentation on the training dataset, we have to make a separate transform statement. For the validation dataset, the transform will remain the same. So we first copy our transform1 statement and name the copy transform_train:

transform_train=transforms.Compose([transforms.Resize((32,32)),transforms.ToTensor(),transforms.Normalize((0.5,),(0.5,))])
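As a worked illustration of what the last two steps of this pipeline do to a single pixel value: ToTensor() rescales 8-bit intensities from [0, 255] to [0, 1], and Normalize with a mean of 0.5 and a standard deviation of 0.5 then maps them to [-1, 1]. The helpers below are hypothetical stand-ins written in plain Python, not the torchvision functions.

```python
def to_unit_range(pixel_8bit):
    """Mimic the scaling part of ToTensor(): [0, 255] -> [0.0, 1.0]."""
    return pixel_8bit / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Mimic Normalize((0.5,), (0.5,)) for a single value."""
    return (value - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Invert the normalization, e.g. before displaying an image."""
    return value * std + mean

for pixel in (0, 128, 255):
    scaled = to_unit_range(pixel)
    normed = normalize(scaled)
    print(pixel, round(scaled, 3), round(normed, 3), round(denormalize(normed), 3))
```

The `denormalize` step is the same arithmetic that an image-display helper has to apply before plotting: multiply by the standard deviation first, then add the mean.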

Step 2:

Now, we will add alterations to our transform_train statement. The alterations will be RandomHorizontalFlip, which flips an image horizontally at random, and RandomRotation, which rotates an image by a random angle; the maximum angle is passed as an argument.

transform_train=transforms.Compose([transforms.Resize((32,32)),
		transforms.RandomHorizontalFlip(),
		transforms.RandomRotation(10),
		transforms.ToTensor(),
		transforms.Normalize((0.5,),(0.5,))])
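The effect of these two alterations can be sketched without torchvision: the flip is applied with a probability of 0.5 by default, and for a rotation limit of 10 an angle is drawn uniformly from [-10, 10] degrees and coordinates are rotated about the image centre. This simplified version only rotates coordinates and does not resample pixels. `sample_angle` and `rotate_point` are hypothetical names, and the seed is there only to make the run repeatable.

```python
import math
import random

def sample_angle(max_degrees, rng):
    """Draw an angle uniformly from [-max_degrees, +max_degrees]."""
    return rng.uniform(-max_degrees, max_degrees)

def rotate_point(x, y, angle_degrees, cx, cy):
    """Rotate (x, y) about the centre (cx, cy) by the given angle."""
    theta = math.radians(angle_degrees)
    dx, dy = x - cx, y - cy
    new_x = cx + dx * math.cos(theta) - dy * math.sin(theta)
    new_y = cy + dx * math.sin(theta) + dy * math.cos(theta)
    return new_x, new_y

rng = random.Random(0)           # fixed seed, for a repeatable illustration
flip = rng.random() < 0.5        # a horizontal flip happens about half of the time
angle = sample_angle(10, rng)    # a different angle is drawn on every call
centre = 15.5                    # centre of a 32x32 pixel grid (indices 0..31)

# Where does the top-left corner pixel end up after the rotation?
print(flip, angle, rotate_point(0, 0, angle, centre, centre))
```

Because the angle is small, the rotated image still looks like the same object, which is what we want from a label-preserving alteration.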

To add even more variety to our dataset, we will use an affine transformation. Affine transformations are simple transformations that preserve straight lines and planes within the object. Scaling, translation, shear and zooming all fit this category.

transform_train=transforms.Compose([transforms.Resize((32,32)),
		transforms.RandomHorizontalFlip(),
		transforms.RandomRotation(10),
		transforms.RandomAffine(0,shear=10,scale=(0.8,1.2)),
		transforms.ToTensor(),
		transforms.Normalize((0.5,),(0.5,))])

In RandomAffine(), the first argument is degrees, which we set to zero to deactivate rotation. The second argument is the shear transformation, and the last one is the scaling transformation, for which we use a tuple to define the required range of zoom. We defined a lower and upper limit of 0.8 and 1.2 to scale images to between 80 and 120 percent of their size.
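To see why shear and scaling count as affine transformations, the following sketch applies a shear of 10 degrees and a scale of 1.2 to 2D points and checks that a midpoint stays a midpoint, which is what preserving straight lines means in practice. The convention used here (shear along the x-axis, then uniform scaling about the origin) is a simplifying assumption and may differ from what RandomAffine does internally; `affine_shear_scale` is a hypothetical helper.

```python
import math

def affine_shear_scale(x, y, shear_degrees, scale):
    """Shear along the x-axis, then scale uniformly about the origin."""
    shear = math.tan(math.radians(shear_degrees))
    sheared_x = x + shear * y      # x shifts in proportion to y
    sheared_y = y
    return scale * sheared_x, scale * sheared_y

p = (0.0, 0.0)
q = (4.0, 8.0)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

p2 = affine_shear_scale(*p, shear_degrees=10, scale=1.2)
q2 = affine_shear_scale(*q, shear_degrees=10, scale=1.2)
mid2 = affine_shear_scale(*mid, shear_degrees=10, scale=1.2)

# The transformed midpoint is still the midpoint of the transformed endpoints.
expected_mid = ((p2[0] + q2[0]) / 2, (p2[1] + q2[1]) / 2)
print(mid2, expected_mid)
```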

Step 3:

Now, we move on to our next augmentation, which creates new augmented images with a randomized variety of brightness, contrast and saturation. We will add another transformation, ColorJitter, as:

transform_train=transforms.Compose([transforms.Resize((32,32)),
		transforms.RandomHorizontalFlip(),
		transforms.RandomRotation(10),
		transforms.RandomAffine(0,shear=10,scale=(0.8,1.2)),
		transforms.ColorJitter(brightness=0.2,contrast=0.2,saturation=0.2),
		transforms.ToTensor(),
		transforms.Normalize((0.5,),(0.5,))])
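The brightness part of ColorJitter can be sketched as follows: with brightness=0.2, a factor is drawn uniformly from [0.8, 1.2] and each pixel is multiplied by it, with the result clamped to the valid range. Contrast and saturation work along similar lines but blend the image with a grey reference instead of black. The function names are hypothetical and the pixel values are made up for the illustration.

```python
import random

def sample_factor(strength, rng):
    """Draw a jitter factor uniformly from [max(0, 1 - strength), 1 + strength]."""
    return rng.uniform(max(0.0, 1.0 - strength), 1.0 + strength)

def adjust_brightness(pixels, factor):
    """Multiply each pixel by the factor and clamp the result to [0, 1]."""
    return [min(1.0, max(0.0, p * factor)) for p in pixels]

rng = random.Random(42)          # fixed seed, for a repeatable illustration
row = [0.0, 0.25, 0.5, 0.9]      # one row of pixel intensities in [0, 1]

factor = sample_factor(0.2, rng)
print(factor, adjust_brightness(row, factor))
```

A factor below 1 darkens the image and a factor above 1 brightens it; bright pixels that would exceed 1.0 are clipped.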

Step 4:

Before executing our code, we have to change the training_dataset statement because we now have a separate transform for the training dataset. So we pass transform_train to it:

training_dataset=datasets.CIFAR10(root='./data',train=True,download=True,transform=transform_train)
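One point worth spelling out: the transform is applied each time a sample is loaded, so the dataset does not grow on disk; instead, the model is likely to see a differently altered version of each image in every epoch. The sketch below imitates that behaviour with a hypothetical `TinyDataset` class in plain Python. It is not how `datasets.CIFAR10` is implemented, only an illustration of the idea.

```python
import random

class TinyDataset:
    """Hypothetical stand-in for a dataset that applies a transform on access."""

    def __init__(self, samples, transform=None):
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        row, label = self.samples[index]
        if self.transform is not None:
            row = self.transform(row)
        return row, label

rng = random.Random(1)  # fixed seed, for a repeatable illustration

def random_flip(row):
    """Reverse the row with probability 0.5, like a random horizontal flip."""
    return row[::-1] if rng.random() < 0.5 else row

samples = [([1, 2, 3], "cat"), ([4, 5, 6], "dog")]
train_set = TinyDataset(samples, transform=random_flip)   # augmented
val_set = TinyDataset(samples)                            # left unchanged

# Reading the same training index several times can give different rows,
# while the validation set always returns the original data.
print([train_set[0][0] for _ in range(4)])
print([val_set[0][0] for _ in range(4)])
```

This is also why the validation transform stays free of random alterations: we want the validation score to be measured on the same images every epoch.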

Now, we will execute our code. After training with the augmented data, it should give us the expected output with a correct prediction.

Complete Code:

import torch
import matplotlib.pyplot as plt
import numpy as np
import torch.nn.functional as func
import PIL.ImageOps
from torch import nn
from torchvision import datasets,transforms 
import requests
from PIL import Image
device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
transform_train=transforms.Compose([transforms.Resize((32,32)),
                               transforms.RandomHorizontalFlip(),
                               transforms.RandomRotation(10),
                               transforms.RandomAffine(0,shear=10,scale=(0.8,1.2)),
                               transforms.ColorJitter(brightness=0.2,contrast=0.2,saturation=0.2),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5,),(0.5,))])
transform1=transforms.Compose([transforms.Resize((32,32)),transforms.ToTensor(),transforms.Normalize((0.5,),(0.5,))])
training_dataset=datasets.CIFAR10(root='./data',train=True,download=True,transform=transform_train)
validation_dataset=datasets.CIFAR10(root='./data',train=False,download=True,transform=transform1)
training_loader=torch.utils.data.DataLoader(dataset=training_dataset,batch_size=100,shuffle=True)
validation_loader=torch.utils.data.DataLoader(dataset=validation_dataset,batch_size=100,shuffle=False)
def im_convert(tensor):
    image=tensor.cpu().clone().detach().numpy()
    image=image.transpose(1,2,0)
    print(image.shape)
    image=image*np.array((0.5,0.5,0.5))+np.array((0.5,0.5,0.5))	#Undo the normalization: multiply by std, then add mean
    image=image.clip(0,1)
    return image
classes=('plane','car','bird','cat','deer','dog','frog','horse','ship','truck')
dataiter=iter(training_loader)
images,labels=next(dataiter)	#Use the built-in next(); recent PyTorch iterators have no .next() method
fig=plt.figure(figsize=(25,4))
for idx in np.arange(20):

    ax=fig.add_subplot(2,10,idx+1)
    plt.imshow(im_convert(images[idx]))
    ax.set_title(classes[labels[idx].item()])
class LeNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1=nn.Conv2d(3,16,3,1, padding=1)
            self.conv2=nn.Conv2d(16,32,3,1, padding=1)
            self.conv3=nn.Conv2d(32,64,3,1, padding=1)   
            self.fully1=nn.Linear(4*4*64,500)
            self.dropout1=nn.Dropout(0.5) 
            self.fully2=nn.Linear(500,10)
        def forward(self,x):
            x=func.relu(self.conv1(x))
            x=func.max_pool2d(x,2,2)
            x=func.relu(self.conv2(x))
            x=func.max_pool2d(x,2,2)
            x=func.relu(self.conv3(x))
            x=func.max_pool2d(x,2,2)
            x=x.view(-1,4*4*64)	#Reshaping the output into desired shape
            x=func.relu(self.fully1(x))	#Applying relu activation function to our first fully connected layer
            x=self.dropout1(x)
            x=self.fully2(x)	#We return raw scores here because nn.CrossEntropyLoss applies log-softmax internally
            return x    
model=LeNet().to(device)
criteron=nn.CrossEntropyLoss()
optimizer=torch.optim.Adam(model.parameters(),lr=0.001)
epochs=12
loss_history=[]
correct_history=[]
val_loss_history=[]
val_correct_history=[]
for e in range(epochs):
    loss=0.0
    correct=0.0
    val_loss=0.0
    val_correct=0.0
    model.train()	#Enable dropout for the training pass
    for input,labels in training_loader:
        input=input.to(device)
        labels=labels.to(device)
        outputs=model(input)
        loss1=criteron(outputs,labels)
        optimizer.zero_grad()
        loss1.backward()
        optimizer.step()
        _,preds=torch.max(outputs,1)
        loss+=loss1.item()
        correct+=torch.sum(preds==labels.data)
    else:
        model.eval()	#Disable dropout for the validation pass
        with torch.no_grad():
            for val_input,val_labels in validation_loader:
                val_input=val_input.to(device)
                val_labels=val_labels.to(device)
                val_outputs=model(val_input)
                val_loss1=criteron(val_outputs,val_labels) 
                _,val_preds=torch.max(val_outputs,1)
                val_loss+=val_loss1.item()
                val_correct+=torch.sum(val_preds==val_labels.data)
        epoch_loss=loss/len(training_loader)
        epoch_acc=correct.float()/len(training_loader)
        loss_history.append(epoch_loss)
        correct_history.append(epoch_acc)
        val_epoch_loss=val_loss/len(validation_loader)
        val_epoch_acc=val_correct.float()/len(validation_loader)
        val_loss_history.append(val_epoch_loss)
        val_correct_history.append(val_epoch_acc)
        print('training_loss:{:.4f},{:.4f}'.format(epoch_loss,epoch_acc.item()))
        print('validation_loss:{:.4f},{:.4f}'.format(val_epoch_loss,val_epoch_acc.item()))

url='https://akm-img-a-in.tosshub.com/indiatoday/images/story/201810/white_stork.jpeg?B2LINO47jclcIb3QCW.Bj9nto934Lox4'
response=requests.get(url,stream=True)
img=Image.open(response.raw)
img=transform1(img)   
image1=img.to(device).unsqueeze(0)
model.eval()	#Make sure dropout is disabled before predicting
output=model(image1)
_,pred=torch.max(output,1)
print(classes[pred.item()])

dataiter=iter(validation_loader)  
images,labels=next(dataiter)
images_=images.to(device)  
labels=labels.to(device)  
output=model(images_)  
_,preds=torch.max(output,1)  
fig=plt.figure(figsize=(25,4))  
for idx in np.arange(20):
    ax=fig.add_subplot(2,10,idx+1,xticks=[],yticks=[])
    plt.imshow(im_convert(images[idx]))
    #Title shows predicted(actual); green when they match, red otherwise
    ax.set_title("{}({})".format(str(classes[preds[idx].item()]),str(classes[labels[idx].item()])),color=("green" if preds[idx]==labels[idx] else "red"))
plt.show()
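
A note on the accuracy figures printed by the training loop: the running count of correct predictions is divided by the number of batches rather than the number of images. With a batch size of 100 this happens to equal the accuracy in percent, as the worked example below shows; with any other batch size, dividing by the dataset size would be the safer choice. The count of correct predictions used here is made up for illustration, and `accuracy_percent` is a hypothetical helper.

```python
def accuracy_percent(correct, num_samples):
    """Accuracy in percent, independent of the batch size."""
    return 100.0 * correct / num_samples

num_samples = 50000      # size of the CIFAR-10 training set
batch_size = 100
num_batches = num_samples // batch_size   # what len(training_loader) returns here
correct = 35000          # hypothetical number of correct predictions

per_batch = correct / num_batches                    # what the loop above computes
per_sample = accuracy_percent(correct, num_samples)  # explicit percentage
print(per_batch, per_sample)
```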