Single Shot MultiBox Detector (SSD) Using a Neural Network Approach

Introduction

The Single Shot MultiBox Detector (SSD) is a deep learning model for real-time object detection in images and video. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg introduced it in their 2016 paper "SSD: Single Shot MultiBox Detector."

Introduction to Object Detection

Object detection is the computer vision task of locating and identifying objects within an image or video stream. Its applications include robotics, surveillance, and autonomous driving. Earlier approaches handled localization and classification in separate processing steps and relied on hand-crafted features, which often made them computationally expensive and inaccurate. SSD addresses these shortcomings with a unified strategy: it performs object localization and classification together in a single neural network rather than in separate stages. This improves detection accuracy while reducing computational cost.

Key Components of SSD

1. Base Convolutional Network

A base convolutional network, typically a classification architecture such as VGG16 or ResNet with its final classification layers removed, sits at the heart of SSD. It acts as the feature extractor, producing hierarchical features from the input image. These features are needed to detect objects of various scales and aspect ratios.

2. Multi-scale Feature Maps

SSD takes feature maps at several scales, both from the base convolutional network and from extra convolutional layers appended to it, each progressively smaller than the last. These feature maps are essential for detecting objects of varied sizes because they preserve spatial information: large maps resolve small objects, and small maps cover large ones. SSD then applies small convolutional filters to each of these feature maps to predict bounding boxes and class scores.
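
To make the multi-scale idea concrete, the sketch below counts how many boxes are predicted when every location of every feature map contributes a fixed number of boxes. The layer names, map sizes, and boxes per location are assumptions taken from the SSD300 configuration described in the SSD paper; other variants use different values.

```python
# Assumed SSD300 layout: (layer name, feature-map side length, boxes per location).
SSD300_LAYERS = [
    ("conv4_3", 38, 4),
    ("fc7", 19, 6),
    ("conv8_2", 10, 6),
    ("conv9_2", 5, 6),
    ("conv10_2", 3, 4),
    ("conv11_2", 1, 4),
]


def count_default_boxes(layers):
    """Return (per-layer box counts, total) for square feature maps."""
    per_layer = {name: side * side * k for name, side, k in layers}
    return per_layer, sum(per_layer.values())


per_layer, total = count_default_boxes(SSD300_LAYERS)
for name, count in per_layer.items():
    print(f"{name:9s} {count:5d}")
print("total", total)  # total 8732
```

Most of the boxes come from the largest feature map, which is the one responsible for small objects.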

3. Anchor Boxes

SSD uses anchor boxes (called default boxes in the SSD paper) to handle objects of different sizes and aspect ratios. Anchor boxes are predefined boxes of several sizes and aspect ratios, placed at every location of each feature map. The network predicts offsets that adjust each anchor box's position and size to fit the objects in the image.
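
The following is a minimal sketch of how anchor boxes for one feature map could be generated. It assumes the scale rule from the SSD paper (scales spaced linearly between 0.2 and 0.9, plus one extra square box per location); the function names and the chosen aspect ratios are illustrative, not part of any library.

```python
import math


def layer_scales(num_layers, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales, one per feature map (needs num_layers >= 2)."""
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


def default_boxes(fmap_size, scale, next_scale, aspect_ratios):
    """Return anchors as (cx, cy, w, h), normalized to the range [0, 1]."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Anchor centers sit in the middle of each feature-map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
            # Extra square box whose scale lies between this layer and the next.
            extra = math.sqrt(scale * next_scale)
            boxes.append((cx, cy, extra, extra))
    return boxes


scales = layer_scales(6)
boxes = default_boxes(3, scales[4], scales[5], aspect_ratios=[1.0, 2.0, 0.5])
print(len(boxes))  # 3 * 3 locations * 4 boxes each = 36
```

Each location gets one box per aspect ratio plus the extra square box, so a 3 x 3 map with three aspect ratios yields 36 anchors.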

4. Predictions for Object Localization and Classification

SSD produces two different types of predictions for each anchor box:

Bounding Box Offsets: These predictions indicate how much the anchor box must be shifted and resized to fit the target object. For each anchor box, SSD predicts four values: offsets for the box's center coordinates (x and y) and for its width and height.

Class Scores: To identify the category of an object, SSD predicts a score for every object class (plus a background class) for each anchor box. A softmax activation function converts these scores into class probabilities.
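
The two prediction types can be turned into a final box and class probabilities roughly as sketched below. This assumes the center-size offset parameterization of the SSD paper and leaves out the "variance" scaling factors that many implementations add; the function names are illustrative.

```python
import math


def decode_box(anchor, offsets):
    """Apply predicted offsets to an anchor.

    anchor:  (cx, cy, w, h)
    offsets: (t_cx, t_cy, t_w, t_h)
    Returns the box as corners (x_min, y_min, x_max, y_max).
    """
    a_cx, a_cy, a_w, a_h = anchor
    t_cx, t_cy, t_w, t_h = offsets
    cx = a_cx + t_cx * a_w       # center shift is relative to anchor size
    cy = a_cy + t_cy * a_h
    w = a_w * math.exp(t_w)      # size change is predicted in log space
    h = a_h * math.exp(t_h)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)            # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


box = decode_box((0.5, 0.5, 0.2, 0.2), (0.1, 0.0, 0.0, 0.0))
probs = softmax([2.0, 0.5, 0.1])  # e.g. [background, mask, no mask]
print(box)
print(probs)
```

With all offsets at zero the decoded box is simply the anchor itself, which is why the network only has to learn small corrections.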

5. Non-Maximum Suppression and Confidence Score Thresholding

After gathering predictions from all feature maps and anchor boxes, SSD applies confidence score thresholding to discard low-confidence detections. It then uses non-maximum suppression (NMS) to remove overlapping, duplicate bounding boxes, keeping only the highest-scoring detection for each object.
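
A minimal sketch of both post-processing steps for a single class is shown below. Boxes are assumed to be in corner format (x_min, y_min, x_max, y_max), and the two threshold values are illustrative defaults rather than recommended settings.

```python
def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.45):
    """Return indices of the boxes that survive thresholding and greedy NMS."""
    # Step 1: confidence thresholding, then sort by score (highest first).
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    # Step 2: keep a box only if it does not overlap an already kept box too much.
    keep = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms(boxes, scores))  # [0, 2]
```

Box 1 is dropped because it overlaps the higher-scoring box 0, and box 3 is dropped by the confidence threshold before NMS runs.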

Training SSD

Training SSD involves three parts: preparing training data, defining the loss function, and optimizing the network.

1. Producing Training Data

Training SSD requires labelled data: images annotated with bounding boxes and class labels. During training, each annotated (ground-truth) box is matched to the anchor boxes that overlap it sufficiently, and the loss is computed from these matches, which pushes the network toward better predictions over time.
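
The matching of annotations to anchor boxes can be sketched as follows. The rule assumed here is the one described in the SSD paper: an anchor is assigned to a ground-truth box when their overlap (IoU) exceeds 0.5, and every ground-truth box is additionally assigned its best-overlapping anchor. The function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_boxes, iou_threshold=0.5):
    """For each anchor, return the index of its ground-truth box or -1 (background)."""
    matches = [-1] * len(anchors)
    # Assign each anchor to the ground-truth box it overlaps most, if above threshold.
    for a_idx, anchor in enumerate(anchors):
        best_gt, best_iou = -1, iou_threshold
        for g_idx, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_gt, best_iou = g_idx, overlap
        matches[a_idx] = best_gt
    # Guarantee every ground-truth box at least one anchor: its best-overlapping one.
    for g_idx, gt in enumerate(gt_boxes):
        best_anchor = max(range(len(anchors)), key=lambda a: iou(anchors[a], gt))
        matches[best_anchor] = g_idx
    return matches


anchors = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30), (40, 40, 50, 50)]
gt_boxes = [(1, 1, 11, 11), (22, 22, 36, 36)]
print(match_anchors(anchors, gt_boxes))  # [0, -1, 1, -1]
```

The second ground-truth box overlaps no anchor by more than 0.5, so it only receives a match through the best-anchor rule. Anchors left at -1 are treated as background during training.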

2. Loss Function

SSD employs a loss function that combines two components:

Localization loss: Measures how closely the predicted bounding boxes match the ground-truth bounding boxes (SSD uses a Smooth L1 loss on the box offsets).
Classification (confidence) loss: Measures how well the class predictions for each anchor box match the true classes (a softmax cross-entropy loss).
The overall loss is the weighted sum of these two components, with the weight balancing the contributions of localization and classification.
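
A minimal sketch of this combined loss is given below. It assumes a Smooth L1 localization term, a cross-entropy classification term, and normalization by the number of matched (positive) anchors, as in the SSD paper; the hard negative mining step used in the paper is left out for brevity. The dictionary layout and function names are hypothetical, chosen only for illustration.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for large errors."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def localization_loss(pred_offsets, target_offsets):
    return sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))


def classification_loss(class_probs, true_class):
    """Cross-entropy for one anchor, given softmax probabilities."""
    return -math.log(class_probs[true_class])


def ssd_loss(anchors, alpha=1.0):
    """anchors: list of dicts with 'label' (0 = background) and 'probs';
    positive anchors also carry 'pred' and 'target' offset tuples."""
    positives = [a for a in anchors if a["label"] > 0]
    if not positives:
        return 0.0  # no matched anchors: the loss is defined as zero
    conf = sum(classification_loss(a["probs"], a["label"]) for a in anchors)
    loc = sum(localization_loss(a["pred"], a["target"]) for a in positives)
    return (conf + alpha * loc) / len(positives)


example = [
    {"label": 1, "probs": [0.1, 0.9],
     "pred": (0.5, 0.0, 0.0, 0.0), "target": (0.0, 0.0, 0.0, 0.0)},
    {"label": 0, "probs": [0.8, 0.2]},
]
loss = ssd_loss(example)
print(round(loss, 4))
```

Only positive anchors contribute to the localization term, while every anchor contributes to the classification term; alpha is the weight that balances the two.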

3. Network Optimization

SSD optimizes the network's weights using backpropagation and gradient descent. An optimizer such as stochastic gradient descent (SGD) or Adam minimizes the loss function and so improves the network's predictions.
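
As an illustration of the update rule, here is a small sketch of gradient descent with momentum applied to a toy one-parameter problem instead of a real network. The function name, the toy objective, and the learning rate are assumptions made for the demonstration; in practice a framework optimizer performs this step using gradients obtained by backpropagation.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update; returns new weights and velocity."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy objective f(w) = w^2, whose gradient is 2w and whose minimum is at w = 0.
weights, velocity = [5.0], [0.0]
for _ in range(300):
    grads = [2.0 * w for w in weights]
    weights, velocity = sgd_momentum_step(weights, grads, velocity, lr=0.1)
print(weights[0])
```

The weight oscillates around zero with shrinking amplitude and settles very close to the minimum.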

Advantages

Real-Time Object Detection: SSD can detect objects in real time, making it suitable for applications where low latency matters, such as autonomous vehicles.
High Accuracy: Because localization and classification are optimized jointly, SSD's unified design achieves higher detection accuracy than conventional techniques.
Scale and Aspect Ratio Handling: Multi-scale feature maps and anchor boxes let SSD handle objects of varying scales and aspect ratios efficiently.
Ease of Training: SSD combines localization and classification in a single network that is trained end to end, which makes training simpler than in earlier multi-stage techniques.
Low Resource Requirements: SSD needs only a single forward pass and has no separate region-proposal stage, so it can run on devices with limited resources.
Object Tracking: SSD can be extended with temporal information to track objects between frames in video sequences, which is useful in robotics and surveillance applications.
Robustness to Occlusions: Because several anchor boxes cover each location, SSD can often still detect objects that are partially occluded.
Robustness to Image Variations: Anchor boxes and multi-scale feature maps help SSD cope with changes in illumination, perspective, and object position, which improves its performance in difficult real-world conditions.
Benchmark Status: SSD has served as a baseline in object detection benchmarks, encouraging innovation and providing a standard against which newer object detection models are compared.
Community and Support: SSD has attracted an active community of researchers and programmers who continue to improve and extend it and keep it applicable to current computer vision workloads.

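Code

The example below applies detection to a face-mask dataset. A pretrained face detector (cvNet, run through OpenCV's dnn module and assumed here to be SSD-based) locates faces in each test image, and a small Keras CNN trained on cropped faces classifies each face as "Mask" or "No Mask".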

import pandas as pd
import numpy as np
import cv2
import json
import os
import matplotlib.pyplot as plt
import random
import seaborn as sns
from keras.models import Sequential
from keras import optimizers
from keras import backend as K
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
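# Dataset locations as given in the source. Adjust them so that `directory` points to the
# folder of annotation JSON files and `image_directory` to the folder of images.
directory = "/input/face-mask-detection-dataset /annotations"
image_directory = "/input/face-mask-detection-dataset "
df = pd.read_csv("/input/face-mask-detection-dataset/train.csv")
df_test = pd.read_csv("/input/face-mask-detection-dataset/submission.csv")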
def getJSON(filePathandName):
    with open(filePathandName,'r') as f:
        return json.load(f)
def adjust_gamma(image, gamma=1.0):
    invGamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** invGamma) * 255 for i in np.arange(0, 256)])
    return cv2.LUT(image.astype(np.uint8), table.astype(np.uint8))
jsonfiles= []
for i in os.listdir(directory):
    jsonfiles.append(getJSON(os.path.join(directory,i)))
jsonfiles[0]
df = pd.read_csv("../input/face-mask-detection-dataset/train.csv")
df.head()
data = []
img_size = 124
mask = ['face_with_mask']
non_mask = ["face_no_mask"]
labels={'mask':0,'without mask':1}
# Crop every annotated face and store it together with its class label.
for i in df["name"].unique():
    f = i+".json"
    # Read each image once instead of once per annotation.
    img_full = cv2.imread(os.path.join(image_directory,i),1)
    if img_full is None:
        continue  # skip files that cannot be read
    for j in getJSON(os.path.join(directory,f)).get("Annotations"):
        if j["classname"] in mask:
            label = labels["mask"]
        elif j["classname"] in non_mask:
            label = labels["without mask"]
        else:
            continue  # ignore all other annotation classes
        # The box is used as corner coordinates (x1, y1, x2, y2), as in the original slicing.
        x1,y1,x2,y2 = j["BoundingBox"]
        img = img_full[y1:y2,x1:x2]
        if img.size == 0:
            continue  # skip degenerate boxes
        img = cv2.resize(img,(img_size,img_size))
        data.append([img,label])
random.shuffle(data)
len(data)
p = []
for face in data:
    if(face[1] == 0):
        p.append("Mask")
    else:
        p.append("No Mask")
# Pass the labels by keyword; recent seaborn versions do not accept them positionally.
sns.countplot(x=p)
X = []
Y = []
for features,label in data:
    X.append(features)
    Y.append(label)
X[0].shape
# Scale pixel values to [0, 1] and make the (N, height, width, channels) shape explicit.
X = np.array(X)/255.0
X = X.reshape(-1,img_size,img_size,3)
Y = np.array(Y)
np.unique(Y)
Y.shape
model = Sequential()
model.add(Conv2D(32, (3, 3), padding = "same", activation='relu', input_shape=(124,124,3)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(50, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy', optimizer='adam' ,metrics=['accuracy'])
xtrain,xval,ytrain,yval=train_test_split(X, Y,train_size=0.8,random_state=0)
datagen = ImageDataGenerator(
        featurewise_center=False,  
        samplewise_center=False,  
        featurewise_std_normalization=False,  
        samplewise_std_normalization=False,  
        zca_whitening=False,    
        rotation_range=15,    
        width_shift_range=0.1,
        height_shift_range=0.1,  
        horizontal_flip=True,  
        vertical_flip=False)
datagen.fit(xtrain)
# model.fit accepts generators directly; fit_generator is deprecated in recent Keras versions.
history = model.fit(datagen.flow(xtrain, ytrain, batch_size=32),
                    steps_per_epoch=xtrain.shape[0]//32,
                    epochs=20,
                    verbose=1,
                    validation_data=(xval, yval))
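print(len(df_test["name"]),len(df_test["name"].unique()))
# cvNet is not defined in the original listing. The lines below ASSUME a pretrained
# Caffe SSD face detector; both file names are placeholders, not paths from the source.
prototxt_path = "architecture.txt"      # hypothetical network definition file
caffemodel_path = "weights.caffemodel"  # hypothetical pretrained weights file
cvNet = cv2.dnn.readNetFromCaffe(prototxt_path, caffemodel_path)
test_images = [ '0072.jpg','0353.jpg']
gamma = 2.0
fig = plt.figure(figsize = (14,14))
rows = 3
cols = 2
axes = []
assign = {'0':'Mask','1':"No Mask"}
for j,im_name in enumerate(test_images):
    image =  cv2.imread(os.path.join(image_directory,im_name),1)
    image =  adjust_gamma(image, gamma=gamma)
    (h, w) = image.shape[:2]
    # Run the face detector on a 300x300 mean-subtracted blob.
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300,300)), 1.0, (300, 300), (104.0, 177.0, 123.0))
    cvNet.setInput(blob)
    detections = cvNet.forward()
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence <= 0.2:
            continue  # confidence score thresholding
        # Scale the normalized box to pixel coordinates and clip it to the image.
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = [int(v) for v in box]
        startX, startY = max(0, startX), max(0, startY)
        endX, endY = min(w, endX), min(h, endY)
        frame = image[startY:endY, startX:endX]
        if frame.size == 0:
            continue  # skip empty crops instead of hiding errors with a bare except
        # Classify the face crop with the CNN trained above.
        face = cv2.resize(frame,(img_size,img_size))
        face = np.array(face)/255.0
        face = face.reshape(1,img_size,img_size,3)
        result = model.predict(face)
        label_Y = 1 if result[0][0] > 0.5 else 0
        cv2.rectangle(image, (startX, startY), (endX, endY), (0, 0, 255), 2)
        cv2.putText(image,assign[str(label_Y)] , (startX, startY-10), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (36,255,12), 2)
    axes.append(fig.add_subplot(rows, cols, j+1))
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()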

Output:

(Figure: the test images with each detected face outlined by a bounding box and labelled "Mask" or "No Mask".)

Conclusion

The Single Shot MultiBox Detector (SSD) is an effective deep learning method for object detection in images and video. It can recognize and localize several objects within a single frame, which makes it suitable for applications such as autonomous cars, tracking, medical imaging, retail, and robotics. SSD's key elements (feature extraction, anchor boxes, classification, and bounding box regression) work together to detect and categorize objects of interest reliably. Its main benefits are real-time performance, the ability to handle objects of various scales, and applicability to many kinds of datasets. Because of its accuracy and speed, it has been widely used in computer vision tasks.

More broadly, object detection is a crucial computer vision problem that goes beyond image classification by providing precise information about where objects are located. It has applications in many fields, and as deep learning has developed, object detection techniques such as SSD have advanced with it, allowing more accurate and efficient solutions to practical problems. Object detection, including specialized applications such as mask detection with SSD, plays a central role in enabling machines to see and interact with the world, whether for self-driving cars, medical diagnosis, security, retail, or any other domain.
