## VGGNet-16 Architecture

VGG stands for the Visual Geometry Group, a research group in the Department of Engineering Science at the University of Oxford. It developed a series of convolutional network models, from VGG11 through VGG16 to VGG19, that can be used for face recognition and image classification. The primary goal of VGG's study was to understand how the depth of convolutional networks influences the accuracy of large-scale image recognition. To increase the number of network layers while avoiding too many parameters, a small 3x3 convolution kernel is employed in every convolutional layer.

## Structure

VGG's input is a fixed-size 224x224 RGB image. The mean RGB value, computed over the training set, is subtracted from every image before it is fed into the VGG convolutional network. A 3x3 (or, in one configuration, 1x1) filter is used, with a fixed convolution stride. Every VGG variant has three fully connected layers; the variants range from VGG11 to VGG19 depending on the total number of convolutional and fully connected layers. The smallest, VGG11, has eight convolutional layers and three fully connected layers; the largest, VGG19, has 16 convolutional layers plus three fully connected layers. Furthermore, a pooling layer does not follow every convolutional layer: there are five pooling layers in total, distributed across the convolutional layers. The following diagram depicts the VGG structure:

VGG16 has 16 weight layers, whereas VGG19 has 19. The last three fully connected layers are identical across the VGG variants. The overall structure consists of five groups of convolutional layers, each followed by a max-pooling layer. The variants differ in the number of cascaded convolutional layers within each of the five groups. By contrast, each convolutional stage in AlexNet contains only a single convolution, with larger kernels (up to 11x11 in the first layer).
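As a quick illustration, the "16" in VGG16 can be tallied from its configuration (the block sizes below follow the VGG16 configuration described in the original paper):

```python
# VGG16 configuration: number of 3x3 conv layers in each of the
# five convolutional blocks, and the output channels of each block.
conv_blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
fc_layers = 3  # two 4096-unit layers plus the final classifier

conv_layers = sum(n for n, _ in conv_blocks)
print(conv_layers)               # 13 convolutional layers
print(conv_layers + fc_layers)   # 16 weight layers -> "VGG16"
```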
Each convolutional stage in VGGNet contains between two and four convolution operations. The convolution kernel is 3x3 with a stride of 1, and the pooling kernel is 2x2 with a stride of 2. The most notable change in VGGNet is decreasing the size of the convolution kernel while increasing the number of convolutional layers. Using several convolutional layers with small kernels rather than a single layer with a large kernel reduces the number of parameters on the one hand, and, the authors argue, is equivalent to more non-linear mappings, which improves the network's expressive power. Two successive 3x3 convolutions have the same receptive field as one 5x5 convolution, and three successive 3x3 convolutions match a 7x7 convolution. The benefits of using three 3x3 convolutions instead of one 7x7 convolution are twofold: first, incorporating three ReLU layers instead of one makes the decision function more discriminative; second, the parameter count is reduced. For example, if all inputs and outputs have C channels, three 3x3 convolutional layers require 3 x (3 x 3 x C x C) = 27C^2 parameters, whereas one 7x7 convolutional layer requires 7 x 7 x C x C = 49C^2. This can be seen as regularising the 7x7 convolution by factorising it into three 3x3 convolutions. The purpose of the 1x1 convolution layer is to increase the non-linearity of the decision function without changing the receptive field. Although the 1x1 convolution itself is linear, the ReLU that follows it introduces non-linearity.

## Network Configuration

The configuration table in the original paper introduces the VGG variants and holds a wealth of information.

- It is a comparison of six networks. The networks become deeper from A to E; several depths were tried to test their effect.
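The parameter comparison above can be checked with a few lines of arithmetic (the channel count C below is an arbitrary example):

```python
# Parameter count: three stacked 3x3 convolutions vs one 7x7
# convolution, assuming C input and C output channels (biases ignored).
def conv_params(kernel, in_ch, out_ch):
    return kernel * kernel * in_ch * out_ch

C = 256  # example channel count
three_3x3 = 3 * conv_params(3, C, C)  # 27 * C^2
one_7x7 = conv_params(7, C, C)        # 49 * C^2
print(three_3x3, one_7x7)  # 1769472 3211264
```

So the stacked 3x3 design uses roughly 45% fewer parameters while covering the same receptive field.
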
- Each column provides a detailed description of one network's structure.
- This is the proper way to run experiments: start with the simplest strategy that addresses the problem, then gradually optimize as difficulties arise.
## Training

- The optimization technique is stochastic gradient descent (SGD) with momentum (0.9). The batch size is 256.
- L2 regularisation is applied, with a weight decay of 5e-4. Dropout (p = 0.5) is applied after the first two fully connected layers.
- Although it is deeper and has more parameters than AlexNet, VGGNet is hypothesized to converge in fewer epochs for two reasons: first, the greater depth and smaller convolutions provide implicit regularisation; second, certain layers are pre-trained.
- For the shallow network A, the parameters are randomly initialized: the weights w are drawn from N(0, 0.01) and the biases are set to zero. For the deeper networks, the first four convolutional layers and the three fully connected layers are initialized with network A's parameters. However, it was later found that the networks can be initialized directly, without pre-trained parameters.
- To obtain a 224x224 input image, each rescaled image is randomly cropped in every SGD iteration. To augment the data set, the cropped image is also randomly flipped horizontally and subjected to random RGB colour shifts.
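The core update rule from the training recipe above (SGD with momentum 0.9 and L2 weight decay 5e-4) can be sketched in a few lines of plain Python; the scalar weight and gradient here are illustrative placeholders, not the authors' code:

```python
# Minimal sketch of SGD with momentum and L2 weight decay,
# using the hyperparameters quoted in the text.
momentum, lr, weight_decay = 0.9, 0.01, 5e-4

def sgd_momentum_step(w, grad, velocity):
    # L2 regularisation contributes weight_decay * w to the gradient.
    g = grad + weight_decay * w
    v = momentum * velocity - lr * g
    return w + v, v

# One update step on a single scalar weight, for illustration.
w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, grad=0.5, velocity=v)
print(round(w, 6))  # 0.994995
```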
We will now use the VGGNet-16 model to identify an object in an image.
## Importing Libraries

We first import the required libraries and define the size of the input.

## VGG-16 Model

We will then lay out the VGG-16 model for use. We will use a pretrained model, as we want to keep this process easy to understand.

## Pretrained Model

The Keras library includes pre-trained models that allow you to import stored model weights and use them for a variety of tasks, including transfer learning, image feature extraction, and object identification. We can load the model architecture provided in the library and then assign all of the weights to the appropriate layers. Before we use the pretrained model, let's construct a few functions for loading images, preprocessing them, and making predictions. We will use pretrained weights to save time and to keep this process simple.
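The steps above can be sketched with Keras's built-in VGG16; this is a minimal illustration, and the image path in the usage comment is a placeholder, not a file shipped with this article:

```python
# Load the pretrained VGG16 and classify a single image.
import numpy as np
from tensorflow.keras.applications.vgg16 import (
    VGG16, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

# Downloads the ImageNet weights on first use.
model = VGG16(weights="imagenet")

def predict(img_path):
    # Load and preprocess the image to the 224x224 input VGG expects.
    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    preds = model.predict(x)
    # Top-3 (class_id, class_name, probability) tuples.
    return decode_predictions(preds, top=3)[0]

# Example usage ("example.jpg" is a placeholder path):
# print(predict("example.jpg"))
```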
Here you can see that the model lists the objects present in the images, and if you look closely, it is doing its job quite well.