OpenCV (Computer Vision Library) Using Python
This OpenCV tutorial covers basic and advanced concepts of OpenCV. It is designed for both beginners and professionals.
What is OpenCV?
OpenCV is an open-source library with Python bindings that is used for computer vision in artificial intelligence, machine learning, face recognition, and so on.
In OpenCV, CV is an abbreviation of computer vision, which is defined as a field of study that helps computers understand the content of digital images such as photographs and videos.
The purpose of computer vision is to understand the content of images. It extracts a description from the pictures, which may be an object, a text description, a three-dimensional model, and so on. For example, cars can be fitted with computer vision, which lets them identify the different objects around the road, such as traffic lights, pedestrians, and traffic signs, and act accordingly.
Computer vision allows the computer to perform the same kinds of tasks as humans, with comparable efficiency.
OpenCV stands for Open Source Computer Vision Library, and it is widely used for image recognition and identification. It was officially launched by Intel in 1999. It was written in C/C++ in its early stages, but it is now commonly used from Python for computer vision as well.
The first alpha version of OpenCV was released for common use at the IEEE Conference on Computer Vision and Pattern Recognition in 2000, and five betas were released between 2001 and 2005. Version 1.0 was released in 2006.
The second version of OpenCV was released in October 2009 with significant changes, including a major new C++ interface aimed at easier, more type-safe patterns and better implementations. Currently, development is done by an independent Russian team, which releases a new version roughly every six months.
Installation of the OpenCV
Install OpenCV using Anaconda
The first step is to download the latest Anaconda graphical installer for Windows from its official site. Choose the installer (32-bit or 64-bit) that matches your system; it is recommended to install the Anaconda distribution that ships with Python 3.
After installing it, open the Anaconda prompt and type the following command.
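The original command is not shown here; a common way to install OpenCV through Anaconda is via the conda-forge channel (channel and package name are assumptions, not from the original text):

```shell
# Install OpenCV from the community-maintained conda-forge channel
conda install -c conda-forge opencv
```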
Press the Enter key, and it will download and install OpenCV together with its dependencies.
Install OpenCV on Windows via pip
OpenCV's Python bindings require Python to be installed on the system; OpenCV can then be installed with the pip command.
We can install it without extra modules by the following command:
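The command itself is missing from the text; the standard PyPI package without the extra (contrib) modules is installed like this:

```shell
# Installs the core OpenCV modules only (no contrib modules)
pip install opencv-python
```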
Open the command prompt and run the following code to check whether OpenCV is installed correctly.
Why OpenCV is used for Computer Vision?
How does computer recognize the image?
Human eyes gather a great deal of information from what they see. A machine, by contrast, sees everything as numbers: it converts the image into numbers and stores them in memory. How does a computer convert images into numbers? The answer is the pixel value. A pixel is the smallest unit of a digital image or graphic that can be displayed and represented on a digital display device.
The intensity of the picture at a particular location is represented by a number. In a grayscale image, each pixel holds only one value: the intensity of black at that location.
There are two common ways to identify the images:
Grayscale images contain only shades ranging from black to white. In this contrast measurement of intensity, black is treated as the weakest intensity and white as the strongest. When we use a grayscale image, the computer assigns each pixel a value based on its level of darkness.
An RGB image stores a combination of red, green, and blue values for each pixel, which together make the final color. The computer retrieves the value of each pixel and puts the results in an array to be interpreted.
The cvtColor() function is used to convert an image from one color space to another. Its parameters are the following:
src - the input image (for example, 8-bit unsigned).
dst - the output image; it has the same size and depth as the input image.
code - the color space conversion code (for example, cv2.COLOR_BGR2GRAY).
OpenCV Reading Images
OpenCV allows us to perform multiple operations on an image, but to do that we first need to read an image file as input; then we can perform various operations on it. OpenCV provides the following functions to read and write images.
OpenCV imread function
The imread() function loads an image from the specified file and returns it. The syntax is:
filename: Name of the file to be loaded
flag: The flag specifies the color type of a loaded image:
The imread() function returns the image as a matrix. If the image cannot be read (because of a missing file, improper permissions, or an unsupported or invalid format), it returns an empty matrix. Currently, the following file formats are supported:
Windows bitmaps - *.bmp, *.dib
Note: For color images, the decoded image will have the channels stored in BGR order.
Let's consider the following example:
Output: it will display the following image.
OpenCV Writing Images
The OpenCV imwrite() function is used to save an image to a specified file. The file extension defines the image format. The syntax is the following:
filename - name of the file where the image will be saved.
image- Image to be saved.
params- The following parameters are currently supported:
Let's consider the following example:
Image written to file-system : True
If the imwrite() function returns True, the file has been successfully written to the specified location.
OpenCV Resize the image
Sometimes it is necessary to transform the loaded image. In image processing, we often need to resize an image to perform a particular operation. Images are generally stored as NumPy ndarrays. The ndarray.shape attribute gives the dimensions of the image; we can get the height, width, and number of channels per pixel by indexing into it.
Resizing an image means changing its dimensions: its width, its height, or both. The aspect ratio of the original image can be retained while resizing. OpenCV provides the cv2.resize() function to resize an image. The syntax is given as:
Example of resizing the images
There are several ways to resize the image. Below are some examples to perform resize operation:
Retain the aspect ratio
Original Dimensions : (332, 500, 3)
Resized Dimensions : (199, 300, 3)
In the above example, the scale_per variable holds the percentage by which the image is scaled. A value below 100 downscales the image. We use this scale_per value together with the original image's dimensions to calculate the width and height of the output image.
Upscale with resize()
Original Dimensions : (332, 500, 3)
Resized Dimensions : (398, 600, 3)
Not retaining the aspect ratio
In the below example, we have provided a specific value in pixels for the height, while the width remains unaffected.
Original Dimensions : (332, 500, 3)
Resized Dimensions : (440, 500, 3)
In the below example, the scale_per value holds the percentage by which the height is to be scaled; alternatively, we can provide a specific value in pixels.
Original Dimensions : (332, 500, 3)
Resized Dimensions : (200, 500, 3)
Resize to a specific width and height
OpenCV Image Rotation
The image can be rotated by various angles (90, 180, 270, or 360 degrees). OpenCV calculates the affine matrix that performs the affine transformation; this means it does not preserve the angles between lines or the distances between points, although it does preserve the ratio of distances between points lying on a line.
The syntax of the rotate image is the following:
OpenCV Gaussian Blur (Image Smoothing)
Image smoothing is a technique that helps reduce noise in images. An image may contain various types of noise because of the camera sensor. Smoothing essentially eliminates high-frequency content (noise, edges) from the image, so edges are slightly blurred in this operation. OpenCV provides the GaussianBlur() function to apply smoothing to images. The syntax is the following:
borderType - specifies how the image boundaries are handled while the kernel is applied on the image borders. Possible border types are:
OpenCV Blob Detection
BLOB stands for Binary Large Object and refers to a group of connected pixels in a binary image. The term "large" indicates objects of a certain size; other, "small" binary objects are usually noise. There are three processes in BLOB analysis.
Blob extraction means separating the BLOBs (objects) in a binary image. A BLOB contains a group of connected pixels. We determine whether two pixels are connected by their connectivity, i.e., which pixels are neighbors of which. There are two types of connectivity, 8-connectivity and 4-connectivity, with 8-connectivity usually giving better results.
BLOB representation simply means converting a BLOB into a few representative numbers. After BLOB extraction, the next step is to classify the BLOBs. There are two steps in the representation process: first, each BLOB is described by several characteristics; second, matching methods are applied to compare the features of the BLOBs.
Here we determine the type of BLOB, for example whether a given BLOB is a circle or not. The question is how to decide which BLOBs are circles based on the features described earlier. For this purpose, we generally build a prototype model of the object we are looking for.
How to perform Background Subtraction?
Background subtraction is widely used to generate a foreground mask: a binary image containing the pixels that belong to moving objects in the scene. Background subtraction calculates the foreground mask by performing a subtraction between the current frame and the background model.
There are two main steps in background modeling:
Manual subtraction from the first frame
First, we import the libraries and load the video. Next, we take the first frame of the video, convert it into grayscale, and apply a Gaussian blur to remove some noise. We use a while loop to load the frames one by one. The core of the background subtraction is then calculating the absolute difference between the first frame and the current frame.
Subtraction using Subtractor MOG2
OpenCV provides the subtractor MOG2, which is more effective than the manual mode. The Subtractor MOG2 has the benefit of working with a history of frames. The syntax is as follows:
The first argument, history, is the number of last frames taken into account (by default 120).
The second argument, varThreshold, is the value used when evaluating the difference in order to extract the background. A lower threshold finds more variation, with the disadvantage of a noisier image.
The third argument, detectShadows, enables the part of the algorithm that detects and marks shadows.
OpenCV Image Threshold
The basic idea of thresholding is to simplify the visual data for analysis. When we convert an image to grayscale, each pixel can still take up to 256 values. Thresholding converts everything to white or black, based on a threshold value. Let's assume we want the threshold to be 125 (out of 255): then everything that was below 125 is converted to 0, or black, and everything above 125 is converted to 255, or white. The syntax is as follows:
src: Source image, it should be a grayscale image.
thresh: It is used to classify the pixel value.
maxVal: the value assigned to a pixel if it exceeds the threshold.
OpenCV provides different styles of thresholding, passed as the fourth parameter of the function. These are the following:
Let's take a sample input image
We take the above image as input and describe how thresholding actually works. The image is slightly dim and a little hard to read: some parts are light enough to read, while other parts require more focus.
Let's consider the following example:
OpenCV Edge detection
Edge detection is the process of identifying the boundaries of objects in an image. We will learn about edge detection using the Canny edge detection technique. The syntax of the Canny edge detection function is given as:
Example: Real Time Edge detection
OpenCV Contours
Contours are defined as a curve joining all the continuous points (along a boundary) having the same color or intensity. In other words, when we find contours in a binary image, we are finding the boundaries of the objects in that image. The official definition is the following:
Contours are a useful tool for shape analysis and for object detection and recognition.
To maintain accuracy, we should use binary images, so we first apply thresholding or Canny edge detection.
In OpenCV, finding contours in a binary image is the same as finding white objects on a black background.
OpenCV provides findContours(), which is used to find contours in a binary image. The syntax is the following:
The findContours() function accepts three arguments: the first is the source image, the second is the contour retrieval mode, and the third is the contour approximation method.
Let's consider the following example:
How to draw the Contours?
OpenCV provides the cv2.drawContours() function, which is used to draw the contours. It is also used to draw any shape by providing its boundary points. Syntax of cv2.drawContours() function is given below:
To draw all the contours in an image:
To draw an individual contour, for example the 3rd contour:
The first argument is the source image, the second is the contours, which should be passed as a Python list, the third is the index of the contour (use -1 to draw all contours), and the remaining arguments set the color and thickness.
Contour Approximation Method
It is the third argument of cv2.findContours(). Above, we used it to draw the boundary of shapes with the same intensity. It stores the (x, y) coordinates of the boundary of a shape. But does it store all the coordinates? That is specified by the contour approximation method.
If we pass cv2.CHAIN_APPROX_NONE, all the boundary points are stored. Sometimes we do not need all of them: if we find the contour of a straight line, only its two endpoints are required. For such cases we use cv2.CHAIN_APPROX_SIMPLE, which removes redundant points and compresses the contour, thereby saving memory.
In the above image of a rectangle, the first image shows the points obtained with cv2.CHAIN_APPROX_NONE (734 points) and the second shows the points obtained with cv2.CHAIN_APPROX_SIMPLE (only 4 points). We can see the difference between the two.
OpenCV provides the VideoCapture() function, which is used to work with a camera. We can perform the following tasks:
Capture Video from Camera
OpenCV offers a straightforward interface to capture a live stream from a camera (webcam). In this example, we convert the video into grayscale and display it.
We need to create a VideoCapture object to capture a video. It accepts either a device index or the name of a video file. The number specifying the camera is called the device index; we can select a camera by passing 0 or 1 as an argument. After that, we can capture the video frame by frame.
The cap.read() function returns a boolean value: True if the frame was read correctly, and False otherwise.
Playing Video from file
We can also play a video from a file. It is similar to capturing from the camera, just replace the camera index with the video file name. The delay passed to cv2.waitKey() must be appropriate: if it is too high, the video will play slowly; if it is too low, the video will play very fast.
Saving a Video
The cv2.VideoWriter() class is used to save video to a file. First, we need to create a VideoWriter object, specifying the output file name, the FourCC code, and the number of frames per second (fps). The frame size must also be passed to the constructor.
FourCC is a 4-byte code used to identify the video codec. The example is given below for saving the video.
It will save the video at the desired location. Run the above code and see the output.
Limitations in Face Detection
Facial recognition systems are essential nowadays, and they have come a long way. They are used in quite a few applications, for example photo retrieval, surveillance, authentication, and access control systems. But a few challenges continuously occur in image and face recognition systems.
These challenges need to be overcome to create more effective face recognition systems. The following are the challenges that limit how far a facial recognition system can go.
Illumination plays an essential role in image recognition. Even a slight change in lighting conditions makes a major impact on the results: the same object may produce different results under low or high illumination.
The background of the object also plays a significant role in face detection. The results might not be the same outdoors as they are indoors, because the factors affecting performance change as soon as the location changes.
Facial recognition systems are highly sensitive to pose variations. Head movement or a different camera position can change the facial texture and generate wrong results.
Occlusion means that part of the face is covered: a beard, a mustache, or accessories (goggles, caps, masks, etc.) interfere with the estimates of a face recognition system.
Another important factor is the varying expressions of the same individual. A change in facial expression may produce a different result for the same individual.
In this tutorial, we have learned about the OpenCV library and its basic concepts, and we have described all the basic operations on images. In the next tutorial, we will learn about face recognition and face detection.
Face recognition and Face detection using the OpenCV
Face recognition is a technique to identify or verify a face from digital images or video frames. A human can quickly identify faces without much effort; it is an effortless task for us but a difficult one for a computer. There are various complications, such as low resolution, occlusion, and illumination variations, and these factors strongly affect how accurately a computer can recognize a face. First, it is necessary to understand the difference between face detection and face recognition.
Face Detection: Face detection is generally considered to be finding the faces (their location and size) in an image, and probably extracting them to be used by a face recognition algorithm.
Face Recognition: A face recognition algorithm finds features that uniquely describe the face in the image. The facial image has already been extracted, cropped, resized, and usually converted to grayscale.
There are various algorithms of face detection and face recognition. Here we will learn about face detection using the HAAR cascade algorithm.
Basic Concept of HAAR Cascade Algorithm
The HAAR cascade is a machine learning approach where a cascade function is trained from a lot of positive and negative images. Positive images are those images that consist of faces, and negative images are without faces. In face detection, image features are treated as numerical information extracted from the pictures that can distinguish one image from another.
We apply every feature of the algorithm to all the training images. At the start, every image is given equal weight. The algorithm finds the best threshold that categorizes the faces as positive or negative. There may be errors and misclassifications; we select the features with the minimum error rate, which means these are the features that best classify the face and non-face images.
All possible sizes and locations of each kernel are used to calculate plenty of features.
HAAR-Cascade Detection in OpenCV
OpenCV provides the trainer as well as the detector. We can train a classifier for any object, such as cars, planes, or buildings, by using OpenCV. There are two primary stages of a cascade image classifier: training and detection.
OpenCV provides two applications to train a cascade classifier: opencv_haartraining and opencv_traincascade. These two applications store the classifier in different file formats.
For training, we need a set of samples. There are two types of samples:
A set of negative samples must be prepared manually, whereas the collection of positive samples is created using the opencv_createsamples utility.
Negative samples are taken from arbitrary images and listed in a text file. Each line of the file contains the filename (relative to the directory of the description file) of a negative sample image. This file must be created manually. The listed images may be of different sizes.
Positive samples are created by the opencv_createsamples utility. They can be created from a single image of an object or from an earlier collection. It is important to remember that a large dataset of positive samples is required before giving it to the mentioned utility, because it only applies perspective transformations.
Here we will discuss detection. OpenCV already contains various pre-trained classifiers for face, eyes, smile, etc. Those XML files are stored in opencv/data/haarcascades/ folder. Let's understand the following steps:
Step - 1
First, we need to load the necessary XML classifiers and load input images (or video) in grayscale mode.
Step - 2
After converting the image into grayscale, we can do image manipulation: the image can be resized, cropped, blurred, and sharpened if required. The next step is image segmentation, which identifies the multiple objects in a single image so that the classifier can quickly detect the objects and faces in the picture.
Step - 3
The Haar-like feature algorithm is used to find the location of human faces in a frame or image. All human faces share some universal properties, for example the eye region is darker than its neighboring pixels, and the nose region is brighter than the eye region.
Step - 4
In this step, we extract features from the image with the help of edge detection, line detection, and center detection. The algorithm then provides the x, y, w, h coordinates, which define a rectangular box in the picture showing the location of the face.
Face recognition using OpenCV
Face recognition is a simple task for humans. Successful face recognition relies on recognizing inner features (eyes, nose, mouth) or outer features (head shape, face outline, hairline). The question here is: how does the human brain encode them?
David Hubel and Torsten Wiesel showed that our brain has specialized nerve cells that respond to unique local features of a scene, such as lines, edges, angles, or movement. Our brain combines these different sources of information into useful patterns; we do not perceive the visual world as scattered pieces. Defining face recognition in simple words: automatic face recognition is all about extracting meaningful features from an image, putting them into a useful representation, and performing some classification on them.
The basic idea of face recognition is based on the geometric features of a face. It is the most feasible and intuitive approach to face recognition. The first automated face recognition systems were described in terms of the positions of the eyes, ears, and nose. These positioning points form a feature vector (the distances between the points).
Face recognition is then achieved by calculating the Euclidean distance between the feature vectors of a probe image and a reference image. By its nature, this method is robust against illumination changes, but it has a considerable drawback: the correct registration of the marker points is very hard. A face recognition system can operate in two basic modes:
Verification (authentication): it compares the input facial image with the facial image of the user requesting authentication. It is a 1:1 comparison.
Identification (recognition): it compares the input facial image with images from a dataset to find the user that matches the input face. It is a 1:N comparison.
There are various types of face recognition algorithms, for example:
Each algorithm follows a different approach to extract information from the image and match it with the input image. Here we will discuss the Local Binary Patterns Histogram (LBPH) algorithm, which is one of the oldest and most popular algorithms.
Introduction of LBPH
The Local Binary Patterns Histogram algorithm is a simple approach that labels the pixels of an image by thresholding the neighborhood of each pixel. In other words, LBPH summarizes the local structure of an image by comparing each pixel with its neighbors, and the result is converted into a binary number. The operator was first defined in 1994 (LBP) and has since proven to be a powerful tool for texture classification.
This algorithm generally focuses on extracting local features from images. The basic idea is not to look at the whole image as a high-dimensional vector, but to focus only on the local features of an object.
In the above image, take a pixel as the center and threshold its neighbors against it. If the intensity of a neighbor is greater than or equal to the center pixel, denote it with 1, and otherwise with 0.
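The thresholding step can be sketched on a single 3x3 neighborhood (the sample values are made up for illustration):

```python
import numpy as np

# A 3x3 neighborhood; the center pixel is compared against its 8 neighbors
patch = np.array([[90, 80, 60],
                  [70, 50, 40],
                  [30, 20, 10]])
center = patch[1, 1]

# Neighbors in clockwise order, starting at the top-left corner
neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
             patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]

# 1 where neighbor >= center, else 0; then read the bits as a binary number
bits = [1 if n >= center else 0 for n in neighbors]
lbp_value = int("".join(map(str, bits)), 2)
print(bits, lbp_value)  # [1, 1, 1, 0, 0, 0, 0, 1] -> 225
```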
Let's understand the steps of the algorithm:
1. Selecting the Parameters: The LBPH accepts the four parameters:
Note: The above parameters are slightly confusing. It will be more clear in further steps.
2. Training the Algorithm: The first step is to train the algorithm. It requires a dataset with facial images of the people we want to recognize. A unique ID (a number or the person's name) should be provided with each image; the algorithm then uses this information to recognize an input image and give you the output. All images of a particular person must have the same ID. Let's understand the LBPH computation in the next step.
3. Using the LBP operation: In this step, the LBP computation is used to create an intermediate image that describes the original image in a specific way by highlighting the facial characteristics. The parameters radius and neighbors are used in a sliding-window fashion.
To understand in a more specific way, let's break it into several small steps:
4. Extracting the Histograms from the image: Using the image generated in the last step, we can apply the Grid X and Grid Y parameters to divide the image into multiple grids. Consider the following image:
5. Performing face recognition: Now the algorithm is trained. Each image from the training dataset is represented by its extracted histogram. For a new image, we perform the steps again and create a new histogram. To find a match for the given image, we just need to compare the two histograms and return the image with the closest histogram.
We have discussed face detection and face recognition. The Haar cascade algorithm is used for face detection. There are various algorithms for face recognition; LBPH is an easy and popular one among them, and it generally focuses on local features of the image.
Before learning OpenCV, you must have the basic knowledge of Python.
We hope you will not find any problem with this OpenCV tutorial. But if there is any mistake or error, please report it via the contact form.