OpenCV (Computer Vision Library) Using Python

This OpenCV tutorial covers basic and advanced concepts of OpenCV. It is designed for beginners and professionals.

What is OpenCV?

OpenCV is an open-source library, usable from Python, for computer vision tasks in artificial intelligence, machine learning, face recognition, and related areas.

In OpenCV, CV is an abbreviation of computer vision, a field of study that helps computers understand the content of digital images such as photographs and videos.

The purpose of computer vision is to understand the content of images. It extracts a description from a picture, which may be an object, a text description, a three-dimensional model, and so on. For example, a car equipped with computer vision can identify the objects around the road, such as traffic lights, pedestrians, and traffic signs, and act accordingly.

Computer vision aims to let computers perform the same kinds of visual tasks as humans, ideally with comparable efficiency. Two main tasks are defined below:

  • Object Classification - We train a model on a dataset of particular objects, and the model classifies new objects as belonging to one or more of the training categories.
  • Object Identification - The model identifies a particular instance of an object, for example, parsing two faces in an image and tagging one as Virat Kohli and the other as Rohit Sharma.

OpenCV stands for Open Source Computer Vision Library and is widely used for image recognition and identification. It was officially launched by Intel in 1999. It was originally written in C/C++, but it is now commonly used from Python as well.

The first alpha version of OpenCV was released to the public at the IEEE Conference on Computer Vision and Pattern Recognition in 2000, and five betas were released between 2001 and 2005. Version 1.0 was released in 2006.

The second major version of OpenCV was released in October 2009. It included significant changes to the C++ interface, aiming at easier, more type-safe patterns and better implementations. At the time of writing, development is done by an independent Russian team, and a new version is released roughly every six months.

Installation of the OpenCV

Install OpenCV using Anaconda

The first step is to download the latest Anaconda graphical installer for Windows from its official site. Choose the installer that matches your system (32-bit or 64-bit). The Python 3.7 version is suggested.

After installing it, open the Anaconda prompt and run a command such as conda install -c conda-forge opencv.

Press Enter, and conda will download and install OpenCV along with its dependencies.

Install OpenCV on Windows via pip

OpenCV is used here through its Python bindings, so Python must be installed on the system first. OpenCV, including the extra (contrib) modules, can then be installed with the pip command pip install opencv-contrib-python.

We can install it without the extra modules with the command pip install opencv-python.

Open the command prompt, start Python, and check whether OpenCV is installed, for example by importing cv2 and printing cv2.__version__.

Why is OpenCV used for Computer Vision?

  • OpenCV is available free of cost.
  • Since the OpenCV library is written in C/C++, it is quite fast, and it can also be used from Python.
  • It needs little RAM, roughly 60-70 MB.
  • OpenCV is portable and can run on any device that can run C.

How does a computer recognize an image?

Human eyes gather a great deal of information from what they see. Machines, in contrast, see everything as numbers: they convert the image into numbers and store them in memory. The question is how a computer converts an image into numbers. The answer is pixel values. A pixel is the smallest unit of a digital image or graphic that can be displayed and represented on a digital display device.

The picture intensity at a particular location is represented by a number. In a grayscale image, each pixel consists of a single value: the intensity of light at that location, from black (lowest) to white (highest).

There are two common ways to identify the images:

1. Grayscale

Grayscale images contain only shades of gray, ranging from black to white. Black is treated as the weakest intensity and white as the strongest. For a grayscale image, the computer assigns each pixel a value based on its level of darkness.

2. RGB

An RGB color is a combination of red, green, and blue, which together make a new color. The computer retrieves these three values from each pixel and puts the results in an array to be interpreted.

OpenCV cvtColor

The cvtColor function is used to convert an image from one color space to another. In Python, the syntax is cv2.cvtColor(src, code[, dst]), where:

src - the input image, for example 8-bit unsigned.

dst - the output image, which has the same size and depth as the input image.

code - the color space conversion code.

OpenCV Reading Images

OpenCV allows us to perform many operations on an image, but first the image file must be read as input. OpenCV provides the following functions to read and write images.

OpenCV imread function

The imread() function loads an image from the specified file and returns it. The syntax is cv2.imread(filename[, flags]), where:

filename: name of the file to be loaded.

flag: specifies the color type of the loaded image:

  • CV_LOAD_IMAGE_ANYDEPTH (cv2.IMREAD_ANYDEPTH in current Python bindings) - returns a 16-bit/32-bit image when the input has the corresponding depth; otherwise the image is converted to 8-bit.
  • CV_LOAD_IMAGE_COLOR (cv2.IMREAD_COLOR) - always converts the image to a color one.
  • CV_LOAD_IMAGE_GRAYSCALE (cv2.IMREAD_GRAYSCALE) - always converts the image to grayscale.

The imread() function returns an empty matrix (None in Python) if the image cannot be read, for example because of a missing file or an unsupported or invalid format. Currently, the following file formats are supported.

Windows bitmaps - *.bmp, *.dib
JPEG files - *.jpeg, *.jpg, *.jpe
Portable Network Graphics - *.png
Portable image format - *.pbm, *.pgm, *.ppm
TIFF files - *.tiff, *.tif

Note: For color images, the decoded image has its channels stored in BGR order.

Let's consider the following example:

Output: with a display available, the example shows the loaded image in a window.

OpenCV Writing Images

OpenCV's imwrite() function is used to save an image to a specified file. The file extension defines the image format. The syntax is cv2.imwrite(filename, image[, params]), where:

filename - name of the file to be written.

image - the image to be saved.

params - the following format-specific parameters are currently supported:

  • For JPEG, quality can be from 0 to 100. The default value is 95.
  • For PNG, the compression level can be from 0 to 9. The default value is 1.
  • For PPM, PGM, or PBM, a binary format flag of 0 or 1. The default value is 1.

Let's consider the following example:


Image written to file-system : True

If the imwrite() function returns True, the file was successfully written to the specified location.

OpenCV Resize the image

Sometimes it is necessary to transform the loaded image, and many image processing operations require resizing it first. OpenCV images are stored as NumPy ndarrays. The ndarray.shape attribute gives the dimensions of the image: indexing it yields the height, the width, and the number of channels per pixel.

Example 1:


Resized Dimensions :  (199, 300, 3)

Resizing an image means changing its dimensions: the width, the height, or both. The aspect ratio of the original image can be retained while resizing. OpenCV provides the cv2.resize() function to resize an image. The syntax is cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]]), where:

  • src - source/input image (required).
  • dsize - desired size of the output image, given as (width, height) (required; may be None when fx and fy are given).
  • fx - scale factor along the horizontal axis (optional).
  • fy - scale factor along the vertical axis (optional).
  • interpolation (optional) - this flag selects one of the following methods:
    • INTER_NEAREST - nearest-neighbor interpolation.
    • INTER_AREA - resampling using pixel area relation. When the image is zoomed, it is similar to the INTER_NEAREST method.
    • INTER_CUBIC - bicubic interpolation over a 4x4 pixel neighborhood.
    • INTER_LANCZOS4 - Lanczos interpolation over an 8x8 pixel neighborhood.

Example of resizing the images

There are several ways to resize an image. Below are some examples of the resize operation:

  1. Retain the aspect ratio (the height-to-width ratio of the image is preserved)
    • Downscale (decrease the size of the image)
    • Upscale (increase the size of the image)
  2. Do not preserve the aspect ratio
    • Resize only the width
    • Resize only the height
  3. Resize to a specified width and height

Retain the aspect ratio

  • Downscale with resize()


Original Dimensions :  (332, 500, 3)
Resized Dimensions :  (199, 300, 3)

In the above example, the scale_per variable holds the percentage to which the image is scaled. A value below 100 downscales the image. The scale_per value is combined with the original image's dimensions to calculate the width and height of the output image.

  • Upscale with resize()


Original Dimensions :  (332, 500, 3)
Resized Dimensions :  (398, 600, 3)

Not retaining the aspect ratio

  • Resize only the width

In the example below, we provide a specific width in pixels, and the height remains unchanged.


Original Dimensions :  (332, 500, 3)
Resized Dimensions :  (332, 440, 3)

  • Resize the height

In the example below, the scale_per value holds the percentage by which the height is scaled; alternatively, a specific height in pixels can be given.


Original Dimensions :  (332, 500, 3)
Resized Dimensions :  (200, 500, 3)

Resize to a specific width and height

  • We can specify both width and height.


OpenCV Image Rotation

An image can be rotated by any angle, for example 90, 180, or 270 degrees. OpenCV computes an affine matrix and applies it as an affine transformation. In general, an affine transformation does not preserve angles between lines or distances between points, although it preserves straight lines and the ratio of distances between points lying on a line; a pure rotation with scale 1.0 preserves angles and distances as well.

The syntax for rotating an image is M = cv2.getRotationMatrix2D(center, angle, scale) followed by rotated = cv2.warpAffine(img, M, (w, h)), where:

  • center: the center of rotation, typically the center of the image.
  • angle: the angle, in degrees, by which the image is rotated in the anti-clockwise direction.
  • rotated: the ndarray that holds the rotated image data.
  • scale: the scaling factor applied to the image. A value of 1.0 preserves the original size.



OpenCV Gaussian Blur (Image Smoothing)

Image smoothing is a technique that reduces noise in images. An image may contain various types of noise introduced by the camera sensor. Smoothing removes high-frequency content (noise, edges) from the image, so edges are slightly blurred by this operation. OpenCV provides the cv2.GaussianBlur() function to smooth images. The syntax is cv2.GaussianBlur(src, ksize, sigmaX[, dst[, sigmaY[, borderType]]]), where:

  • src - the input image.
  • dst - the output image.
  • ksize - the Gaussian kernel size (width, height). Width and height must be odd (1, 3, 5, ...) and may differ. If ksize is set to (0, 0), it is computed from the sigma values.
  • sigmaX - kernel standard deviation along the X-axis (horizontal direction).
  • sigmaY - kernel standard deviation along the Y-axis (vertical direction). If sigmaY = 0, the sigmaX value is used for sigmaY.

borderType - specifies how pixels beyond the image boundary are extrapolated when the kernel is applied at the image borders. Possible border types include cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_REFLECT_101 (the same as cv2.BORDER_DEFAULT), and cv2.BORDER_ISOLATED.


OpenCV Blob Detection

BLOB stands for Binary Large Object and refers to a group of connected pixels in a binary image. The term "Large" indicates that only objects of a certain size are of interest, and that smaller binary objects are usually noise. BLOB analysis involves three processes.

BLOB extraction

BLOB extraction means separating the BLOBs (objects) in a binary image. A BLOB is a group of connected pixels. Whether two pixels are connected is determined by the connectivity, i.e., which pixels count as neighbors of a given pixel. There are two types of connectivity: 8-connectivity and 4-connectivity. 8-connectivity also treats diagonal pixels as neighbors and is generally more accurate, while 4-connectivity requires fewer computations.

BLOB representation

BLOB representation simply means converting each BLOB into a few representative numbers. After BLOB extraction, the next step is to classify the BLOBs, and this takes two steps. In the first step, each BLOB is described by several characteristics (features); in the second, a matching method compares the features of each BLOB.

BLOB classification

Here we determine the type of a BLOB, for example, whether a given BLOB is a circle or not. The question is how to decide which BLOBs are circles and which are not, based on the features described earlier. For this purpose, we generally need a prototype model of the object we are looking for.

How to perform Background Subtraction?

Background subtraction is widely used for generating a foreground mask, a binary image containing the pixels that belong to moving objects in the scene. It computes the foreground mask by subtracting a background model from the current frame.

There are two main steps in background modeling:

  • Background Initialization - an initial model of the background is computed.
  • Background Update - the model is updated to adapt to possible changes in the scene.

Manual subtraction from the first frame

First, we import the libraries and load the video. Next, we take the first frame of the video, convert it to grayscale, and apply a Gaussian blur to remove some noise. We then use a while loop to load the frames one by one and preprocess each in the same way. The core of the background subtraction is calculating the absolute difference between the first frame and the current frame.


Subtraction using Subtractor MOG2

OpenCV provides the MOG2 subtractor, which is more effective than the manual approach because it works with a history of frames. The syntax is cv2.createBackgroundSubtractorMOG2(history, varThreshold, detectShadows).

The first argument, history, is the number of most recent frames used to build the background model (500 by default).

The second argument, varThreshold, is the value used when evaluating the difference to extract the background. A lower threshold finds more variation, at the cost of a noisier image.

The third argument, detectShadows, enables the part of the algorithm that detects shadows and marks them separately in the mask.


OpenCV Image Threshold

The basic idea of thresholding is to simplify visual data for analysis. When we convert an image to grayscale, each pixel still has 256 possible values (0 to 255). Thresholding converts everything to white or black, based on the threshold value. Suppose we want the threshold to be 125 (out of 255): every pixel at or below 125 is converted to 0 (black), and every pixel above 125 is converted to 255 (white). The syntax is retval, dst = cv2.threshold(src, thresh, maxVal, type), where:

src: the source image, which should be a grayscale image.

thresh: the threshold value used to classify the pixel values.

maxVal: the value assigned to a pixel that exceeds the threshold.

OpenCV provides different styles of thresholding, selected by the fourth parameter of the function: cv2.THRESH_BINARY, cv2.THRESH_BINARY_INV, cv2.THRESH_TRUNC, cv2.THRESH_TOZERO, and cv2.THRESH_TOZERO_INV.


Let's take a sample input image

We have taken the above image as input to describe how thresholding actually works. The image is slightly dim and a little hard to read: some parts are light enough to read, while other parts require more effort to read properly.

Let's consider the following example:


OpenCV Edge detection

Edge detection means identifying the boundaries of objects in an image. Here we look at the Canny edge detection technique. The syntax of the Canny edge detection function is cv2.Canny(image, minVal, maxVal, apertureSize=3, L2gradient=False), where:

  • image: the input image as an 8-bit array, not a file path (required)
  • minVal: minimum intensity gradient threshold (required)
  • maxVal: maximum intensity gradient threshold (required)
  • apertureSize: aperture size of the Sobel operator (optional, default 3)
  • L2gradient: defaults to False. If True, Canny() uses a more computationally expensive equation to detect edges, which provides more accuracy at the cost of resources.

Example: 1


Example: Real Time Edge detection


OpenCV Contours

Contours are defined as curves joining all the continuous points along a boundary that have the same color or intensity. In other words, when we find contours in a binary image, we look for the boundaries of the objects in it. The official definition is the following:

Contours are a useful tool for shape analysis and object detection and recognition.

For better accuracy, we should use binary images, so we first apply a threshold or Canny edge detection.

In OpenCV, finding contours in a binary image is like finding white objects on a black background.

OpenCV provides findContours(), which is used to find the contours in a binary image. The syntax is contours, hierarchy = cv2.findContours(image, mode, method) in OpenCV 4; OpenCV 3 returns an additional image as the first value.

findContours() accepts three arguments: the first is the source image, the second is the contour retrieval mode, and the third is the contour approximation method.

Let's consider the following example:

How to draw the Contours?

OpenCV provides the cv2.drawContours() function, which is used to draw contours. It can also draw any shape, given its boundary points. The syntax is cv2.drawContours(image, contours, contourIdx, color, thickness).

To draw all the contours in an image, pass -1 as the contour index.

To draw an individual contour, say the 3rd contour, pass its zero-based index, 2.

The first argument is the source image, the second is the contours, passed as a Python list, the third is the index of the contour to draw, and the remaining arguments are the color and thickness.

Contour Approximation Method

It is the third argument of cv2.findContours(). Above, we described contours as the boundaries of shapes with the same intensity; a contour stores the (x, y) coordinates of the boundary of a shape. But does it store all the coordinates? That is specified by the contour approximation method.

If we pass cv2.CHAIN_APPROX_NONE, all the boundary points are stored. Often we do not need all of them: for the contour of a straight line, only the two endpoints are required. For such cases we use cv2.CHAIN_APPROX_SIMPLE, which removes all redundant points and compresses the contour, thereby saving memory.

For a rectangle, the contour found with cv2.CHAIN_APPROX_NONE contains every boundary point (734 in the original illustration), whereas the one found with cv2.CHAIN_APPROX_SIMPLE contains only the 4 corner points.

OpenCV VideoCapture

OpenCV provides the VideoCapture class, which is used to work with cameras and video files. We can do the following tasks:

  • Read video, display video, and save video.
  • Capture from the camera and display it.

Capture Video from Camera

OpenCV offers a straightforward interface to capture a live stream from a camera (webcam). The example here converts the video to grayscale and displays it.

We need to create a VideoCapture object to capture a video. It accepts either a device index or the name of a video file. The device index is a number that specifies which camera to use; we can select a camera by passing 0 or 1 as the argument. After that, we can capture the video frame by frame.

The read() method of the VideoCapture object returns a boolean value (True/False) along with the frame. It returns True if the frame is read correctly.

Playing Video from file

We can also play a video from a file. It is the same as capturing from the camera, except that the camera index is replaced by the file name. The delay passed to cv2.waitKey() must be appropriate: if it is too high, the video plays slowly; if it is too low, the video plays very fast.

Saving a Video

The cv2.VideoWriter class is used to save video to a file. First, we create a VideoWriter object, specifying the output file name, the FourCC code, the number of frames per second (fps), and the frame size.

FourCC is a 4-byte code used to identify the video codec. The example is given below for saving the video.

It will save the video at the desired location. Run the above code and see the output.

Limitations of Face Detection

Facial recognition systems have become essential and have come a long way. They are used in many applications, for example photo retrieval, surveillance, authentication, and access control systems. But a few challenges keep recurring in image and face recognition systems.

These challenges need to be overcome to create more effective face recognition systems. The following challenges affect the ability of a facial recognition system to go that extra mile.

  • Illumination

Illumination plays an essential role in image recognition. Even a slight change in lighting conditions can have a major impact on the results: the same object may produce different results under low and high illumination.

  • Background

The background of the object also plays a significant role in face detection. The result outdoors may differ from the result indoors because the factors affecting performance change as soon as the location changes.

  • Pose

Facial recognition systems are highly sensitive to pose variations. Head movement or a different camera position can change the apparent facial texture and produce a wrong result.

  • Occlusion

Occlusion means that part of the face is covered, for example by a beard, a mustache, or accessories (goggles, caps, masks, etc.), which interferes with the estimate of a face recognition system.

  • Expressions

Another important factor to keep in mind is the different expressions of the same individual. A change in facial expression may produce a different result for the same individual.

In this tutorial, we have learned about the OpenCV library and its basic concepts, and described the basic image operations. Next, we will learn about face recognition and face detection.

Face recognition and Face detection using the OpenCV

Face recognition is a technique for identifying or verifying a face in digital images or video frames. Humans identify faces quickly and without much effort, but it is a difficult task for a computer. Various complexities, such as low resolution, occlusion, and illumination variations, strongly affect how accurately a computer can recognize a face. First, it is necessary to understand the difference between face detection and face recognition.

Face Detection: Face detection is generally considered to be finding the faces (location and size) in an image and possibly extracting them for use by a face recognition algorithm.

Face Recognition: A face recognition algorithm finds the features that uniquely describe a face in the image. The facial image has already been extracted, cropped, resized, and usually converted to grayscale.

There are various algorithms for face detection and face recognition. Here we will learn about face detection using the HAAR cascade algorithm.

Basic Concept of HAAR Cascade Algorithm

The HAAR cascade is a machine learning approach in which a cascade function is trained from many positive and negative images. Positive images are images that contain faces, and negative images are images without faces. In face detection, image features are numerical information extracted from the pictures that can distinguish one image from another.

We apply every feature of the algorithm to all the training images. Every image is given equal weight at the start. For each feature, the algorithm finds the best threshold for classifying the images as positive (face) or negative (non-face). There may be errors and misclassifications. We select the features with the minimum error rate, which are the features that best separate the face and non-face images.

All possible sizes and locations of each kernel are used to calculate a large number of features.

HAAR-Cascade Detection in OpenCV

OpenCV provides a trainer as well as a detector. We can train a classifier for any object, such as cars, planes, or buildings, using OpenCV. The cascade classifier has two primary stages: training and detection.

OpenCV provides two applications to train a cascade classifier: opencv_haartraining and opencv_traincascade. These two applications store the classifier in different file formats.

For training, we need a set of samples. There are two types of samples:

  • Negative samples: images that do not contain the object.
  • Positive samples: images that contain the object to be detected.

The set of negative samples must be prepared manually, whereas the set of positive samples is created using the opencv_createsamples utility.

Negative Sample

Negative samples are taken from arbitrary images and are listed in a text file. Each line of the file contains the filename of a negative sample image (relative to the directory of the description file). This file must be created manually. The listed images may be of different sizes.

Positive Sample

Positive samples are created by the opencv_createsamples utility. They can be created from a single image containing the object or from an existing collection. Keep in mind that a large dataset of positive samples is needed before it is given to the utility, because the utility only applies perspective transformations.

Here we will discuss detection. OpenCV already ships with various pre-trained classifiers for faces, eyes, smiles, etc. These XML files are stored in the opencv/data/haarcascades/ folder. The detection steps are as follows:

Step - 1

First, we load the necessary XML classifiers and load the input images (or video) in grayscale mode.

Step - 2

After converting the image to grayscale, we can manipulate it: the image can be resized, cropped, blurred, and sharpened if required. The next step is image segmentation, which identifies the multiple objects in a single image so that the classifier can quickly detect the objects and faces in the picture.

Step - 3

The Haar-like feature algorithm is used to find the location of human faces in a frame or image. All human faces share some universal properties: for example, the eye region is darker than its neighboring pixels, and the nose region is brighter than the eye region.

Step - 4

In this step, we extract the features from the image with the help of edge detection, line detection, and center detection. The detector then provides the coordinates x, y, w, h, which define a rectangular box in the picture showing the location of the face.

Face recognition using OpenCV

Face recognition is a simple task for humans. Successful face recognition tends to rely on effective recognition of the inner features (eyes, nose, mouth) or outer features (head, face, hairline). The question is how the human brain encodes them.

David Hubel and Torsten Wiesel showed that our brain has specialized nerve cells that respond to specific local features of a scene, such as lines, edges, angles, or movement. Our brain combines these different sources of information into useful patterns; we do not see the visual world as scattered pieces. Put simply, automatic face recognition is all about extracting those meaningful features from an image, putting them into a useful representation, and then performing some classification on them.

The basic idea of face recognition is based on the geometric features of a face. It is the most intuitive and feasible approach to face recognition. The first automated face recognition system was described in terms of the positions of the eyes, ears, and nose. These marker points were used to build a feature vector (the distances between the points).

Face recognition is then achieved by calculating the Euclidean distance between the feature vectors of a probe image and a reference image. This method is by its nature robust to changes in illumination, but it has a considerable drawback: accurate registration of the marker points is very hard.
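The distance-based matching can be sketched in plain Python. The feature vectors and the names person_a and person_b are hypothetical values invented for illustration, not measurements from real faces.

```python
import math

# Hypothetical feature vectors, e.g. normalized distances between marker points.
probe = [0.42, 0.31, 0.77]
references = {
    "person_a": [0.40, 0.30, 0.80],
    "person_b": [0.10, 0.90, 0.20],
}

# Euclidean distance from the probe to every reference vector.
distances = {name: math.dist(probe, vec) for name, vec in references.items()}

# The reference with the smallest distance is the best match.
best_match = min(distances, key=distances.get)
print(best_match)  # person_a
```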

The face recognition system can operate basically in two modes:

  • Authentication or verification of a facial image

It compares the input facial image with the facial image of the user who is requesting authentication. It is a 1:1 comparison.

  • Identification or facial recognition

It compares the input facial image with all the images in a dataset to find the user that matches the input face. It is a 1:N comparison.

There are various types of face recognition algorithms, for example:

  • Eigenfaces (1991)
  • Local Binary Patterns Histograms (LBPH) (1996)
  • Fisherfaces (1997)
  • Scale Invariant Feature Transform (SIFT) (1999)
  • Speed Up Robust Features (SURF) (2006)

Each algorithm follows a different approach to extracting the image information and matching it with the input image. Here we will discuss the Local Binary Patterns Histogram (LBPH) algorithm, which is one of the oldest and most popular algorithms.

Introduction of LBPH

The Local Binary Pattern Histogram algorithm is a simple approach that labels the pixels of an image by thresholding the neighborhood of each pixel. In other words, LBPH summarizes the local structure in an image by comparing each pixel with its neighbors, and the result is converted into a binary number. The underlying LBP operator was first defined in 1994, and since then it has been found to be a powerful feature for texture classification.

This algorithm focuses on extracting local features from images. The basic idea is not to look at the whole image as a high-dimensional vector, but to describe only the local features of an object.

Take a pixel as the center and threshold its neighbors against it. If the intensity of a neighbor is greater than or equal to that of the center pixel, denote it with 1; otherwise denote it with 0.

Let's understand the steps of the algorithm:

1. Selecting the Parameters: The LBPH accepts the four parameters:

  • Radius: It represents the radius around the central pixel. It is usually set to 1. It is used to build the circular local binary pattern.
  • Neighbors: The number of sample points to build the circular binary pattern.
  • Grid X: The number of cells in the horizontal direction. The more cells and finer grid represents, the higher dimensionality of the resulting feature vector.
  • Grid Y: The number of cells in the vertical direction. The more cells and finer grid represents, the higher dimensionality of the resulting feature vector.

Note: The above parameters may seem slightly confusing at first. They will become clearer in the following steps.
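
To make the effect of the parameters concrete, here is a minimal sketch in plain Python. The class name LBPHParams and its methods are hypothetical, and the default values (1, 8, 8, 8) are assumed to match the defaults of OpenCV's LBPH recognizer. The sketch also assumes one histogram bin per possible binary pattern, so each cell contributes 2^neighbors positions to the feature vector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LBPHParams:
    """Hypothetical container for the four LBPH parameters."""
    radius: int = 1      # distance from the central pixel to the sample points
    neighbors: int = 8   # number of sample points on the circle
    grid_x: int = 8      # number of cells in the horizontal direction
    grid_y: int = 8      # number of cells in the vertical direction

    def histogram_bins(self) -> int:
        # Each sample point contributes one bit to the binary pattern.
        return 2 ** self.neighbors

    def feature_length(self) -> int:
        # One histogram per cell, all concatenated into one vector.
        return self.grid_x * self.grid_y * self.histogram_bins()


params = LBPHParams()
print(params.histogram_bins(), params.feature_length())
```

A finer grid keeps more spatial detail but makes the feature vector longer, which is the trade-off described for Grid X and Grid Y.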

2. Training the Algorithm: The first step is to train the algorithm. This requires a dataset with facial images of the people we want to recognize. A unique ID (a number or the person's name) must be provided with each image, so that the algorithm can use this information to recognize an input image and return a result. All images of the same person must have the same ID. Let's look at the LBPH computation in the next step.
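
The labelling requirement can be sketched as follows. This is only an illustration in plain Python: the function name group_by_id and the tiny 2x2 "images" are made up for the example, and a real dataset would hold full grayscale face images.

```python
from collections import defaultdict


def group_by_id(samples):
    """Group (image, person_id) pairs so each ID maps to all of its images."""
    by_id = defaultdict(list)
    for image, person_id in samples:
        by_id[person_id].append(image)
    return dict(by_id)


# Hypothetical training samples: every image of one person shares one ID.
samples = [
    ([[10, 20], [30, 40]], 1),
    ([[12, 22], [28, 41]], 1),
    ([[200, 180], [190, 170]], 2),
]

training_set = group_by_id(samples)
for person_id, images in training_set.items():
    print(person_id, len(images))
```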

3. Using the LBP operation: In this step, the LBP computation is used to create an intermediate image that describes the original image in a better way by highlighting the facial characteristics. To do so, the algorithm uses the concept of a sliding window, based on the radius and neighbors parameters.
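
As a rough illustration of how radius and neighbors define the window, the sketch below computes the positions of the sample points on a circle around the central pixel. The function name sample_offsets is hypothetical, and the sign convention (image rows grow downward) is an assumption. Points that do not fall exactly on a pixel center are typically interpolated from the surrounding pixels, which is not shown here.

```python
import math


def sample_offsets(radius=1, neighbors=8):
    """Return (dx, dy) offsets of the sample points around the central pixel."""
    offsets = []
    for n in range(neighbors):
        angle = 2.0 * math.pi * n / neighbors
        dx = radius * math.cos(angle)
        dy = -radius * math.sin(angle)  # rows grow downward in an image
        offsets.append((round(dx, 3), round(dy, 3)))
    return offsets


for offset in sample_offsets(radius=1, neighbors=8):
    print(offset)
```

With radius 1 and 8 neighbors, the points lie close to the 8 pixels surrounding the center, which is why the steps below work with a simple 3x3 window.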


To understand in a more specific way, let's break it into several small steps:

  • Suppose the input facial image is grayscale.
  • We can get part of this image as a window of 3x3 pixels.
  • We can use the 3x3 matrix containing the intensity of each pixel (0-255).
  • Then, we need to take the central value of the matrix to be used as a threshold.
  • This value will be used to define the new values from the 8 neighbors.
  • For every neighbor of the central value (threshold), we set a new binary value: 1 for values equal to or higher than the threshold and 0 for values lower than the threshold.
  • Now the matrix consists of only binary values (ignoring the central value). We concatenate the binary values from each position of the matrix, line by line, into a new binary number (e.g. 10001101). Some approaches concatenate the values in a different order (e.g. clockwise); this does not matter as long as the same order is used for every pixel.
  • We convert this binary number to a decimal value and set it as the central value of the matrix, which is a pixel of the original image.
  • After completing the LBP procedure for every pixel, we get a new image that better represents the characteristics of the original image.
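
The steps above can be sketched in plain Python as follows. This is a minimal illustration for the basic 3x3 case (radius 1, 8 neighbors), not OpenCV's implementation. The function names and the pixel values in the example window are made up, chosen so that the resulting pattern is the 10001101 mentioned above.

```python
def lbp_code(window):
    """Return the LBP value of a 3x3 window given as a list of 3 rows."""
    center = window[1][1]  # the central value is the threshold
    bits = ""
    for r in range(3):
        for c in range(3):
            if (r, c) == (1, 1):
                continue  # skip the central value itself
            # 1 if the neighbor is equal to or higher than the threshold
            bits += "1" if window[r][c] >= center else "0"
    return int(bits, 2)  # binary pattern, read line by line, as a decimal


def lbp_image(gray):
    """Apply lbp_code to every interior pixel of a grayscale image."""
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            window = [line[x - 1:x + 2] for line in gray[y - 1:y + 2]]
            row.append(lbp_code(window))
        result.append(row)
    return result


# Hypothetical 3x3 window with central value 90.
window = [
    [200, 50, 54],
    [32, 90, 100],
    [98, 10, 233],
]
print(lbp_code(window))  # pattern 10001101 -> 141
```

Border pixels are skipped in this sketch because they do not have all 8 neighbors, so the resulting image is slightly smaller than the input.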

4. Extracting the Histograms from the image: Taking the image generated in the previous step, we use the Grid X and Grid Y parameters to divide it into multiple cells and extract a histogram from each cell:

  • Since the image is grayscale, each histogram (one per cell) contains only 256 positions (0-255), each counting the occurrences of one pixel intensity.
  • We then create a new, bigger histogram by concatenating the histograms of all cells. For example, with an 8x8 grid the final histogram has 8x8x256 = 16,384 positions.
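
A minimal sketch of this step in plain Python is shown below. The function names are hypothetical, the way the image is split into cells (integer division of the height and width) is an assumption, and the 4x4 input stands in for a real LBP image.

```python
def cell_histogram(lbp, y0, y1, x0, x1, bins=256):
    """Count how often each value occurs inside one cell of the LBP image."""
    hist = [0] * bins
    for y in range(y0, y1):
        for x in range(x0, x1):
            hist[lbp[y][x]] += 1
    return hist


def spatial_histogram(lbp, grid_x=8, grid_y=8, bins=256):
    """Concatenate the histograms of all grid cells into one feature vector."""
    height, width = len(lbp), len(lbp[0])
    feature = []
    for gy in range(grid_y):
        for gx in range(grid_x):
            y0, y1 = gy * height // grid_y, (gy + 1) * height // grid_y
            x0, x1 = gx * width // grid_x, (gx + 1) * width // grid_x
            feature.extend(cell_histogram(lbp, y0, y1, x0, x1, bins))
    return feature


# Hypothetical 4x4 LBP image, divided into a 2x2 grid of 2x2 cells.
lbp = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [7, 7, 141, 141],
    [7, 7, 141, 141],
]
feature = spatial_histogram(lbp, grid_x=2, grid_y=2)
print(len(feature))  # 2 * 2 * 256 positions
```

Because the histograms are concatenated cell by cell, the final vector keeps track of where in the face each pattern occurred, not only how often.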

5. Performing face recognition: The algorithm is now trained. Each extracted histogram represents one image from the training dataset. For a new input image, we repeat the steps above and create a new histogram. To find the training image that matches the input image, we compare the two histograms and return the image with the closest histogram.

  • There are various approaches to comparing histograms (calculating the distance between two histograms), for example Euclidean distance, chi-square, absolute value, etc. We can use the Euclidean distance, where hist1 and hist2 are the two histograms and n is the number of positions in each:
    D = \sqrt{\sum_{i=1}^{n} (hist1_i - hist2_i)^2}
  • The algorithm returns the ID of the training image with the closest histogram as its output. It should also return the calculated distance, which can be used as a confidence measurement. Note that a lower value is better here, because it means the two histograms are closer. If the confidence is lower than a chosen threshold value, the algorithm has successfully recognized the face.
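
The matching step can be sketched as a nearest-neighbor search over the stored histograms. This is a simplified illustration in plain Python: the function names, the 4-position histograms and the threshold value are made up for the example, and returning None for an unrecognized face is an assumption of this sketch.

```python
import math


def euclidean_distance(hist1, hist2):
    """Euclidean distance between two histograms of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(hist1, hist2)))


def predict(query_hist, trained, threshold=float("inf")):
    """Return (person_id, distance) of the closest trained histogram.

    trained is a list of (person_id, histogram) pairs. The ID is None
    when even the closest histogram is farther away than the threshold.
    """
    best_id, best_distance = None, float("inf")
    for person_id, hist in trained:
        distance = euclidean_distance(query_hist, hist)
        if distance < best_distance:
            best_id, best_distance = person_id, distance
    if best_distance > threshold:
        return None, best_distance
    return best_id, best_distance


# Hypothetical, very short histograms for two people.
trained = [(1, [4, 0, 0, 0]), (2, [0, 0, 0, 4])]
query = [3, 1, 0, 0]

print(predict(query, trained))                 # closest match is ID 1
print(predict(query, trained, threshold=1.0))  # too far away: ID is None
```

The returned distance plays the role of the confidence measurement: the smaller it is, the closer the two histograms are.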

We have discussed face detection and face recognition. The Haar cascade algorithm is used for face detection. There are various algorithms for face recognition, but LBPH is one of the easiest and most popular among them. It focuses on the local features of the image.


Before learning OpenCV, you should have a basic knowledge of Python.



