Javatpoint Logo



In computer vision, the ability to track objects in videos accurately and robustly is a vital and steadily growing area. Object tracking means following a particular object's movement throughout a video sequence. It enables applications ranging from surveillance and autonomous vehicles to augmented reality and video editing. This article explores object tracking, with a particular focus on YOLOv5 as the basis of an advanced object-tracking solution.

Understanding Object Tracking

Object tracking is, in essence, the process of locating and following a designated object in a video frame by frame. It is central to video analysis because it enables a wide range of applications. Object tracking improves situational awareness and decision-making in many settings, such as:

- tracking a pedestrian in a self-driving car's field of view,
- monitoring a suspect in a surveillance camera feed, or
- following a player during a sports broadcast.

Conventional object-tracking methods often rely on heuristics, optical flow, or hand-crafted features to follow objects. These approaches work well in certain situations. However, they can struggle with challenging conditions such as occlusions, rapid object motion, and changing lighting.


YOLO (You Only Look Once) is a well-known family of object detection models that offers both accuracy and speed. YOLOv5 is one of the later versions in this family. YOLO's key innovation is real-time detection. It predicts bounding boxes and class probabilities in a single forward pass of the network, so it can process images and videos at high speed. YOLOv5 builds on its predecessors with improved performance, smaller model sizes, and better accuracy.

Object Detection versus Object Tracking

Object detection and object tracking are closely related, but they serve different purposes. Object detection identifies and localizes objects in a single frame, typically with bounding boxes and class labels. Object tracking, on the other hand, follows objects across multiple frames, maintaining their identities and trajectories.

YOLOv5 is designed for object detection, but it can be adapted for tracking by associating detections across frames, which turns it into an object-tracking tool.
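Here is a minimal sketch of that association step. It assumes each frame's detections arrive as (x1, y1, x2, y2) boxes. A greedy intersection-over-union (IoU) matcher gives each new box the ID of the most-overlapping box from the previous frame. This is a generic technique used by simple trackers, not part of YOLOv5 itself. The class name IoUTracker and the 0.3 threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class IoUTracker:
    """Greedy IoU matcher: each detection inherits the ID of the best-overlapping
    track from the previous frame, or receives a new ID."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold  # minimum overlap to keep an ID
        self.tracks = {}                    # track ID -> last known box
        self.next_id = 0

    def update(self, boxes):
        """Assign an ID to every box of the current frame and return the IDs."""
        # Score every (track, detection) pair, best overlaps first.
        pairs = sorted(
            ((iou(tbox, dbox), tid, di)
             for tid, tbox in self.tracks.items()
             for di, dbox in enumerate(boxes)),
            reverse=True,
        )
        assigned, used_tracks = {}, set()
        for score, tid, di in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little
            if tid in used_tracks or di in assigned:
                continue  # each track and each detection is used at most once
            assigned[di] = tid
            used_tracks.add(tid)
        # Unmatched detections start new tracks.
        for di in range(len(boxes)):
            if di not in assigned:
                assigned[di] = self.next_id
                self.next_id += 1
        # Tracks without a match are simply dropped in this minimal version.
        self.tracks = {assigned[di]: boxes[di] for di in range(len(boxes))}
        return [assigned[di] for di in range(len(boxes))]
```

Trackers such as SORT make this more robust in two ways. They replace the greedy loop with the Hungarian algorithm, and they add a motion model. The core idea of carrying IDs forward through box overlap is the same.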

YOLOv5 Object Tracker: How It Works

The YOLOv5 architecture is built on convolutional neural networks (CNNs), which excel at extracting features from images and video frames. For object tracking, the rich feature representations that YOLOv5 learns for detection can be used to follow objects over time. Tracking involves two tasks: matching objects in consecutive frames based on their features, and predicting their future positions.
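To illustrate "predicting their future positions", here is a minimal constant-velocity sketch. Each track keeps an exponentially smoothed estimate of how far its box centre moves per frame. It uses that estimate to predict where the box will be next, then picks the detection nearest the prediction. The names MotionTrack and match_nearest, the smoothing factor, and the 50-pixel gate are assumptions for illustration. Production trackers typically use a Kalman filter for this role.

```python
import math
from dataclasses import dataclass


@dataclass
class MotionTrack:
    """A track with a smoothed constant-velocity motion model."""
    box: tuple                    # last observed box (x1, y1, x2, y2)
    velocity: tuple = (0.0, 0.0)  # smoothed centre displacement per frame
    alpha: float = 0.5            # weight of the newest displacement (assumed)

    @staticmethod
    def center(box):
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2, (y1 + y2) / 2)

    def predict(self):
        """Shift the last box by the current velocity estimate."""
        dx, dy = self.velocity
        x1, y1, x2, y2 = self.box
        return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

    def update(self, new_box):
        """Blend the newest centre displacement into the velocity estimate."""
        (cx, cy), (nx, ny) = self.center(self.box), self.center(new_box)
        vx, vy = self.velocity
        a = self.alpha
        self.velocity = (a * (nx - cx) + (1 - a) * vx,
                         a * (ny - cy) + (1 - a) * vy)
        self.box = new_box


def match_nearest(track, detections, max_distance=50.0):
    """Index of the detection closest to the predicted centre, or None."""
    px, py = MotionTrack.center(track.predict())
    best, best_dist = None, max_distance
    for i, box in enumerate(detections):
        cx, cy = MotionTrack.center(box)
        dist = math.hypot(cx - px, cy - py)
        if dist <= best_dist:
            best, best_dist = i, dist
    return best
```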

Feature extraction and re-identification (re-ID) are central to tracking with YOLOv5. The tracker encodes each object's features and recognizes them again in subsequent frames. This lets it maintain each object's identity throughout the video sequence.
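Re-identification can be sketched as comparing appearance embeddings with cosine similarity. YOLOv5 does not output such embeddings on its own. Trackers such as DeepSORT obtain them from a separate appearance network, so the short vectors below are placeholders. AppearanceGallery, the 0.8 similarity threshold, and the momentum update are hypothetical choices for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class AppearanceGallery:
    """One running-average embedding per track ID, used to re-identify objects."""

    def __init__(self, threshold=0.8, momentum=0.9):
        self.threshold = threshold  # minimum similarity to count as the same object
        self.momentum = momentum    # how much of the stored embedding to keep
        self.embeddings = {}        # track ID -> embedding
        self.next_id = 0

    def identify(self, embedding):
        """Return the ID of the most similar stored track, or a new ID."""
        best_id, best_sim = None, self.threshold
        for tid, stored in self.embeddings.items():
            sim = cosine_similarity(stored, embedding)
            if sim >= best_sim:
                best_id, best_sim = tid, sim
        if best_id is None:
            # No stored track is similar enough: register a new identity.
            best_id = self.next_id
            self.next_id += 1
            self.embeddings[best_id] = list(embedding)
        else:
            # Blend the new observation into the stored appearance.
            m = self.momentum
            self.embeddings[best_id] = [
                m * s + (1 - m) * e
                for s, e in zip(self.embeddings[best_id], embedding)
            ]
        return best_id
```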

Training YOLOv5 for Object Tracking

Preparing YOLOv5 for object tracking involves setting up a dataset with annotated object tracks. This dataset should contain sequences of frames with bounding box annotations that link objects across frames. Pretrained YOLOv5 models are available, but they only know the classes they were trained on. Fine-tuning on a custom tracking dataset is therefore usually needed to adapt the model to specific tracking tasks.
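As an illustration, suppose track annotations are stored as hypothetical (frame, track_id, class_id, box) tuples with pixel boxes. Fine-tuning the YOLOv5 detector uses its own label convention. There is one text file per image, with one line per object in the form `class x_center y_center width height`, normalized to the image size. The sketch below converts the annotations into those lines per frame. Track IDs are not part of YOLOv5's label format, so they would be kept separately to evaluate the tracking stage.

```python
from collections import defaultdict


def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) to a normalized YOLO label line."""
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def labels_per_frame(annotations, img_w, img_h):
    """Group hypothetical (frame, track_id, class_id, box) tuples into label lines.

    The track ID is ignored here because YOLOv5 labels do not store it;
    keep the original annotations for evaluating the tracker.
    """
    frames = defaultdict(list)
    for frame, _track_id, class_id, box in annotations:
        frames[frame].append(to_yolo_line(class_id, box, img_w, img_h))
    return dict(frames)
```

YOLOv5's training setup expects a specific layout. Each frame's lines go into a .txt file named after the frame image, inside a labels/ directory parallel to images/.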

Python Implementation

Step 1: The first step is to import all the necessary libraries and packages. For object tracking with YOLOv5, we mainly use cv2, torch, and pathlib.

cv2: from OpenCV, used mainly for capturing and preprocessing videos and images, and for displaying the video window.

torch: from PyTorch, used for loading and running the YOLOv5 model.

pathlib.Path: from the pathlib module, used for working with directories and file paths.
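Before writing the imports, it can help to confirm that the packages are installed. This sketch uses importlib.util.find_spec to check for cv2 and torch without importing them. It imports the three Step 1 modules only when both are present. The mapping to the pip packages opencv-python and torch is the usual one, but it is an assumption to adjust for your environment.

```python
import importlib.util

# Module name -> pip package that usually provides it.
REQUIRED = {"cv2": "opencv-python", "torch": "torch"}


def missing_packages(required=REQUIRED):
    """Return pip package names for modules that cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        import cv2                # video capture, drawing, display
        import torch              # loads and runs the YOLOv5 model
        from pathlib import Path  # file and directory paths

        print("OpenCV", cv2.__version__, "| PyTorch", torch.__version__)
        print("Working directory:", Path.cwd())
```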


Step 2:

The next step is to load the model with torch.hub.load from the ultralytics/yolov5 repository on GitHub. The repository offers several model sizes, so you can choose one that fits your requirements. After loading, a confidence threshold is initialized for object detection. Detections are later filtered by comparing their confidence scores against this threshold.


This step also defines the path to the input video file. OpenCV's VideoCapture method opens and reads the input file. Video properties such as frame width and height are read as well, because the output video needs them.
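Here is a sketch of Step 2, assuming the small yolov5s variant and a 0.5 confidence threshold (both adjustable). torch.hub.load and the model.conf attribute come from the YOLOv5 PyTorch Hub interface. The helper names load_model and open_video, and the 30 FPS fallback, are illustrative assumptions. The imports sit inside the functions so that the variant and path checks work even where PyTorch or OpenCV is missing.

```python
from pathlib import Path

# Variants published in the ultralytics/yolov5 repository, smallest to largest.
YOLOV5_VARIANTS = ("yolov5n", "yolov5s", "yolov5m", "yolov5l", "yolov5x")
CONFIDENCE_THRESHOLD = 0.5  # assumed value; tune for your application


def load_model(variant="yolov5s", conf=CONFIDENCE_THRESHOLD):
    """Load a pretrained YOLOv5 model through PyTorch Hub."""
    if variant not in YOLOV5_VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; choose from {YOLOV5_VARIANTS}")
    import torch  # imported lazily so this module loads without PyTorch
    # Downloads the repository and weights on first use, then reuses the hub cache.
    model = torch.hub.load("ultralytics/yolov5", variant, pretrained=True)
    model.conf = conf  # confidence threshold applied during non-max suppression
    return model


def open_video(path):
    """Open a video file and return (capture, width, height, fps)."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"input video not found: {path}")
    import cv2  # imported lazily so the path check works without OpenCV
    capture = cv2.VideoCapture(str(path))
    if not capture.isOpened():
        raise IOError(f"OpenCV could not open {path}")
    width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = capture.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback if FPS is unreported
    return capture, width, height, fps
```

In a full script, `model = load_model()` and `capture, width, height, fps = open_video("input.mp4")` would feed the later steps. The file name here is a placeholder.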

Step 3:

After the input video is opened, the path for the output file also needs to be specified. In this code, the output must be an MP4 file written with the mp4v codec. A VideoWriter instance is created to write the processed frames to the output file. The loop then processes every frame of the input and stops when no frames are left.
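The output side of Step 3 can be sketched as follows. cv2.VideoWriter_fourcc and cv2.VideoWriter are standard OpenCV calls. The "_tracked" naming scheme and the helper names are hypothetical. The frame loop works with any object whose read() returns (ok, frame), which is how OpenCV's VideoCapture behaves.

```python
from pathlib import Path


def output_path_for(input_path, suffix="_tracked"):
    """Place the .mp4 output next to the input video (naming scheme assumed)."""
    p = Path(input_path)
    return p.with_name(p.stem + suffix + ".mp4")


def make_writer(path, fps, width, height):
    """Create an OpenCV VideoWriter that encodes MP4 with the mp4v codec."""
    path = Path(path)
    if path.suffix.lower() != ".mp4":
        raise ValueError("the mp4v codec used here expects an .mp4 output file")
    import cv2  # imported lazily so the suffix check works without OpenCV
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    # The frame size must match the frames passed to write(), as (width, height).
    return cv2.VideoWriter(str(path), fourcc, fps, (width, height))


def iter_frames(capture):
    """Yield frames until capture.read() reports that none are left."""
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the video (or a read error)
            break
        yield frame
```

A processing loop would then look like `for frame in iter_frames(capture): ...`, calling `writer.write(frame)` for each processed frame. It would finish with `capture.release()` and `writer.release()`.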


Step 4:

In this step, the YOLOv5 model runs object detection on the current frame, and the result is stored in the detected_objects variable. The detected objects are then filtered against the confidence threshold, which removes objects with low confidence scores from the list. Finally, the code iterates through the filtered detections and extracts the useful information: the class ID, confidence score, label, and bounding box coordinates.
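The filtering and extraction described above can be sketched as a pure function over the detection rows. It assumes the rows come from results.xyxy[0]. In the YOLOv5 PyTorch Hub API, this holds one row per detection in the order x1, y1, x2, y2, confidence, class. extract_detections and its dictionary keys are illustrative names. YOLOv5 already applies its own model.conf threshold during non-max suppression, so this explicit check matters mainly when you want a stricter, application-specific cutoff.

```python
def extract_detections(rows, names, conf_threshold=0.5):
    """Keep confident detections and unpack them into dictionaries.

    rows:  iterable of [x1, y1, x2, y2, confidence, class_id]
    names: list or dict mapping class ID to label (e.g. model.names)
    """
    detections = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf < conf_threshold:
            continue  # discard low-confidence detections
        class_id = int(cls)
        detections.append({
            "class_id": class_id,
            "label": names[class_id],
            "confidence": float(conf),
            "box": (int(x1), int(y1), int(x2), int(y2)),  # pixel corners
        })
    return detections


# With a loaded model and an OpenCV frame (BGR order), the rows would come from:
#   results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
#   detected_objects = extract_detections(results.xyxy[0].tolist(), model.names)
```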


Step 5:

In the final step, a bounding box is drawn around each detected object and annotated with its label and confidence score. The annotated frame is then written to the output video and displayed.
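Here is a sketch of the drawing and display step. It assumes detections are dictionaries with label, confidence, and box keys (a hypothetical layout). cv2.rectangle, cv2.putText, cv2.imshow, and cv2.waitKey are standard OpenCV calls. The green color, font scale, window title, and q-to-quit key are arbitrary choices.

```python
def format_label(det):
    """Text drawn above a box, e.g. 'person 0.91'."""
    return f"{det['label']} {det['confidence']:.2f}"


def draw_detections(frame, detections, color=(0, 255, 0)):
    """Draw each box with its label and confidence onto the frame in place."""
    import cv2  # imported lazily so format_label works without OpenCV
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        # Put the text just above the box, clamped to the top edge of the frame.
        cv2.putText(frame, format_label(det), (x1, max(y1 - 8, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1, cv2.LINE_AA)
    return frame


def show_and_write(frame, writer, window="YOLOv5 tracking"):
    """Write the annotated frame, display it, and return False if 'q' was pressed."""
    import cv2
    writer.write(frame)
    cv2.imshow(window, frame)
    return (cv2.waitKey(1) & 0xFF) != ord("q")
```

After the loop ends, calling `capture.release()`, `writer.release()`, and `cv2.destroyAllWindows()` frees the file handles and closes the window.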



Running the script prints a model-loading log similar to the following:

Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2023-9-18 Python-3.10.12 torch-2.0.1+cu118 CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape...

Challenges in Object Tracking with YOLOv5

Object tracking with YOLOv5 faces challenges similar to those of conventional tracking methods, including occlusions, scale variations, and motion blur. A YOLOv5-based tracker addresses these challenges with robust feature representations and the ability to associate objects across frames. Together, these reduce tracking failures caused by brief disappearances or changes in appearance.
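Brief disappearances can be handled with track aging. An unmatched track is kept for up to max_age frames instead of being deleted at once, so a detection that reappears after a short occlusion can still be matched to it. SORT-style trackers use a similar max-age rule. TrackAger and the max_age value here are assumptions for illustration.

```python
class TrackAger:
    """Keep unmatched tracks alive for up to max_age frames (value assumed)."""

    def __init__(self, max_age=30):
        self.max_age = max_age
        self.missed = {}  # track ID -> consecutive frames without a match

    def update(self, matched_ids):
        """Call once per frame with the IDs matched in that frame.

        Returns the sorted IDs that are still alive and may be matched later.
        """
        matched = set(matched_ids)
        for tid in matched:
            self.missed[tid] = 0  # seen this frame: reset the counter
        for tid in list(self.missed):
            if tid not in matched:
                self.missed[tid] += 1
                if self.missed[tid] > self.max_age:
                    del self.missed[tid]  # gone too long: retire the track
        return sorted(self.missed)
```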

Applications of YOLOv5 Object Tracker

YOLOv5 object tracking has applications across many industries and domains. In autonomous vehicles, YOLOv5 can track pedestrians, vehicles, and obstacles in real time. In surveillance systems, it helps identify and monitor persons of interest. It can also enhance augmented reality experiences by accurately tracking real-world objects and overlaying virtual objects onto them.

In summary, YOLOv5 is a powerful tool for object tracking within the broad landscape of computer vision. Its speed, accuracy, and versatility make it valuable across many fields, contributing to safer, more efficient, and more immersive applications.


Overall, YOLOv5's object-tracking capability, combined with its speed and accuracy, opens up many possibilities for computer vision applications. By addressing challenges such as occlusions and scale variations, it supports robust tracking even in complex scenes. Its adaptability across industries, from autonomous vehicles to augmented reality, shows its importance for the future of visual perception and real-time decision-making systems. YOLOv5 reflects the continuous progress of object-tracking technology, which is poised to change how we interact with and understand the visual world.
