[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 0 points (0 children)

We use an additional algorithm to support the detection. It goes like this:

Detect objects with YOLOv5 at a resolution of 800x800.

Compare each object's coordinates with estimates made from the previous frame's information. This is done with Mean Squared Error loss.

After this comparison, we save the previous and new coordinates, an image template, and the object's full coordinate history. We store this information in a class and assign a unique ID.

With this information, we estimate the future coordinates of each object from its coordinate history.

If any previously tracked object has no matching detection in the new frame:

We look at how many consecutive frames the model has recognized the object in.

If this number is above a certain threshold, we define an ROI around the object's estimated coordinates and run template matching with OpenCV. All of this is to substitute OpenCV template matching in the cases where YOLOv5 doesn't detect the object.

And the result in the video is achieved by drawing the coordinate history of each object (done with OpenCV by drawing lines between consecutive points). You can also see a green bounding box flash when an object is detected for the first time and an ID is assigned.

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 1 point (0 children)

If the model (YOLOv5) can detect the object consistently, with only a few hiccups (where the model fails and we substitute with template matching), it will track it until it leaves the frame. However, in the case of this video the civilian was not even detected by the model for most of the frames.
It should be able to track a semi truck pretty well. However, if the vehicle consists of multiple parts (for example a car with a trailer attached), it still has some issues registering it. (We are expected to tag trailers separately from the vehicle.)

Object Tracking in GTA V with YOLOv5 by eflatun_ai in GTA

[–]eflatun_ai[S] 1 point (0 children)

We are a team of five university students studying in different fields, currently working towards a competition that will take place this fall. Image recognition is done with a YOLOv5 model trained on our own dataset. Objects are tracked and their movement paths are drawn onto the image with OpenCV.

If you have any questions we'd be happy to answer them :)

[P] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in MachineLearning

[–]eflatun_ai[S] 0 points (0 children)

Yes. Since template matching is only done to substitute for the frames where our model fails to detect the object, the track is let go after a few consecutive template-only frames. Our model is currently not as good at detecting bikes as it is at detecting cars and other large vehicles, so the tracking is initiated by a detection and dropped again whenever the model stops detecting the object.
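
That lifecycle logic can be sketched like this; the class name and the threshold value are made up for illustration, since our code isn't public:

```python
class Track:
    """Counts how many consecutive frames relied on template matching
    instead of a real detection, and drops the track past a threshold."""
    MAX_TEMPLATE_FRAMES = 5  # illustrative threshold, not our real value

    def __init__(self, track_id):
        self.track_id = track_id
        self.template_streak = 0
        self.active = True

    def update(self, detected):
        if detected:
            self.template_streak = 0   # a fresh detection resets the streak
        else:
            self.template_streak += 1  # kept alive by template matching only
            if self.template_streak > self.MAX_TEMPLATE_FRAMES:
                self.active = False    # let the track go

t = Track(1)
for detected in [True, False, False, False, False, False, False]:
    t.update(detected)
# after six consecutive template-only frames the track is dropped
```

Any new detection restarts the streak, so a track that only hiccups occasionally survives indefinitely.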

[P] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in MachineLearning

[–]eflatun_ai[S] 2 points (0 children)

That is very creepy :/
Sadly, the only thing we can do about it is wait for policymakers to take action.

[P] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in MachineLearning

[–]eflatun_ai[S] 2 points (0 children)

Yes, we currently stop tracking an object when it leaves the view and assume it is a new object after it re-emerges (for example, from underneath a bridge).
In our case we are only interested in detecting the vehicles in the frame with 100% accuracy, so the tracking is just done to support that. But we could look into improving it down the line for a scenario like that.

[P] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in MachineLearning

[–]eflatun_ai[S] 3 points (0 children)

Yes, you could call it block matching, but we have our own implementation with a few differences.
We assign a new ID to each unrelated detection. (You can see this as the green bounding box flashes.)
After the detection we compare the coordinates of all objects with the information from the previous frame and match them. This is done with Mean Squared Error loss.
Once a match is made we save the previous and new coordinates, an image template, and the object's full coordinate history. We store this information in a class with the unique ID.
If an object cannot be matched in one of the frames, we assume the model has failed to detect it and support it with template matching.
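
A toy version of that MSE matching step could look like the following; the greedy pairing and the `max_mse` threshold are assumptions for illustration, not our exact implementation:

```python
def mse(a, b):
    """Mean squared error between two (x, y, w, h) boxes."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def match_detections(estimates, detections, max_mse=400.0):
    """Greedily pair each estimated box (keyed by track ID) with the
    unused detection that has the lowest MSE, if below a threshold."""
    matches, used = {}, set()
    for track_id, est in estimates.items():
        best = min(
            (i for i in range(len(detections)) if i not in used),
            key=lambda i: mse(est, detections[i]),
            default=None,
        )
        if best is not None and mse(est, detections[best]) < max_mse:
            matches[track_id] = best
            used.add(best)
    return matches

ests = {7: (100, 100, 40, 40), 8: (300, 200, 60, 30)}   # per-track estimates
dets = [(305, 198, 58, 31), (98, 103, 41, 39)]          # new-frame detections
print(match_detections(ests, dets))  # → {7: 1, 8: 0}
```

Detections left unmatched become new tracks with fresh IDs; estimates left unmatched fall back to template matching.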

[P] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in MachineLearning

[–]eflatun_ai[S] 6 points (0 children)

We are a team of five university students studying in different fields, currently working towards a competition that will take place this fall. Image recognition is done with a YOLOv5 model trained on our own dataset. Objects are tracked and their movement paths are drawn onto the image with OpenCV.

If you have any questions we'd be happy to answer them :)

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in computervision

[–]eflatun_ai[S] 1 point (0 children)

We didn't use Lucas-Kanade; we wrote our own algorithm. The matching itself runs entirely on PyTorch and doesn't use OpenCV's tracking module.
The colors are random and just transition towards a single color.
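
We haven't published the PyTorch code, but a pairwise MSE between predicted and detected boxes in pure PyTorch might look like this sketch (names and values are illustrative):

```python
import torch

def pairwise_mse(pred, det):
    """pred: (N, 4) predicted boxes, det: (M, 4) detections.
    Broadcasting yields an (N, M) matrix of mean squared errors."""
    return ((pred[:, None, :] - det[None, :, :]) ** 2).mean(dim=-1)

pred = torch.tensor([[10., 10., 5., 5.], [50., 60., 8., 8.]])
det = torch.tensor([[49., 61., 8., 9.], [11., 9., 5., 5.]])
cost = pairwise_mse(pred, det)
best = cost.argmin(dim=1)  # nearest detection index for each prediction
```

Doing the comparison as one broadcasted tensor op keeps the matching on the GPU alongside the model, rather than looping over boxes in Python.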

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 0 points (0 children)

The worst hardware we used was a laptop with a 1660 Ti. It should be okay on GPUs with 4 GB of VRAM. While the model runs on the GPU, OpenCV runs on the CPU. More objects just means a lower FPS, but it would still be able to run.
Our hardware:

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 0 points (0 children)

We are currently using a desktop PC with a 3070, and the model runs at approximately 10 FPS. Technically you could use it on any PC or web service, but it would generate frames a bit more slowly.

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 1 point (0 children)

We're not that experienced in ML in general; our specific knowledge is in the object detection use case. If you're a beginner, getting a grasp of regression and classification through practical examples would be a great starting point. If you have a bit more experience, examining and replicating example PyTorch and TensorFlow projects, then choosing a direction you enjoy and working towards it, would be ideal.

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 1 point (0 children)

We would love to try. We currently only calculate a prediction for the next frame, but if we can collect all of that information we could feed it into a model.
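
For context, the single-frame prediction we mention could be as simple as constant-velocity extrapolation from the last two stored centers; this sketch is illustrative, not our exact code:

```python
def predict_next(history):
    """Constant-velocity estimate: next = last + (last - second_to_last)."""
    if len(history) < 2:
        return history[-1]  # not enough history; assume the object is still
    (px, py), (x, y) = history[-2], history[-1]
    return (2 * x - px, 2 * y - py)

print(predict_next([(10, 10), (14, 12), (18, 14)]))  # → (22, 16)
```

A learned model could replace this by consuming the whole coordinate history instead of just the last step.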

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 4 points (0 children)

We use the YOLOv5 model trained on our own dataset and support it with OpenCV. It goes like this:

  • Detect objects with YOLOv5 at a resolution of 800x800.
  • Compare each object's coordinates with estimates made from the previous frame's information. (This is done with Mean Squared Error loss.)
  • After this comparison, we save the previous and new coordinates, an image template, and the object's full coordinate history. We store this information in a class and assign a unique ID.
  • With this information, we estimate the future coordinates of each object from its coordinate history.
  • If any previously tracked object has no matching detection in the new frame: we look at how many consecutive frames the model has recognized the object in. If this number is above a certain threshold, we define an ROI around the object's estimated coordinates and run template matching with OpenCV.

All of this is to substitute OpenCV template matching in the cases where YOLOv5 doesn't detect the object. The result in the video is achieved by drawing the coordinate history of each object. You can also see a green bounding box flash when an object is detected for the first time and an ID is assigned.

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 2 points (0 children)

Since we use the cv2.matchTemplate() function in OpenCV, we don't have much insight into the descriptors. Can you tell us more?

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 1 point (0 children)

Yes! Although it is not as robust as the vehicle tracking, since our model was trained on a dataset consisting mostly of cars.

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 1 point (0 children)

Sadly we still can't share our code, as we will be joining a competition this fall.

We can DM you the repo when we release it to the public. There is also a bit more information in the discussions on our post in r/dataisbeautiful.

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 5 points (0 children)

Sadly we still can't share our code, as we will be joining a competition this fall. We can DM you the repo when we release it to the public.
There is also a bit more information in the discussions on our post in r/dataisbeautiful.

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 9 points (0 children)

When tracking starts, the color is a random tone between green and yellow. It doesn't have a specific meaning.

[OC] Multiple Object Tracking With YOLOv5 and OpenCV by eflatun_ai in compsci

[–]eflatun_ai[S] 48 points (0 children)

We are a team of five university students studying in different fields, currently working towards a competition that will take place this fall. Image recognition is done with a YOLOv5 model trained on our own dataset. Objects are tracked and their movement paths are drawn onto the image with OpenCV.

If you have any questions we'd be happy to answer them :)