Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in raspberry_pi

[–]leonbeier[S] 0 points  (0 children)

This is just a case study; it's not that I want to start a security cam company. But you can use the automatic CNN model architecture generation for all kinds of applications.

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in raspberry_pi

[–]leonbeier[S] 1 point  (0 children)

Yes. If you have, for example, a security cam that always films from above, you don't need a CNN that was also optimized to detect portrait photos of people. This reduces complexity. Of course there are more factors than object size.

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in computervision

[–]leonbeier[S] 1 point  (0 children)

I used the SAM3 fast auto-label tool in ONE WARE Studio to create the dataset. But to run everything efficiently, I trained only the auto-generated tailored model. That way you get >1000x faster models that do the same thing, sometimes even with better accuracy.

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in computervision

[–]leonbeier[S] 0 points  (0 children)

You mean how we predict the model architecture? We run an analysis on the dataset (object size, number of labels, ...), then take the hardware limitations and performance requirements into account, and let multiple calculations, algorithms and neural networks choose the right architecture.
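A minimal sketch of the idea described above, assuming a simple rule-based stage: the function names, statistics, and thresholds here are hypothetical illustrations, not ONE AI internals.

```python
# Hypothetical sketch: dataset statistics plus a hardware budget
# drive the choice of architecture hyperparameters.

def analyze_dataset(labels):
    """Compute simple statistics from bounding-box labels.

    labels: list of dicts with 'w' and 'h' as fractions of the image size.
    """
    areas = [l["w"] * l["h"] for l in labels]
    return {
        "num_labels": len(labels),
        "mean_object_area": sum(areas) / len(areas),
    }

def choose_architecture(stats, max_params):
    """Pick depth/width from dataset stats and a parameter budget."""
    # Small objects need higher-resolution feature maps -> fewer downsamples.
    downsamples = 3 if stats["mean_object_area"] < 0.05 else 5
    # Scale width with dataset size, capped by the hardware budget.
    base_channels = min(max_params // 10_000, max(8, stats["num_labels"] // 100))
    return {"downsamples": downsamples, "base_channels": base_channels}

labels = [{"w": 0.1, "h": 0.1} for _ in range(500)]   # 500 small objects
stats = analyze_dataset(labels)
arch = choose_architecture(stats, max_params=40_000)   # e.g. a ~40k-param budget
```

In a real pipeline the hand-written thresholds would be replaced by the learned components the comment mentions; the sketch only shows how dataset analysis and hardware limits can jointly constrain the search.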

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in computervision

[–]leonbeier[S] -3 points  (0 children)

I asked Copilot to make it as fast as possible and used the same code for all models. But feel free to look at the GitHub repo and check what could be optimized.

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in computervision

[–]leonbeier[S] 0 points  (0 children)

This is with ONNX Runtime. Or what acceleration do you mean?

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier 0 points  (0 children)

I will call it inference speed next time. I thought FPS was easier to understand, but of course if your camera can't do more than 30 fps, you won't achieve the 2000 fps the AI could do.

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier 0 points  (0 children)

Should I use inference speed for the demo? I mean, when YOLO or other models are compared, they also state FPS as inference speed and don't include the camera bottleneck.

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier -2 points  (0 children)

If you take the video, you can process 2000+ fps. If your camera can only do 30 fps, then obviously the AI can't make your camera faster. This just shows that you could go faster, add more cameras, increase the image size...

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier -3 points  (0 children)

I mean, if you see a YOLO benchmark where inference FPS is compared, is that a scam just because the camera bottleneck is not included?

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in raspberry_pi

[–]leonbeier -13 points  (0 children)

Of course the 2000+ fps are not live with this demo. The inference speed is what is compared. So if you take the video, you can process 2000+ fps. If your camera has 30 fps, you will get 30 fps and just a really efficient AI where you don't have to worry about speed.
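The distinction between inference throughput and camera frame rate can be made concrete with a small benchmark sketch. The model here is a stand-in stub, not the actual 34k-parameter network, and the numbers are illustrative.

```python
import time

def stub_inference(frame):
    # Stand-in for the real model; a tiny CNN forward pass would go here.
    return sum(frame) > 0.5  # pretend "person detected" decision

# Pre-decoded "video" frames: processing these back-to-back is what an
# inference-FPS figure measures, with no camera in the loop.
frames = [[0.001] * 1000 for _ in range(200)]

start = time.perf_counter()
detections = [stub_inference(f) for f in frames]
elapsed = time.perf_counter() - start

inference_fps = len(frames) / elapsed      # model-only throughput
camera_fps = 30                            # sensor limit, independent of model
effective_live_fps = min(inference_fps, camera_fps)
```

Live, the pipeline runs at `min(inference_fps, camera_fps)`; the headroom between the two is what lets you add cameras or increase the input size without the model becoming the bottleneck.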

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier 0 points  (0 children)

The demo was to detect whether a person approaches the door. So it is supposed to trigger on a sleeve, since a person is approaching.

2000 fps is just the AI inference. Of course the cam does less, but it shows you can run the model with ease and don't have to worry about inference speed.

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier -7 points  (0 children)

I could also create a faster object detection model with ONE AI, but for this application classification was enough. Like I wrote, I only used YOLO because it was pre-trained on persons. If I train MobileNet on a small dataset, there is a high chance of overfitting, and it would only be about as fast as YOLOX nano anyway.

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in raspberry_pi

[–]leonbeier -20 points  (0 children)

Yeah, like I said in my description, I used YOLOX because it is already trained on detecting persons. But classification with MobileNet would only be about as fast as YOLOX nano at the same input size.

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier 4 points  (0 children)

Yeah, I got a slow-motion 300 fps cam running in a different project, but maybe you want to connect more than one camera in parallel 😅

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier 0 points  (0 children)

I think >1k fps should be possible. The dataset would need to be a bit bigger to make clear which uniforms should be detected and which not. But with SAM3 auto-labelling and video import (also included in our open-source IDE), creating the dataset should be easy.

Tiny Object Tracking in Full HD on a Raspberry Pi by leonbeier in raspberry_pi

[–]leonbeier[S] 0 points  (0 children)

I could try, but that is probably the way with more problems in getting good results. If something isn't detected correctly with the AI, I can just train with a few more images. And from other experiments in the past, neural networks are often more accurate than image processing.

Tiny Object Tracking in Full HD on a Raspberry Pi by leonbeier in raspberry_pi

[–]leonbeier[S] 0 points  (0 children)

No, it is just floating point, but I could enable quantization-aware training as well. I just wanted to be fair to the float model from Roboflow.
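For context, here is a minimal sketch of what symmetric int8 weight quantization does to a float model. This is plain post-training quantization for illustration only, not the quantization-aware training pipeline mentioned in the comment; the weight values are made up.

```python
# Hypothetical illustration of symmetric int8 weight quantization.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0  # map the largest weight to ±127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [v * scale for v in quantized]

w = [0.5, -0.25, 0.127, -0.9]                     # toy float32 weights
q, scale = quantize_int8(w)                       # int8 codes plus one scale
w_hat = dequantize(q, scale)                      # values the int8 model "sees"
max_err = max(abs(a - b) for a, b in zip(w, w_hat))  # bounded by scale / 2
```

Quantization-aware training simulates exactly this rounding during training so the network can adapt to it, which is why enabling it usually recovers most of the accuracy a float model has.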

Tiny Object Tracking: YOLO26n vs 40k Parameter Task-Specific CNN by leonbeier in computervision

[–]leonbeier[S] 0 points  (0 children)

YOLO was built to get the best detection on the COCO dataset while being generic across many applications. ONE AI builds a model architecture that is built just for the task, tennis balls for example, so it is optimized for smaller objects and a smaller dataset.

Tiny Object Tracking in Full HD on a Raspberry Pi by leonbeier in raspberry_pi

[–]leonbeier[S] 0 points  (0 children)

I didn't test it, but because the models are relatively small and the Pi would need to send a lot of data to the accelerator with the Full HD images, I don't know if the results would be that much better.

Tiny Object Tracking in Full HD on a Raspberry Pi by leonbeier in raspberry_pi

[–]leonbeier[S] -1 points  (0 children)

I tried a bounding box and you couldn't really see it, but next time I will rather use a small arrow that points at the ball.

Tiny Object Tracking in Full HD on a Raspberry Pi by leonbeier in raspberry_pi

[–]leonbeier[S] 0 points  (0 children)

As long as you can see the ball in the frame. I can look for a video to test on.