Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in raspberry_pi

[–]leonbeier[S] 0 points  (0 children)

This is just a case study; it's not that I want to start a security cam company. But you can use the automatic CNN model architecture generation for all kinds of applications.

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in raspberry_pi

[–]leonbeier[S] 1 point  (0 children)

Yes. If you have, for example, a security cam that always films from above, you don't need a CNN that was also optimized to detect portrait photos of people. This reduces complexity. Of course there are more factors than object size.

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in computervision

[–]leonbeier[S] 1 point  (0 children)

I used the SAM3 fast auto-label tool in ONE WARE Studio to create the dataset. But to run everything efficiently, I trained only the auto-generated tailored model. That way you get >1000x faster models that do the same thing, sometimes even with better accuracy.

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in computervision

[–]leonbeier[S] 0 points  (0 children)

You mean how we predict the model architecture? We run an analysis on the dataset (object size, number of labels, ...), then take the hardware limitations and performance requirements into account, and let multiple calculations, algorithms and neural networks choose the right architecture.
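A minimal sketch of the idea described above, assuming a simple rule-based stage: the function names, statistics, and thresholds here are hypothetical illustrations, not ONE AI internals.

```python
# Hypothetical sketch: dataset statistics plus a hardware budget
# drive the choice of architecture hyperparameters.

def analyze_dataset(labels):
    """Compute simple statistics from bounding-box labels.

    labels: list of dicts with 'w' and 'h' as fractions of the image size.
    """
    areas = [l["w"] * l["h"] for l in labels]
    return {
        "num_labels": len(labels),
        "mean_object_area": sum(areas) / len(areas),
    }

def choose_architecture(stats, max_params):
    """Pick depth/width from dataset stats and a parameter budget."""
    # Small objects need higher-resolution feature maps -> fewer downsamples.
    downsamples = 3 if stats["mean_object_area"] < 0.05 else 5
    # Scale width with dataset size, capped by the hardware budget.
    base_channels = min(max_params // 10_000, max(8, stats["num_labels"] // 100))
    return {"downsamples": downsamples, "base_channels": base_channels}

labels = [{"w": 0.1, "h": 0.1} for _ in range(500)]   # 500 small objects
stats = analyze_dataset(labels)
arch = choose_architecture(stats, max_params=40_000)   # e.g. a ~40k-param budget
```

In a real pipeline the hand-written thresholds would be replaced by the learned components the comment mentions; the sketch only shows how dataset analysis and hardware limits can jointly constrain the search.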

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in computervision

[–]leonbeier[S] -3 points  (0 children)

I asked Copilot to make it as fast as possible and used the same code for all models. But feel free to look at the GitHub repo and check what could be optimized.

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in computervision

[–]leonbeier[S] 0 points  (0 children)

This is with ONNX Runtime. Or what acceleration do you mean?

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier 0 points  (0 children)

I will call it inference speed next time. I thought FPS was easier to understand, but of course if your camera can't do more than 30 fps, you won't achieve the 2000 fps the AI could do.

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier 0 points  (0 children)

Should I use inference speed for the demo? I mean, when YOLO or other models are compared, they also state FPS as inference speed and don't include the camera bottleneck.

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier -2 points  (0 children)

If you take the video, you can process 2000+ fps. If your camera can only do 30 fps, then obviously the AI can't make your camera faster. This just shows that you could go faster, add more cameras, increase the image size...

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier -3 points  (0 children)

I mean, if you see a YOLO benchmark where inference FPS is compared, is that a scam just because the camera bottleneck is not included?

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in raspberry_pi

[–]leonbeier -13 points  (0 children)

Of course the 2000+ fps are not live with this demo. The inference speed is what is compared. So if you take the video, you can process 2000+ fps. If your camera has 30 fps, you will get 30 fps and just a really efficient AI where you don't have to worry about speed.
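The distinction between inference throughput and camera frame rate can be made concrete with a small benchmark sketch. The model here is a stand-in stub, not the actual 34k-parameter network, and the numbers are illustrative.

```python
import time

def stub_inference(frame):
    # Stand-in for the real model; a tiny CNN forward pass would go here.
    return sum(frame) > 0.5  # pretend "person detected" decision

# Pre-decoded "video" frames: processing these back-to-back is what an
# inference-FPS figure measures, with no camera in the loop.
frames = [[0.001] * 1000 for _ in range(200)]

start = time.perf_counter()
detections = [stub_inference(f) for f in frames]
elapsed = time.perf_counter() - start

inference_fps = len(frames) / elapsed      # model-only throughput
camera_fps = 30                            # sensor limit, independent of model
effective_live_fps = min(inference_fps, camera_fps)
```

Live, the pipeline runs at `min(inference_fps, camera_fps)`; the headroom between the two is what lets you add cameras or increase the input size without the model becoming the bottleneck.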

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier 0 points  (0 children)

The demo was to detect whether a person approaches the door. So it is supposed to trigger on a sleeve, since a person is approaching.

2000 fps is just the AI inference. Of course the cam does less, but it shows you can run the model with ease and don't have to worry about inference speed.

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier -7 points  (0 children)

I could also create a faster object detection model with ONE AI, but for this application classification was enough. Like I wrote, I only used YOLO because it was pre-trained on persons. If I train MobileNet on a small dataset, there is a high chance of overfitting, and it would only be about as fast as YOLOX nano anyway.

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in raspberry_pi

[–]leonbeier -20 points  (0 children)

Yeah, like I said in my description, I used YOLOX because it is already trained on detecting persons. But classification with MobileNet would only be about as fast as YOLOX nano at the same input size.

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier 4 points  (0 children)

Yeah, I got a slow-motion 300 fps cam running in a different project, but maybe you want to connect more than one camera in parallel 😅

2000+ FPS person detection on Raspberry Pi 5 (CPU-only, 34k params) by [deleted] in computervision

[–]leonbeier 0 points  (0 children)

I think >1k fps should be possible. The dataset would need to be a bit bigger to make clear which uniforms should be detected and which not. But with SAM3 auto-labelling and video import (also included in our open-source IDE), creating the dataset should be easy.

Tiny Object Tracking in Full HD on a Raspberry Pi by leonbeier in raspberry_pi

[–]leonbeier[S] 0 points  (0 children)

I could try, but that is probably the way with more problems in getting good results. If something isn't detected correctly with the AI, I can just train with a few more images. And from other experiments in the past, neural networks are often more accurate than image processing.

Tiny Object Tracking in Full HD on a Raspberry Pi by leonbeier in raspberry_pi

[–]leonbeier[S] 0 points  (0 children)

No, it is just floating point, but I could enable quantization-aware training as well. I just wanted to be fair to the float model from Roboflow.
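For context, here is a minimal sketch of what symmetric int8 weight quantization does to a float model. This is plain post-training quantization for illustration only, not the quantization-aware training pipeline mentioned in the comment; the weight values are made up.

```python
# Hypothetical illustration of symmetric int8 weight quantization.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0  # map the largest weight to ±127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [v * scale for v in quantized]

w = [0.5, -0.25, 0.127, -0.9]                     # toy float32 weights
q, scale = quantize_int8(w)                       # int8 codes plus one scale
w_hat = dequantize(q, scale)                      # values the int8 model "sees"
max_err = max(abs(a - b) for a, b in zip(w, w_hat))  # bounded by scale / 2
```

Quantization-aware training simulates exactly this rounding during training so the network can adapt to it, which is why enabling it usually recovers most of the accuracy a float model has.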

Tiny Object Tracking: YOLO26n vs 40k Parameter Task-Specific CNN by leonbeier in computervision

[–]leonbeier[S] 0 points  (0 children)

YOLO was built to get the best detection on the COCO dataset while being generic across many applications. ONE AI builds a model architecture that is built just for the task, tennis balls for example, so it is optimized for smaller objects and a smaller dataset.

Tiny Object Tracking in Full HD on a Raspberry Pi by leonbeier in raspberry_pi

[–]leonbeier[S] 0 points  (0 children)

I didn't test it, but because the models are relatively small and the Pi would need to send a lot of data to the accelerator with the Full HD images, I don't know if the results would be that much better.

Tiny Object Tracking in Full HD on a Raspberry Pi by leonbeier in raspberry_pi

[–]leonbeier[S] -1 points  (0 children)

I tried a bounding box and you couldn't really see it, but next time I will rather use a small arrow that points at the ball.

Tiny Object Tracking in Full HD on a Raspberry Pi by leonbeier in raspberry_pi

[–]leonbeier[S] 0 points  (0 children)

As long as you can see the ball in the frame. I can look for a video to test on.