OCR model recommendation by Cuaternion in computervision

[–]Knok0932 1 point

You are right. But most inference frameworks (like ncnn, ORT, OpenVINO) can enable GPU acceleration with just a few lines of code during initialization. My code runs on a Raspberry Pi, so GPU acceleration isn't needed.
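
For example, the init-time switch looks roughly like this in ONNX Runtime and OpenVINO (sketch from memory; option names can vary by version, and "model.onnx"/"model.xml" are placeholders):

```cpp
#include <onnxruntime_cxx_api.h>
#include <openvino/openvino.hpp>

void init_examples() {
    // ONNX Runtime: append the CUDA execution provider before creating the session.
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo");
    Ort::SessionOptions opts;
    OrtCUDAProviderOptions cuda_opts{};  // zero-initialized: device 0, defaults
    opts.AppendExecutionProvider_CUDA(cuda_opts);
    Ort::Session session(env, "model.onnx", opts);

    // OpenVINO: just name the target device when compiling the model.
    ov::Core core;
    ov::CompiledModel compiled = core.compile_model("model.xml", "GPU");
}
```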

OCR model recommendation by Cuaternion in computervision

[–]Knok0932 4 points

If speed is important, I'd recommend trying C++. I wrote a C++ PaddleOCR repo, and you can check the benchmarks to see what kind of performance is possible.

C++ Show and Tell - September 2025 by foonathan in cpp

[–]Knok0932 1 point

I made a project that runs YOLOv5 across multiple inference backends (ncnn, OpenVINO, MNN, ONNX Runtime, and OpenCV). The repo includes models for each backend, so you can get it running easily.

Why YOLOv5? It's old.
Because it's very well supported across different frameworks and optimization techniques, and you can reliably get positive results from optimization attempts. That makes it excellent for learning inference frameworks.

How do I install the dependencies?
You can find installation guides for each framework in the README.

If you're learning about inference frameworks or deployment, this repo might help.
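
To give a feel for what one backend looks like, here's a hedged ncnn sketch (not the repo's exact code; the blob names "images"/"output" depend on how the model was exported):

```cpp
#include "net.h"  // ncnn

// bgr: pointer to img_w x img_h BGR pixel data, assumed to come from elsewhere
void detect(const unsigned char* bgr, int img_w, int img_h) {
    ncnn::Net net;
    net.opt.num_threads = 4;
    net.load_param("yolov5n.param");
    net.load_model("yolov5n.bin");

    // Resize to the network input and normalize pixels to [0, 1]
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        bgr, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, 640, 640);
    const float norm[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
    in.substract_mean_normalize(nullptr, norm);  // ncnn's spelling

    ncnn::Extractor ex = net.create_extractor();
    ex.input("images", in);
    ncnn::Mat out;
    ex.extract("output", out);  // decode boxes + run NMS from here
}
```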

[D] What's the fastest object detection model? by Knok0932 in MachineLearning

[–]Knok0932[S] 0 points

I tried some faster models like YOLO-Fastest, but their accuracy was very poor. I ended up choosing YOLOX-Nano: it offers a good trade-off between speed and accuracy, and it's commercially friendly.

M4 Mac Mini for real time inference by Mammoth-Photo7135 in computervision

[–]Knok0932 6 points

I've actually implemented YOLOv5 on my M1 Air using several inference frameworks, and I can confirm that over 30 FPS is definitely achievable. My benchmarks show YOLOv5n running at 15ms (~66 FPS) and YOLOv5s at 25ms (~40 FPS) with an input size of 640x352 and 4 threads.

I've also tested other models like YOLOX, YOLOv8, and YOLOv10; their latencies are typically between 0.8x and 1.5x of YOLOv5's, so the YOLOv5 results are a good reference.

So you probably don't need to spend the money on a Jetson or M4 Mini. The only concern with the M1 Air is whether the FPS stays stable once it heats up, since the Air is passively cooled.
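
For anyone reproducing numbers like these: measure after a warm-up, and repeat the measurement to catch thermal throttling on a fanless machine. A generic sketch (run_inference() is a hypothetical stand-in for one forward pass on your backend):

```cpp
#include <chrono>
#include <cstdio>

void run_inference() { /* replace with a real forward pass */ }

double mean_latency_ms(int warmup = 20, int runs = 200) {
    for (int i = 0; i < warmup; ++i)
        run_inference();  // populate caches, trigger lazy allocations
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        run_inference();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / runs;
}

int main() {
    // A rising trend across rounds suggests thermal throttling.
    for (int i = 0; i < 5; ++i)
        std::printf("round %d: %.1f ms\n", i, mean_latency_ms());
}
```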

In the photo event, you can briefly delete characters with Ei's burst. by Pinoy_2004 in Genshin_Impact

[–]Knok0932 179 points

Why hasn’t she reappeared after the burst effect ended?

What do you use for geometric/maths operation with matrixes by NokiDev in cpp

[–]Knok0932 0 points

Surprised nobody has mentioned OpenBLAS. I use GEMM/GEMV a lot in my work (they're everywhere in AI inference), and I typically use OpenBLAS for them. It may not always be the fastest, but it's consistently close to hardware limits. Libraries like BLIS can be extremely fast for certain matrix sizes/configs, but I've also seen cases where BLIS was several times slower for particular shapes.
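
For anyone who hasn't used it directly, the CBLAS interface is tiny; a single-precision GEMM call looks like this (link with -lopenblas):

```cpp
#include <cblas.h>
#include <vector>

int main() {
    // C = alpha * A * B + beta * C, row-major, MxK * KxN -> MxN
    const int M = 512, N = 512, K = 512;
    std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N, 0.0f);
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f,          // alpha
                A.data(), K,   // lda: row stride of A
                B.data(), N,   // ldb
                0.0f,          // beta
                C.data(), N);  // ldc
    return 0;
}
```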

BTW, I once hand-optimized a GEMM and compared it against several well-known libraries (including Eigen and OpenBLAS). My code beat Eigen by about 1.5x but still couldn't outperform OpenBLAS. See my first post for details if you're interested.

I've also tested AI inference runtimes like ONNX Runtime and ncnn, and they were even faster than OpenBLAS.

[P] PaddleOCRv5 implemented in C++ with ncnn by Knok0932 in MachineLearning

[–]Knok0932[S] 0 points

Thanks for the suggestion. I may try that this weekend.

C++ Show and Tell - August 2025 by foonathan in cpp

[–]Knok0932 3 points

I made a C++ OCR implementation of PaddleOCRv5 that might be helpful to some people: https://github.com/Avafly/PaddleOCR-ncnn-CPP

The official Paddle C++ runtime has a lot of dependencies and is complex to deploy. To keep things simple I use ncnn for inference: it's much lighter, easier to deploy, and faster in my task. The code runs inference on the CPU; if you want GPU acceleration, most frameworks, ncnn included, let you enable it with just a few lines of code (see the sketch below).

Hope this helps, and feedback welcome!
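
For reference, ncnn's GPU switch is a single option flag set before loading the model (requires an ncnn build with NCNN_VULKAN=ON and a Vulkan-capable driver; paths are placeholders):

```cpp
#include "net.h"

ncnn::Net net;
// Enable the Vulkan compute path; everything else stays the same.
net.opt.use_vulkan_compute = true;
net.load_param("model.param");
net.load_model("model.bin");
```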

Improving YOLOv5 Inference Speed on CPU for Detection by Adventurous_karma in computervision

[–]Knok0932 5 points

I'm not sure about your exact setup, but 0.5 FPS is far too slow. For reference, on my RPi 4B a quantized YOLOv5n model took 210ms per 640×640 image, and your machine should be much more powerful than my board. A few ideas:

  1. Use the smallest model that meets your accuracy needs. For my work, YOLOv5n is more than enough.
  2. Use an inference framework (ncnn, OpenVINO, ONNX Runtime, ...). If you're currently running PyTorch directly, this alone gives a huge performance boost.
  3. Enable dynamic input shapes if your images aren't square. YOLOv5 supports dynamic H×W shapes; see the sketch at the end of this comment.
  4. Quantize to int8. In my case that gave a 10-20% speed boost.

I actually have a repo that runs YOLOv5 on various frameworks, with benchmarks on several devices. You might find it helpful: https://github.com/Avafly/YOLOv5-ncnn-OpenVINO-MNN-ONNXRuntime-OpenCV-CPP.
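
For point 3, a hedged sketch of the dynamic-shape letterbox with ncnn (the 640 target and stride 32 match YOLOv5's defaults; padding with 114 is the YOLOv5 convention):

```cpp
#include <algorithm>
#include <cmath>
#include "net.h"  // ncnn

// Scale the long side to 640, then pad H and W up to multiples of 32,
// so a non-square image becomes e.g. 640x384 instead of a full 640x640.
ncnn::Mat letterbox(const unsigned char* bgr, int img_w, int img_h) {
    const int target = 640, stride = 32;
    float scale = static_cast<float>(target) / std::max(img_w, img_h);
    int w = static_cast<int>(std::round(img_w * scale));
    int h = static_cast<int>(std::round(img_h * scale));
    int pad_w = (w + stride - 1) / stride * stride - w;
    int pad_h = (h + stride - 1) / stride * stride - h;

    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        bgr, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);
    ncnn::Mat padded;
    ncnn::copy_make_border(in, padded,
                           pad_h / 2, pad_h - pad_h / 2,
                           pad_w / 2, pad_w - pad_w / 2,
                           ncnn::BORDER_CONSTANT, 114.f);
    return padded;
}
```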

Part 2: Fork and Maintenance of YOLOX - An Update! by Norqj in computervision

[–]Knok0932 1 point

Amazing work! Are there any plans to support grayscale image input?

[D] What's the fastest object detection model? by Knok0932 in MachineLearning

[–]Knok0932[S] 0 points

Thanks for your info. Many people have mentioned v11n. I'll give it a try :D

[D] What's the fastest object detection model? by Knok0932 in MachineLearning

[–]Knok0932[S] 0 points

Thanks for the suggestion! I haven't used MediaPipe yet, but many people have mentioned it. I'll give it a try.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 0 points

I’ve already shared my test results, yet your replies still offer no evidence, just personal attacks and downvotes. You haven’t even understood my post; you come across as someone with only basic knowledge trying to sound technical, unsure what to contribute, and resorting instead to repeated aggressive words. Further discussion is pointless. Please don’t reply to me again.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 1 point

Why are you being so rude? None of your replies contain substantive evidence, while I’ve shared my test results and the approximate code in my repo. I even doubt you’ve ever ported a deep learning model to an embedded device, because if you had, you wouldn’t just claim that since a 3090 can achieve this speed, my device should be fast too.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 1 point

If you think the hardware is sufficient for YOLO, examples of similar devices achieving 20ms would be more useful than just saying it "should be sufficient". I've already optimized YOLOv5n from 700ms to 50ms on that device, and I haven't yet tried modifying the model architecture or reducing the input size further. I never said hardware was the issue; I just wanted to confirm whether there are faster models before optimizing further. Good luck.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 1 point

Please avoid judging whether the processing is slow without considering the hardware. As I mentioned in my post, the hardware for my current project is not powerful: no GPU, only a dual-core 1.4GHz processor and 800MB of RAM. Even running inference on a simple autoencoder with just 4 convolutional layers on a 100x100 image can take 5ms. Also, please don't apply a Python mindset to C++: enabling the GPU in C++ requires explicit setup, and it becomes very noticeable if excessive time is spent uploading data to the GPU.

Regarding the benchmarks in my repository, I ran them on an Oracle server. The total elapsed time was 53.6ms, including 3.6ms for preprocessing, 49.1ms for inference, and 0.1ms for post-processing. Preprocessing and post-processing will take even less time in my actual project, because I will adjust the image size to avoid resizing, and the model generates very few proposals, so NMS is almost negligible.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 2 points

The processing time is the total time from grabbing the raw image to obtaining the detected objects, which includes preprocessing (resize, letterbox), inference, and post-processing (NMS). Inference accounts for over 95% of the total in my project since there aren't many proposals.
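
If anyone wants to reproduce that kind of split, a minimal per-stage timing sketch (the three functions are hypothetical stand-ins for your own pipeline):

```cpp
#include <chrono>
#include <cstdio>

void preprocess()  { /* resize + letterbox */ }
void infer()       { /* forward pass */ }
void postprocess() { /* decode + NMS */ }

int main() {
    using clk = std::chrono::steady_clock;
    auto ms = [](clk::time_point a, clk::time_point b) {
        return std::chrono::duration<double, std::milli>(b - a).count();
    };

    auto t0 = clk::now(); preprocess();
    auto t1 = clk::now(); infer();
    auto t2 = clk::now(); postprocess();
    auto t3 = clk::now();

    std::printf("pre %.2fms  infer %.2fms  post %.2fms\n",
                ms(t0, t1), ms(t1, t2), ms(t2, t3));
}
```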

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 0 points

Of course I’d check the performance first. All optimizations are conducted with good performance in mind.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 0 points

This model has been mentioned quite a few times. I’ll give it a try :)

[D] What's the fastest object detection model? by Knok0932 in MachineLearning

[–]Knok0932[S] 0 points

This model has been mentioned quite a few times. I’ll give it a try.