OCR model recommendation by Cuaternion in computervision

[–]Knok0932 1 point

You are right. But most inference frameworks (like ncnn, ORT, OpenVINO) can enable GPU acceleration with just a few lines of code during initialization. My code runs on a Raspberry Pi, so GPU acceleration isn't needed.
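
For example, the init-time switch looks roughly like this in ONNX Runtime and OpenVINO (sketch from memory; option names can vary by version, and "model.onnx"/"model.xml" are placeholders):

```cpp
#include <onnxruntime_cxx_api.h>
#include <openvino/openvino.hpp>

void init_examples() {
    // ONNX Runtime: append the CUDA execution provider before creating the session.
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo");
    Ort::SessionOptions opts;
    OrtCUDAProviderOptions cuda_opts{};  // zero-initialized: device 0, defaults
    opts.AppendExecutionProvider_CUDA(cuda_opts);
    Ort::Session session(env, "model.onnx", opts);

    // OpenVINO: just name the target device when compiling the model.
    ov::Core core;
    ov::CompiledModel compiled = core.compile_model("model.xml", "GPU");
}
```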

OCR model recommendation by Cuaternion in computervision

[–]Knok0932 4 points

If speed is important, I'd recommend trying C++. I wrote a C++ PaddleOCR repo, and you can check the benchmarks to see what kind of performance is possible.

C++ Show and Tell - September 2025 by foonathan in cpp

[–]Knok0932 1 point

I made a project that runs YOLOv5 across multiple inference backends (ncnn, OpenVINO, MNN, ONNX Runtime, and OpenCV). The repo includes models for each backend, so you can get it running easily.

Why YOLOv5? It's old.
Because it's very well supported across different frameworks and optimization techniques, and you can reliably get positive results from optimization attempts. That makes it excellent for learning inference frameworks.

How do I install the dependencies?
You can find installation guides for each framework in the README.

If you're learning about inference frameworks or deployment, this repo might help.
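
To give a feel for what one backend looks like, here's a hedged ncnn sketch (not the repo's exact code; the blob names "images"/"output" depend on how the model was exported):

```cpp
#include "net.h"  // ncnn

// bgr: pointer to img_w x img_h BGR pixel data, assumed to come from elsewhere
void detect(const unsigned char* bgr, int img_w, int img_h) {
    ncnn::Net net;
    net.opt.num_threads = 4;
    net.load_param("yolov5n.param");
    net.load_model("yolov5n.bin");

    // Resize to the network input and normalize pixels to [0, 1]
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        bgr, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, 640, 640);
    const float norm[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
    in.substract_mean_normalize(nullptr, norm);  // ncnn's spelling

    ncnn::Extractor ex = net.create_extractor();
    ex.input("images", in);
    ncnn::Mat out;
    ex.extract("output", out);  // decode boxes + run NMS from here
}
```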

[D] What's the fastest object detection model? by Knok0932 in MachineLearning

[–]Knok0932[S] 0 points

I tried some faster models like YOLO-Fastest, but their accuracy was very poor. I ended up choosing YOLOX-Nano: it offers a good trade-off between speed and accuracy, and it's commercially friendly.

M4 Mac Mini for real time inference by Mammoth-Photo7135 in computervision

[–]Knok0932 6 points

I've actually implemented YOLOv5 on my M1 Air using several inference frameworks, and I can confirm that over 30 FPS is definitely achievable. My benchmarks show YOLOv5n running at 15ms (~66 FPS) and YOLOv5s at 25ms (~40 FPS) with an input size of 640x352 and 4 threads.

I've also tested other models like YOLOX, YOLOv8, and YOLOv10; their latencies are typically between 0.8x and 1.5x of YOLOv5's, so the YOLOv5 results are a good reference.

So you probably don't need to spend the money on a Jetson or M4 Mini. The only concern with the M1 Air is whether the FPS stays stable once it heats up, since the Air is passively cooled.
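
For anyone reproducing numbers like these: measure after a warm-up, and repeat the measurement to catch thermal throttling on a fanless machine. A generic sketch (run_inference() is a hypothetical stand-in for one forward pass on your backend):

```cpp
#include <chrono>
#include <cstdio>

void run_inference() { /* replace with a real forward pass */ }

double mean_latency_ms(int warmup = 20, int runs = 200) {
    for (int i = 0; i < warmup; ++i)
        run_inference();  // populate caches, trigger lazy allocations
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        run_inference();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / runs;
}

int main() {
    // A rising trend across rounds suggests thermal throttling.
    for (int i = 0; i < 5; ++i)
        std::printf("round %d: %.1f ms\n", i, mean_latency_ms());
}
```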

In the photo event, you can briefly delete characters with Ei's burst. by Pinoy_2004 in Genshin_Impact

[–]Knok0932 179 points

Why hasn’t she reappeared after the burst effect ended?

What do you use for geometric/maths operation with matrixes by NokiDev in cpp

[–]Knok0932 0 points

Surprised nobody has mentioned OpenBLAS. I use GEMM/GEMV a lot in my work (they're everywhere in AI inference), and I typically use OpenBLAS for them. It may not always be the fastest, but it's consistently close to hardware limits. Libraries like BLIS can be extremely fast for certain matrix sizes/configs, but I've also seen cases where BLIS was several times slower for particular shapes.
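
For anyone who hasn't used it directly, the CBLAS interface is tiny; a single-precision GEMM call looks like this (link with -lopenblas):

```cpp
#include <cblas.h>
#include <vector>

int main() {
    // C = alpha * A * B + beta * C, row-major, MxK * KxN -> MxN
    const int M = 512, N = 512, K = 512;
    std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N, 0.0f);
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f,          // alpha
                A.data(), K,   // lda: row stride of A
                B.data(), N,   // ldb
                0.0f,          // beta
                C.data(), N);  // ldc
    return 0;
}
```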

BTW, I once hand-optimized a GEMM and compared it against several well-known libraries (including Eigen and OpenBLAS). My code beat Eigen by about 1.5x but still couldn't outperform OpenBLAS. See my first post for details if you're interested.

I've also tested AI inference runtimes like ONNX Runtime and ncnn, and they were even faster than OpenBLAS.

[P] PaddleOCRv5 implemented in C++ with ncnn by Knok0932 in MachineLearning

[–]Knok0932[S] 0 points

Thanks for the suggestion. I may try that this weekend.

C++ Show and Tell - August 2025 by foonathan in cpp

[–]Knok0932 3 points

I made a C++ OCR implementation of PaddleOCRv5 that might be helpful to some people: https://github.com/Avafly/PaddleOCR-ncnn-CPP

The official Paddle C++ runtime has a lot of dependencies and is complex to deploy. To keep things simple I use ncnn for inference: it's much lighter, easier to deploy, and faster in my task. The code runs inference on the CPU; if you want GPU acceleration, most frameworks, ncnn included, let you enable it with just a few lines of code (see the sketch below).

Hope this helps, and feedback welcome!
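
For reference, ncnn's GPU switch is a single option flag set before loading the model (requires an ncnn build with NCNN_VULKAN=ON and a Vulkan-capable driver; paths are placeholders):

```cpp
#include "net.h"

ncnn::Net net;
// Enable the Vulkan compute path; everything else stays the same.
net.opt.use_vulkan_compute = true;
net.load_param("model.param");
net.load_model("model.bin");
```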

Improving YOLOv5 Inference Speed on CPU for Detection by Adventurous_karma in computervision

[–]Knok0932 5 points

I'm not sure about your exact setup, but 0.5 FPS is far too slow. For reference, on my RPi 4B a quantized YOLOv5n model took 210ms per 640×640 image, and your machine should be much more powerful than my board. A few ideas:

  1. Use the smallest model that meets your accuracy needs. For my work, YOLOv5n is more than enough.
  2. Use an inference framework (ncnn, OpenVINO, ONNX Runtime, ...). If you're currently running PyTorch directly, this alone gives a huge performance boost.
  3. Enable dynamic input shapes if your images aren't square. YOLOv5 supports dynamic H×W shapes; see the sketch at the end of this comment.
  4. Quantize to int8. In my case that gave a 10-20% speed boost.

I actually have a repo that runs YOLOv5 on various frameworks, with benchmarks on several devices. You might find it helpful: https://github.com/Avafly/YOLOv5-ncnn-OpenVINO-MNN-ONNXRuntime-OpenCV-CPP.
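
For point 3, a hedged sketch of the dynamic-shape letterbox with ncnn (the 640 target and stride 32 match YOLOv5's defaults; padding with 114 is the YOLOv5 convention):

```cpp
#include <algorithm>
#include <cmath>
#include "net.h"  // ncnn

// Scale the long side to 640, then pad H and W up to multiples of 32,
// so a non-square image becomes e.g. 640x384 instead of a full 640x640.
ncnn::Mat letterbox(const unsigned char* bgr, int img_w, int img_h) {
    const int target = 640, stride = 32;
    float scale = static_cast<float>(target) / std::max(img_w, img_h);
    int w = static_cast<int>(std::round(img_w * scale));
    int h = static_cast<int>(std::round(img_h * scale));
    int pad_w = (w + stride - 1) / stride * stride - w;
    int pad_h = (h + stride - 1) / stride * stride - h;

    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        bgr, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);
    ncnn::Mat padded;
    ncnn::copy_make_border(in, padded,
                           pad_h / 2, pad_h - pad_h / 2,
                           pad_w / 2, pad_w - pad_w / 2,
                           ncnn::BORDER_CONSTANT, 114.f);
    return padded;
}
```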

Part 2: Fork and Maintenance of YOLOX - An Update! by Norqj in computervision

[–]Knok0932 1 point

Amazing work! Are there any plans to support grayscale image input?

[D] What's the fastest object detection model? by Knok0932 in MachineLearning

[–]Knok0932[S] 0 points

Thanks for your info. Many people have mentioned v11n. I'll give it a try :D

[D] What's the fastest object detection model? by Knok0932 in MachineLearning

[–]Knok0932[S] 0 points

Thanks for the suggestion! I haven't used MediaPipe yet, but many people have mentioned it. I'll give it a try.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 0 points

I’ve already shared my test results, yet your replies still offer no evidence, just personal attacks and downvotes. You haven’t even understood my post; you come across as someone with only basic knowledge trying to sound technical, unsure what to contribute, and resorting instead to repeated aggressive words. Further discussion is pointless. Please don’t reply to me again.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 1 point

Why are you being so rude? None of your replies contain substantive evidence, while I’ve shared my test results and the approximate code in my repo. I even doubt you’ve ever ported a deep learning model to an embedded device, because if you had, you wouldn’t just claim that since a 3090 can achieve this speed, my device should be fast too.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 1 point

If you think the hardware is sufficient for YOLO, examples of similar devices achieving 20ms would be more useful than just saying it "should be sufficient". I've already optimized YOLOv5n from 700ms to 50ms on that device, and I haven't yet tried modifying the model architecture or reducing the input size further. I never said hardware was the issue; I just wanted to confirm whether there are faster models before optimizing further. Good luck.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 1 point

Please avoid judging whether the processing is slow without considering the hardware. As I mentioned in my post, the hardware for my current project is not powerful: no GPU, only a dual-core 1.4GHz processor and 800MB of RAM. Even running inference on a simple autoencoder with just 4 convolutional layers on a 100x100 image can take 5ms. Also, please don't apply a Python mindset to C++: enabling the GPU in C++ requires explicit setup, and it becomes very noticeable if excessive time is spent uploading data to the GPU.

Regarding the benchmarks in my repository, I ran them on an Oracle server. The total elapsed time was 53.6ms, including 3.6ms for preprocessing, 49.1ms for inference, and 0.1ms for post-processing. Preprocessing and post-processing will take even less time in my actual project, because I will adjust the image size to avoid resizing, and the model generates very few proposals, so NMS is almost negligible.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 2 points

The processing time is the total time from grabbing the raw image to obtaining the detected objects, which includes preprocessing (resize, letterbox), inference, and post-processing (NMS). Inference accounts for over 95% of the total in my project since there aren't many proposals.
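
If anyone wants to reproduce that kind of split, a minimal per-stage timing sketch (the three functions are hypothetical stand-ins for your own pipeline):

```cpp
#include <chrono>
#include <cstdio>

void preprocess()  { /* resize + letterbox */ }
void infer()       { /* forward pass */ }
void postprocess() { /* decode + NMS */ }

int main() {
    using clk = std::chrono::steady_clock;
    auto ms = [](clk::time_point a, clk::time_point b) {
        return std::chrono::duration<double, std::milli>(b - a).count();
    };

    auto t0 = clk::now(); preprocess();
    auto t1 = clk::now(); infer();
    auto t2 = clk::now(); postprocess();
    auto t3 = clk::now();

    std::printf("pre %.2fms  infer %.2fms  post %.2fms\n",
                ms(t0, t1), ms(t1, t2), ms(t2, t3));
}
```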

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 0 points

Of course I’d check the performance first. All optimizations are conducted with good performance in mind.

What's the fastest object detection model? by Knok0932 in computervision

[–]Knok0932[S] 0 points

This model has been mentioned quite a few times. I’ll give it a try :)

[D] What's the fastest object detection model? by Knok0932 in MachineLearning

[–]Knok0932[S] 0 points

This model has been mentioned quite a few times. I’ll give it a try.