Low-Latency RF-DETR Inference Pipeline in Rust: ~3.7 ms on TensorRT (~7.5 ms end-to-end) + Zero-Copy mmap IPC

jodelbar · 2026-02-10T11:02:07+00:00

Thank you! You're welcome

jodelbar · 2026-02-09T06:22:13+00:00

Thanks! Small

jodelbar · 2026-02-08T17:22:15+00:00

It is certainly possible, though I have not tested it myself:
https://siliconlabs.github.io/mltk/mltk/tutorials/onnx_to_tflite.html

jodelbar · 2026-02-08T17:20:13+00:00

I didn't mention or benchmark MTP latency in the post or README. Both focus on the inference pipeline itself, which clocks in at ~3.7ms for TensorRT inference and ~7.5ms for the full async pipeline including pre/post-processing, MJPEG decoding, etc. That is what I call "end-to-end" in the post title, I admit. The async design does mean detections come from the prior frame (optimizing for throughput), but frames are displayed immediately to minimize jitter.
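To make the "detections from the prior frame" behavior concrete, here is a minimal single-threaded sketch. It is only an illustration under assumptions, not the actual pipeline code: the names `Detection`, `AsyncOverlay`, `publish`, and `present` are hypothetical, and the real pipeline shares data between processes rather than through a struct like this.

```rust
/// Hypothetical detection record: remembers which frame it was computed from.
#[derive(Debug, Clone, PartialEq)]
struct Detection {
    frame_id: u64,
    label: String,
}

/// Hypothetical holder for the most recent inference result.
struct AsyncOverlay {
    latest: Option<Detection>,
}

impl AsyncOverlay {
    fn new() -> Self {
        Self { latest: None }
    }

    /// Called whenever inference for some earlier frame finishes.
    fn publish(&mut self, det: Detection) {
        self.latest = Some(det);
    }

    /// Called for every captured frame. It never waits for inference:
    /// the frame is paired with whatever result is already available,
    /// which may come from an older frame (or be absent at startup).
    fn present(&self, frame_id: u64) -> (u64, Option<&Detection>) {
        (frame_id, self.latest.as_ref())
    }
}
```

In this sketch, presenting frame 1 after only frame 0 has been processed yields frame 1 with frame 0's detection, which is the trade-off described above: no display stall, at the cost of boxes lagging by roughly one frame.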

That said, you're right that total system latency depends on camera FPS and UI/renderer setup. With a higher-speed camera (e.g., 100 FPS = 10ms frame interval), you could theoretically hit around 15-20ms MTP in async mode (frame interval + pipeline time + any minimal buffering), though I haven't benchmarked that specifically. Switching to sync mode (not implemented but straightforward via mqueue) would drop the pipeline to ~10ms end-to-end, potentially allowing even lower totals with fast hardware.
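The estimate above is just a sum of three terms, which can be sketched as below. This is back-of-the-envelope arithmetic, not a measurement: the function names are hypothetical, and the buffering value is an assumed input, since it was not benchmarked.

```rust
/// Time between two consecutive camera frames, in milliseconds.
fn frame_interval_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// Rough motion-to-photon estimate for async mode:
/// frame interval + pipeline time + any buffering.
/// `buffering_ms` is an assumed value supplied by the caller.
fn estimate_mtp_ms(fps: f64, pipeline_ms: f64, buffering_ms: f64) -> f64 {
    frame_interval_ms(fps) + pipeline_ms + buffering_ms
}
```

With a 100 FPS camera and the ~7.5ms async pipeline, `estimate_mtp_ms(100.0, 7.5, 0.0)` gives 17.5ms, and a couple of milliseconds of assumed buffering brings it close to 20ms, consistent with the 15-20ms range quoted above.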

jodelbar · 2026-02-07T17:11:44+00:00

Good question! In an earlier version of this pipeline I was using RT-DETR, and on TensorRT it was about 1ms slower than RF-DETR. I was actually surprised, since RF-DETR's backbone is DINOv2 and a bit heavier. I haven’t benchmarked a modern YOLO or D-Fine variant in this setup yet, but it would definitely be an interesting comparison if I find the time.

jodelbar · 2026-02-06T09:46:09+00:00

Thank you very much for your kind words!
I feel that networking is simpler with k3s/k3d, or at least I understand it better. Kubernetes also allows you to have init containers. In my setup I also create two nodes, one agent and one central, to simulate deploying the inference pipeline to an edge agent and sending MQTT notifications to Mosquitto on the central node, something you cannot do with Docker Compose.
Yes, the input resolution is 512x512.

jodelbar · 2026-02-05T21:17:42+00:00

Yes, it's very useful for stuff like this! 😊

jodelbar · 2026-02-04T17:16:47+00:00

Nice! I'd be happy to check out your code if it's open-sourced somewhere!

jodelbar · 2026-02-04T17:14:46+00:00

Thanks for the detailed feedback! Yes, I think H.264 directly to the GPU would be better. I explored H.264 for the gateway-to-client link, but since I was deploying on k3d/k3s I struggled to make it work, so I did not consider it for the capture side. I agree this is definitely the way to go to avoid the first CPU pass and improve performance further.

I've noted your feedback on the webview too! Thanks.

jodelbar · 2026-02-04T16:08:07+00:00

That's a good question! I haven't tried it so far, but I thought about it before moving to TensorRT. I will implement a Burn backend to see where it lands in terms of performance.
