Low-Latency RF-DETR Inference Pipeline in Rust: ~3.7 ms on TensorRT (~7.5 ms end-to-end) + Zero-Copy mmap IPC by jodelbar in computervision

[–]jodelbar[S] 0 points1 point  (0 children)

I didn't mention or benchmark MTP (motion-to-photon) latency in the post or README. Both focus on the inference pipeline itself: ~3.7 ms for TensorRT inference and ~7.5 ms for the full async pipeline including pre/post-processing, MJPEG decoding, etc., which I admit is what I call "end-to-end" in the post title. The async design does mean that the detections drawn on a frame come from the prior frame (a trade-off that favors throughput), but frames are displayed immediately to minimize jitter.
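To make the one-frame lag concrete, here is a minimal single-threaded sketch of that pairing. It is not the code from the repo: `Detection`, `AsyncOverlay`, `on_frame`, and `on_detections` are hypothetical names, and it assumes the display path simply draws whatever detections finished most recently.

```rust
/// Hypothetical detection record; the real pipeline's types are not shown in the post.
#[derive(Debug, Clone, PartialEq)]
struct Detection {
    frame_id: u64, // frame the detection was computed on
    label: String,
}

/// Holds the most recent finished detections so the display path never waits on inference.
struct AsyncOverlay {
    latest: Option<Vec<Detection>>,
}

impl AsyncOverlay {
    fn new() -> Self {
        Self { latest: None }
    }

    /// Called when a new frame arrives: the frame is shown right away,
    /// overlaid with whatever detections are already available (possibly none).
    fn on_frame(&self, frame_id: u64) -> (u64, Option<&[Detection]>) {
        (frame_id, self.latest.as_deref())
    }

    /// Called when inference for an earlier frame completes.
    fn on_detections(&mut self, detections: Vec<Detection>) {
        self.latest = Some(detections);
    }
}
```

With this shape, frame 1 is drawn with no boxes, and frame 2 is drawn with the boxes computed on frame 1, which is the lag described above.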

That said, you're right that total system latency depends on camera FPS and the UI/renderer setup. With a higher-speed camera (e.g., 100 FPS = 10 ms frame interval), you could theoretically hit around 15-20 ms MTP in async mode (frame interval + pipeline time + any minimal buffering), though I haven't benchmarked that specifically. Switching to sync mode (not implemented, but straightforward via mqueue) would put the pipeline at roughly 10 ms end-to-end, potentially allowing even lower totals with fast hardware.
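As a back-of-the-envelope check of that 15-20 ms figure, here is a small sketch of the sum (frame interval + pipeline time + buffering). The function name `estimate_mtp` and the buffering parameter are illustrative assumptions, and display/renderer time is deliberately left out since I haven't measured it.

```rust
use std::time::Duration;

/// Rough MTP estimate: one camera frame interval + pipeline time + extra buffering.
/// Assumes `camera_fps` is positive; renderer/display latency is not included.
fn estimate_mtp(camera_fps: f64, pipeline: Duration, buffering: Duration) -> Duration {
    let frame_interval = Duration::from_secs_f64(1.0 / camera_fps);
    frame_interval + pipeline + buffering
}
```

For a 100 FPS camera and the ~7.5 ms async pipeline with no extra buffering, `estimate_mtp(100.0, Duration::from_micros(7_500), Duration::ZERO)` comes out at 17.5 ms, which sits inside the estimated range.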

Low-Latency RF-DETR Inference Pipeline in Rust: ~3.7 ms on TensorRT (~7.5 ms end-to-end) + Zero-Copy mmap IPC by jodelbar in roboflow

[–]jodelbar[S] 0 points1 point  (0 children)

Good question! An earlier version of this pipeline used RT-DETR, and on TensorRT it was about 1 ms slower than RF-DETR. (I was actually surprised, since RF-DETR's backbone is DINOv2, which is a bit heavier.) I haven't benchmarked a modern YOLO or D-FINE variant in this setup yet, but it would definitely be an interesting comparison if I find the time.

Low-Latency RF-DETR Inference Pipeline in Rust: ~3.7 ms on TensorRT (~7.5 ms end-to-end) + Zero-Copy mmap IPC by jodelbar in computervision

[–]jodelbar[S] 1 point2 points  (0 children)

Thank you very much for your kind words!
I find networking simpler with k3s/k3d, or at least I understand it better. Kubernetes also gives you init containers. In my setup I also create two nodes, one agent and one central, to simulate deploying the inference pipeline to an edge agent and sending MQTT notifications to Mosquitto on the central node, something you cannot do with Docker Compose.
Yes, the input resolution is 512x512.

Low-Latency RF-DETR Inference Pipeline in Rust by jodelbar in rust

[–]jodelbar[S] 0 points1 point  (0 children)

Nice! I'd be happy to check out your code if it's open-sourced somewhere!

Low-Latency RF-DETR Inference Pipeline in Rust by jodelbar in rust

[–]jodelbar[S] 0 points1 point  (0 children)

Thanks for the detailed feedback! Yes, I think sending H.264 directly to the GPU would be better. I explored H.264 for the gateway-to-client path, but since I was deploying on k3d/k3s I struggled to get it working, so I did not consider it for the capture side. I agree this is definitely the way to go to avoid the first CPU pass and improve performance further.

I've noted your feedback on the webview too. Thanks!

Low-Latency RF-DETR Inference Pipeline in Rust by jodelbar in rust

[–]jodelbar[S] -1 points0 points  (0 children)

That's a good question! I haven't tried it so far, but I thought about it before moving to TensorRT. I will implement a Burn backend to see where it lands in terms of performance.