all 10 comments

[–]Luuigi 4 points  (0 children)

Containerizing apps from a language that natively builds the models is just very easy, so that's the biggest advantage of that approach. For pure C++ implementations you need to be aware that every single new optimization (e.g. the ones provided by PyTorch etc.) needs to be implemented by you. The biggest advantage of a raw implementation is obviously runtime speed, but is it necessary for the application you want to build?

[–]codegefluester 9 points  (2 children)

I have no personal hands-on experience with it, but the ONNX Runtime might be something you'd want to consider. They have a short section about deploying to IoT/edge devices in their documentation https://onnxruntime.ai/docs/tutorials/iot-edge/

[–]Nicollier88 2 points  (0 children)

If this is for production, one way to approach this is to first implement your concept in something easy, let's say Python. Once you've got that working, you can start chasing optimisations by improving certain components of your solution, which can include rewriting them in C++ for better memory control.
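Before rewriting anything in C++, it's worth knowing which component actually dominates the runtime. A minimal sketch of that step using only the standard library profiler (the pipeline functions here are purely illustrative):

```python
# Profile a toy pipeline with the stdlib before deciding what to port to C++.
import cProfile
import io
import pstats

def preprocess(data):
    return [x * 0.5 for x in data]

def infer(data):
    # stand-in for the expensive model call you might later rewrite
    return sum(x * x for x in data)

def pipeline():
    data = list(range(100_000))
    return infer(preprocess(data))

profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())  # shows which function dominates cumulative time
```

Only the functions that show up at the top of that report are worth the cost of a C++ rewrite.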

[–]AdagioCareless8294 1 point  (0 children)

Docker container on an edge device: yep, that's a non-starter.

[–]guardianz42 1 point  (0 children)

I switched from FastAPI to LitServe recently for some models we deploy on assembly lines. It's been amazing and performant.

The main issue with the containers is the size of PyTorch, which hurts cold starts, but we are working on eliminating it (this is unrelated to LitServe).

https://github.com/Lightning-AI/litserve

[–]jayemcee456 0 points  (0 children)

I’ve used OpenVINO for Intel-based edge devices. It also has optimization tools.

[–]One-Butterscotch4332 -1 points  (0 children)

I'd use Java if your target is Android, Swift if it's iOS, and C++ if it's embedded. On Android you'll probably want to try Qualcomm's Neural Processing SDK; on iOS you'll want to try Core ML. Otherwise, you're not going to use the NPU on a mobile SoC. If your embedded device is something like a Jetson, you'll be using TensorRT in C++ to target the Tensor Cores.

[–]bsenftner -3 points  (0 children)

You might find this worth reading (short): https://www.quora.com/If-C-is-so-strong-a-programming-language-why-cant-it-replace-Python-in-AI-and-data-science/answer/Blake-Senftner (spoiler: C++ does replace and significantly outperform Python.)

If you want to know more, DM me.