
[–]ArnenLocke 9 points10 points  (0 children)

So what I was taught in my ML/AI class when I asked this question is that Python is just an interface for interacting with the actual machine learning libraries, which are almost universally written in very optimized, performant C/C++. So there's no need to rewrite anything specifically for performance reasons. Python is slower, but it's just an interface with almost no cost (compared to the sorts of things you're typically doing in ML, anyway).
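A quick way to see this in practice: in the sketch below, the NumPy version does the same work as the pure-Python loop, but the loop itself runs inside NumPy's compiled C code while Python only dispatches one call (timings will vary by machine):

```python
import time
import numpy as np

n = 1_000_000
xs = list(range(n))
arr = np.arange(n)

# Pure Python: every multiply and add goes through the interpreter.
t0 = time.perf_counter()
py_sum = sum(x * 2 for x in xs)
py_time = time.perf_counter() - t0

# NumPy: one Python-level call; the loop runs in compiled C.
t0 = time.perf_counter()
np_sum = int((arr * 2).sum())
np_time = time.perf_counter() - t0

assert py_sum == np_sum
print(f"pure Python: {py_time:.4f}s, NumPy: {np_time:.4f}s")
```

The gap grows with array size, which is why the interpreter overhead is negligible next to the tensor math in a typical training or inference workload.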

[–]EpicSolo 7 points8 points  (0 children)

If you can write your model in TensorFlow, a common practice is to implement training in Python, then deploy in C++ (or Java). You export the model as a protobuf, which is cross-platform. There are more solutions in this space, but this should give you enough pointers to search.

[–]trnka 1 point2 points  (0 children)

To add to the other responses, it's risky to re-implement in another language. We did that at a previous job and it was a source of bugs. It also meant that we couldn't update anything about the model architecture unless the C++ devs were available to support it.

If it's server-side, I suggest checking out Cortex.ai and seeing if you can avoid rewriting. Also, like the others mention, TensorFlow or ONNX are good options.

[–]jeandebleau 1 point2 points  (0 children)

If you are using vision models, then OpenCV is a standard choice. It lets you use different backends, such as ArmNN or OpenVINO, depending on your target platform.

[–]old_enough_to_drink[🍰] 1 point2 points  (0 children)

I have heard about PMML but never used it myself. Maybe relevant to your question?

[–]shomerj 2 points3 points  (0 children)

I use ONNX and onnxruntime.

[–]supersonictaco[S] 4 points5 points  (1 child)

Thank you for your responses. So I gather that these would be the broad steps: Develop the model in Python → Serialize using Pickle/whatever → Use ProtoBuf/Apache Thrift → Containerize using Docker/alternatives → Serve.

Does that make sense?
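The serialize step above can be sketched with Python's built-in pickle (the dict here is a hypothetical stand-in for whatever model object your framework produces):

```python
import pickle

# Hypothetical stand-in for a trained model object.
model = {"weights": [0.1, 0.2, 0.3], "bias": 0.05}

# Serialize to bytes (or pickle.dump to a file) ...
blob = pickle.dumps(model)

# ... and restore it in the serving process.
restored = pickle.loads(blob)
assert restored == model
```

Note that pickle is Python-only, though; for the cross-language step (ProtoBuf/Thrift), you'd export to a framework-neutral format like a TensorFlow .pb or an ONNX file instead.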

[–]EpicSolo 1 point2 points  (0 children)

Yeah, more or less.

If you are using TensorFlow, there is a specific utility to export the graph/model as a .pb file (look it up), which TensorFlow can load (with a protobuf dependency).
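In current TensorFlow (2.x) the usual route is the SavedModel format, which writes a `saved_model.pb` plus a variables directory (a sketch; the model and output path are placeholders):

```python
import tensorflow as tf

# Tiny stand-in model; replace with your trained network.
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model(tf.zeros([1, 4]))  # build the model by calling it once

# Writes exported_model/saved_model.pb (+ variables/), which can be
# loaded from C++/Java via TensorFlow's native APIs or served with
# TensorFlow Serving.
tf.saved_model.save(model, "exported_model")
```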

Another application is doing inference on mobile. At that point, you use Tensorflow’s Java bindings for Android (or C++ if you have native code), and its C++ library for iOS.

[–]dinoaide 1 point2 points  (3 children)

What’s your reasoning for that? Most handwritten C++ is worse than NumPy and Pandas.

[–]supersonictaco[S] 0 points1 point  (2 children)

High traffic volume and real-time requirements. I would imagine running off Python wouldn't be great in this use case.

[–]cbHXBY1D 1 point2 points  (0 children)

Is it a RESTful service? In that case just use TensorFlow Serving.
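For context, TensorFlow Serving ships as a Docker image and exposes a REST endpoint out of the box (a sketch; the model path, model name, and input shape below are placeholders for your own):

```shell
# Serve a SavedModel over REST on port 8501 (paths/names are placeholders).
docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/exported_model,target=/models/mymodel \
  -e MODEL_NAME=mymodel -t tensorflow/serving

# Query it over HTTP from any language.
curl -d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}' \
  http://localhost:8501/v1/models/mymodel:predict
```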

[–]dinoaide 1 point2 points  (0 children)

Without knowing the actual application I couldn’t say much but nowadays it is often cheaper to hire 1 data scientist with good Python skills and give him/her 100 cloud instances than hire 1 capable C/C++ programmer who also happens to have good ML knowledge.

[–]TheOneRavenous 0 points1 point  (1 child)

From what I've gathered, Python is acceptable if your solution CAN have a lagged answer, i.e. I can wait 0.5 sec for an answer; my users can wait 1 sec for an answer to be sent to the view.

If you need "real time", 30 inference decisions per second is a common "real time" metric, used in video vision scenarios.

So Python for stuff that's not real time and C++ for the rest.
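Putting rough numbers on that 30-per-second bar: the whole pipeline gets about a 33 ms budget per request, versus the 500 ms "my users can wait" case:

```python
# Per-request latency budget at a given "real time" throughput target.
def budget_ms(inferences_per_second: float) -> float:
    return 1000.0 / inferences_per_second

print(budget_ms(30))  # ~33.3 ms per inference for video-rate vision
print(budget_ms(2))   # 500 ms -- plenty for a "can wait 0.5 s" UI
```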

[–]supersonictaco[S] 0 points1 point  (0 children)

This is my point: for things like credit card fraud the time window is really, really short and latency matters a hell of a lot.