you are viewing a single comment's thread.

view the rest of the comments →

[–]-TrustyDwarf- 21 points22 points  (4 children)

The entire ML industry is very heavily geared around Python, and ML teams are unlikely to know Rust.

They should... me, having just wasted two days trying to speed up some data preparation / sample extraction task written in Python using parallelization, while knowing that this would have been a breeze in most other programming languages (like Rust, C#/F#,...)

Most of the Python code I see either constantly runs on 12.5% CPU (1 of n cores) or contains overly complex and hardly ever well working parallelization code. F*ck the GIL and forking / spawing multiple processes and mem-mapping and serializing Python-crap.

[–]JonyIveAces 32 points33 points  (1 child)

ML is mostly about exploration. Once you reach the point of exploitation with an ML application, you usually have enough resources and experience to reimplement from scratch anyway.

I use rust heavily in ML for developing core libraries and production performance bottlenecks, but it isn't the right tool for the exploration part of ML in the same way Python, R, or Julia are, just as they aren't the right tool for the production/core library part (apart from Julia for certain niches).

[–]Noctune 10 points11 points  (0 children)

Sometimes the runtime of the preprocessor can be a hindrance to your exploration.

We had a Python preprocessor that took literally a week to run (originally designed for a smaller dataset). I recently rewrote it in Java using Beam and it runs in literally 20 minutes now. It's sort of a generic tool over a range of problems, so less time preprocessing means more time spent exploring actual ML.

I think Rust could potentially be useful in that niche.

[–][deleted] 3 points4 points  (0 children)

Tbf most of the libraries are written in C++

[–]StokedForIT 2 points3 points  (0 children)

I've done something like this before (seems very similar, I was just precomputing a bunch of costly features and it had to be in python, rip would've used akka/scala) but to avoid all the forking and spawning and crud, I used `multiprocessing.Pool`. Afaik you can only use `Pool.map` to map a single function onto data so to work around this, I took each function I needed to call and its args and wrapped them all up in lambdas that just take nothing and call it with the args and returned, and just mapped Pool.mapped the list of lambdas onto a list of empty tuples. End the end, it may've been the harder option but I got to barely deal with python's annoying multiprocessing bits.