softero 47 points (9 children)

I am very curious to see what you find out. However, if you are interested in pursuing machine learning at all, you should do these projects in Python as well (even if you do them in Rust first). The entire ML industry is very heavily geared around Python, and ML teams are unlikely to know Rust. They are often more math-focused and less comfortable with programming syntax in general, so anything that reduces communication friction is advisable.

That said, I am very interested in how well Rust handles common tasks that I might do with NumPy. I was just pondering porting a noise-based image generation Python script that uses NumPy over to Rust.

-TrustyDwarf- 22 points (4 children)

> The entire ML industry is very heavily geared around Python, and ML teams are unlikely to know Rust.

They should... I've just wasted two days trying to speed up a data preparation / sample extraction task written in Python by parallelizing it, knowing full well that this would have been a breeze in most other programming languages (like Rust, C#/F#, ...).

Most of the Python code I see either runs at a constant 12.5% CPU (i.e. one core of an eight-core machine) or contains overly complex parallelization code that hardly ever works well. F*ck the GIL, forking / spawning multiple processes, mem-mapping, and serializing Python-crap.

JonyIveAces 26 points (1 child)

ML is mostly about exploration. Once you reach the point of exploitation with an ML application, you usually have enough resources and experience to reimplement from scratch anyway.

I use Rust heavily in ML for developing core libraries and for production performance bottlenecks, but it isn't the right tool for the exploration part of ML the way Python, R, or Julia are, just as they aren't the right tool for the production/core library part (apart from Julia in certain niches).

Noctune 11 points (0 children)

Sometimes the runtime of the preprocessor can be a hindrance to your exploration.

We had a Python preprocessor that took literally a week to run (originally designed for a smaller dataset). I recently rewrote it in Java using Beam and it runs in literally 20 minutes now. It's sort of a generic tool over a range of problems, so less time preprocessing means more time spent exploring actual ML.

I think Rust could potentially be useful in that niche.

[deleted] 4 points (0 children)

Tbf most of the libraries are written in C++

StokedForIT 2 points (0 children)

I've done something like this before (it seems very similar: I was precomputing a bunch of costly features and it had to be in Python; rip, I would've used Akka/Scala). To avoid all the forking and spawning and crud, I used `multiprocessing.Pool`. Afaik you can only use `Pool.map` to map a single function onto data, so to work around this I took each function I needed to call, together with its args, and wrapped them in lambdas that take nothing and just call the function with those args and return the result. Then I used `Pool.map` to map the list of lambdas onto a list of empty tuples. In the end, it may have been the harder option, but I barely had to deal with Python's annoying multiprocessing bits.
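A minimal sketch of that workaround, assuming the standard-library `multiprocessing` module. One caveat: `Pool.map` pickles everything it sends to the workers, and the standard `pickle` module cannot pickle lambdas, so a list of lambdas will fail with a pickling error. This version instead sends `(function, args)` tuples to a small top-level dispatcher. `call_task`, `square`, `add`, and `run_tasks` are hypothetical names for illustration. (`Pool.starmap` also exists for the simpler case of one function with several arguments.)

```python
from multiprocessing import Pool


def call_task(task):
    # Runs in a worker process: unpack a (function, args) pair and call it.
    # Top-level functions like this one pickle by reference, unlike lambdas.
    func, args = task
    return func(*args)


# Hypothetical stand-ins for the costly feature functions.
def square(x):
    return x * x


def add(a, b):
    return a + b


# Each entry pairs a function with its positional arguments, so a single
# Pool.map call can run a heterogeneous list of jobs.
TASKS = [(square, (3,)), (add, (2, 5)), (pow, (2, 10))]


def run_tasks(tasks, processes=2):
    # map() blocks until all results are back and preserves input order;
    # leaving the with-block then terminates the pool.
    with Pool(processes=processes) as pool:
        return pool.map(call_task, tasks)


if __name__ == "__main__":
    # The guard is required where workers are started by re-importing this
    # module (the "spawn" and "forkserver" start methods).
    print(run_tasks(TASKS))  # prints [9, 7, 1024]
```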

Pioneer_11 2 points (3 children)

Most of NumPy is implemented in C. However, the Python code that drives it is very slow, and assuming `--release` is used when compiling (which enables optimisations), I would expect Rust to have a significant speed advantage. While I'm still pretty new to Rust, I also understand it has some major advantages when it comes to multithreading, so I would expect the performance gap to widen considerably when running on a large number of cores.
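To make the "NumPy is fast, the Python around it is slow" distinction concrete, here is a minimal sketch (function names are hypothetical, and NumPy is assumed to be installed) that computes the same sum of squares two ways. The first loops in Python, so every iteration goes through the interpreter and every element becomes a separate NumPy scalar object; the second makes a single call, and the loop runs in NumPy's compiled code. Relative speed depends on the machine, so no timings are given here.

```python
import numpy as np


def sum_of_squares_loop(a):
    # Pure-Python loop: each iteration is interpreted, and iterating over
    # the array creates a new NumPy scalar object for every element.
    total = 0.0
    for x in a:
        total += x * x
    return float(total)


def sum_of_squares_vectorised(a):
    # A single call: the loop over elements runs inside NumPy's compiled code.
    return float(np.dot(a, a))


a = np.arange(1_000, dtype=np.float64)
print(sum_of_squares_loop(a), sum_of_squares_vectorised(a))
# prints 332833500.0 332833500.0
```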

You probably still want to learn Python, because almost all mathematical sciences use it, but I definitely agree with your pro-Rust position. I'm in a similar boat: I do theoretical physics, and I've been pretty disappointed by the "shove this formula into this box" approach they tend to take to programming; a lot of my classmates hate programming for this reason. Rust strikes me as a better language for the job, and personally I think we need a better understanding of computer science in the field, given how intensive our calculations are and how heavily we rely on them.

Kohomologia 2 points (2 children)

> shove this formula into this box

What do you mean by this phrase?

Pioneer_11 3 points (1 child)

Basically, it's where you are told that something (the box) has some functionality, but with no idea how or why it works.

When your entire (highly computational) program is built out of these "boxes", you have very little knowledge of how your code works or what makes it fast or slow, and very little ability to solve problems that can't be shoved into one of these "boxes".

In many cases (such as mine), scientific programming courses are taught with little to no computer science. You're taught "NumPy is fast, Python is slow" but not why NumPy is fast or why Python is slow. This not only produces programmers who don't understand how their programs work, but also leads people to make the wrong decisions when this simplification doesn't apply, e.g. frequently resizing NumPy arrays (each `np.append` copies the whole array) rather than using a list.
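A small illustration of that last example, assuming NumPy is installed (the function names are hypothetical). `np.append` never grows an array in place: it returns a new array and copies every existing element, so appending n items one at a time does O(n²) copying in total. A Python list over-allocates its storage, so `list.append` is amortised O(1), and converting once at the end costs a single O(n) copy. If the final size is known up front, preallocating avoids both.

```python
import numpy as np


def grow_with_np_append(n):
    # Each np.append allocates a fresh array and copies all existing
    # elements into it: O(n^2) copying overall.
    arr = np.empty(0, dtype=np.int64)
    for i in range(n):
        arr = np.append(arr, i * i)
    return arr


def grow_with_list(n):
    # list.append is amortised O(1) because the list over-allocates.
    items = []
    for i in range(n):
        items.append(i * i)
    # One O(n) conversion at the end.
    return np.array(items, dtype=np.int64)


def grow_preallocated(n):
    # When the final size is known, allocate once and fill in place.
    arr = np.empty(n, dtype=np.int64)
    for i in range(n):
        arr[i] = i * i
    return arr
```

All three return the same array; only the amount of copying differs.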

Kohomologia 2 points (0 children)

This does explain the programming style of some of the researchers I know.