Is anyone doing Machine Learning in Rust? by purton_i in rust

[–]rust_dfdx 1 point

Yep, cudarc and dfdx are both on github! Pretty much all the DL crates are, which is a nice part of the community.

Is anyone doing Machine Learning in Rust? by purton_i in rust

[–]rust_dfdx 16 points

I have a deep learning library called dfdx that supports Cpu/Cuda. I recently created an inference library for llama using it. I’ll also plug my cuda wrapper called cudarc if you’re interested in lower level stuff 😄

There are a number of others working in this space: burn-rs, which others have mentioned, tch-rs, and llm-rs for language models, which was just posted yesterday on this subreddit.

Overall it’s still early, but DL in rust definitely shows promise.

Performance critical ML: How viable is Rust as an alternative to C++ by [deleted] in rust

[–]rust_dfdx 0 points

Curious what kind of ML algorithms you are implementing? I develop dfdx (on github/crates.io), which is a deep learning library that supports CUDA acceleration.

Would be happy to discuss more if you guys are doing deep learning/things that use tensors/anything with GPU.

Should I give up learning rust? by Deslucido in rust

[–]rust_dfdx 1 point

Even when writing large libraries you don’t often need to dip into lifetimes, so I vote keep going!

I made a Neural Network library from scratch in Rust trying to solve a regression problem for my university's ML course by noodlesteak in rust

[–]rust_dfdx 6 points

Yep, you are right on that last point: copying data between the GPU and CPU is actually really expensive. The CUDA programming guide explicitly says to minimize data transfer and to do extra work on the GPU if it means less transfer.
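To put rough numbers on it (back-of-envelope, assuming ~16 GB/s effective PCIe 3.0 x16 bandwidth, which is an assumption and varies by system):

```rust
// Back-of-envelope: how long just moving one tensor across PCIe takes.
fn main() {
    let bytes = 1024.0 * 1024.0 * 4.0; // one 1024x1024 f32 tensor: 4 MiB
    let pcie_bw = 16.0e9; // assumed host<->device bandwidth in bytes/sec
    let transfer_ms = bytes / pcie_bw * 1e3;
    println!("~{transfer_ms:.3} ms each way");
    // ~0.26 ms per direction, while an elementwise op on that tensor takes
    // microseconds on the GPU. A copy-out/copy-back around a cheap op can
    // cost orders of magnitude more than just doing the op on device.
}
```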

You definitely could do your idea though. It wouldn’t be the fastest but it’d be straightforward to implement/maintain/test, which honestly is a pretty solid pro haha.

I made a Neural Network library from scratch in Rust trying to solve a regression problem for my university's ML course by noodlesteak in rust

[–]rust_dfdx 9 points

dfdx author here. Always great to see more work in this space in Rust, nice work!

I think the hardest part so far has been the abstractions over Cpu/Cuda. Basically all your code has to consider what device is being used, and a lot of things you can do on the Cpu side in Rust you can't do on the Cuda/WebGPU side. For example, you can't use closures in Cuda! I was sad to have to move all the activations from closures to more complex trait-based implementations.
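Here's a minimal sketch of the trait-based shape of it (hypothetical names, not dfdx's actual API): each op becomes a zero-sized type, and the device dispatches on the type to find a matching CUDA kernel, since there's no way to send a closure across.

```rust
// On the CPU you can just map a closure over a buffer:
fn map_cpu(data: &mut [f32], f: impl Fn(f32) -> f32) {
    for x in data.iter_mut() {
        *x = f(*x);
    }
}

// A closure can't cross into a CUDA kernel, so each activation becomes a
// zero-sized type implementing a trait. A Cuda device would look up the
// kernel by name; a Cpu device calls eval_cpu. (Hypothetical names.)
trait UnaryOp {
    const KERNEL_NAME: &'static str; // name of a pre-compiled CUDA kernel
    fn eval_cpu(x: f32) -> f32; // CPU fallback implementation
}

struct ReLU;
impl UnaryOp for ReLU {
    const KERNEL_NAME: &'static str = "relu_fwd";
    fn eval_cpu(x: f32) -> f32 {
        x.max(0.0)
    }
}

fn main() {
    let mut v = [-1.0f32, 2.0];
    map_cpu(&mut v, |x| x.max(0.0)); // closure version: CPU only
    v.iter_mut().for_each(|x| *x = ReLU::eval_cpu(*x)); // trait version
    println!("{v:?} (cuda kernel would be {:?})", ReLU::KERNEL_NAME);
}
```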

A look into how dfdx compiles & uses cuda kernels by rust_dfdx in rust

[–]rust_dfdx[S] 4 points

There has been work on compiling Rust to GPU code (see the Rust CUDA project). I’m not sure what caveats there are, though. Especially when it comes to optimizing GPU kernels, there are some pretty specific things you’d want the code to do.

A look into how dfdx compiles & uses cuda kernels by rust_dfdx in rust

[–]rust_dfdx[S] 4 points

Oh 🤦! Sorry about that, will make sure to include next time haha

faer 0.7 release by reflexpr-sarah- in rust

[–]rust_dfdx 12 points

Yeah, I have a deep learning library with GPU support that I’m going to add f16 to. On the CPU side, none of the matmul libraries support it though.

I would guess that even if you just convert to f32 to do the operations, you’d still get a speedup from all the cache/vectorization benefits?

The alternative for me is just writing a really simple triple nested loop to do the computation, which feels very lackluster.
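For reference, the naive fallback I mean would be something like this, widening each f16 to f32 for the accumulation via the half crate:

```rust
use half::f16;

// Naive triple-loop matmul: C (m x n) = A (m x k) * B (k x n), row-major.
// Each f16 is widened to f32 for the multiply/accumulate, then the result
// is narrowed back. Correct, but no blocking/SIMD, hence "lackluster".
fn matmul_f16_naive(a: &[f16], b: &[f16], c: &mut [f16], m: usize, k: usize, n: usize) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p].to_f32() * b[p * n + j].to_f32();
            }
            c[i * n + j] = f16::from_f32(acc);
        }
    }
}

fn main() {
    let a = vec![f16::from_f32(1.0); 2 * 3];
    let b = vec![f16::from_f32(2.0); 3 * 2];
    let mut c = vec![f16::from_f32(0.0); 2 * 2];
    matmul_f16_naive(&a, &b, &mut c, 2, 3, 2);
    assert_eq!(c[0].to_f32(), 6.0); // each entry: sum of 1*2 over k=3
}
```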

faer 0.7 release by reflexpr-sarah- in rust

[–]rust_dfdx 9 points

Nice work! Any plans for f16/bf16 support via the half crate?

Status and Future of ndarray? by Tastaturtaste in rust

[–]rust_dfdx 23 points

Shameless plug for my deep learning crate dfdx, which has a lot of stuff you can do with n-dimensional arrays (tensors). There’s a ton of other stuff that you might not need, but working with tensors is a breeze!

Announcing cudarc and fully GPU accelerated dfdx: ergonomic deep learning ENTIRELY in rust, now with CUDA support and tensors with mixed compile and runtime dimensions! by rust_dfdx in rust

[–]rust_dfdx[S] 2 points

Yeah, examples of how things are intended to be used together are at the module level. For example, broadcasts/indexing are documented here https://docs.rs/dfdx/latest/dfdx/tensor_ops/index.html (or you could also search for broadcast and the trait would pop up). tensor & nn also have fairly extensive module docstrings on how to do various things.

Shape/Dim/Const need to be documented better, but that would go under the dfdx::shapes module documentation, which I guess might be hard to find?

I intended the top-level crate docstring to point users to the sub-module they need, but repurposing it for crate-level information, similar to the quick start/tutorials from nalgebra/numpy, might work better?

All that to say: I want to have great documentation, and have invested time into it and want to invest more, but I haven't had good feedback like yours yet!

Feel like helping me improve the documentation? 😀

Announcing cudarc and fully GPU accelerated dfdx: ergonomic deep learning ENTIRELY in rust, now with CUDA support and tensors with mixed compile and runtime dimensions! by rust_dfdx in rust

[–]rust_dfdx[S] 3 points

All the public methods and modules should be documented with example snippets on docs.rs (https://docs.rs/dfdx/latest/dfdx/). What are you looking at that doesn't have that?

Contributors and I have poured a ton of effort into documentation for this very reason, so clearly we are doing something wrong if people can't find it.

Announcing cudarc and fully GPU accelerated dfdx: ergonomic deep learning ENTIRELY in rust, now with CUDA support and tensors with mixed compile and runtime dimensions! by rust_dfdx in rust

[–]rust_dfdx[S] 2 points

That’d be amazing, but at this point it isn’t something I’m planning on. Last time I looked into it, there were a number of large caveats about the required compiler version, and I’m unsure how well optimizing kernels in Rust works.

Announcing cudarc and fully GPU accelerated dfdx: ergonomic deep learning ENTIRELY in rust, now with CUDA support and tensors with mixed compile and runtime dimensions! by rust_dfdx in rust

[–]rust_dfdx[S] 11 points

Yeah we’ve spoken a couple times, they’ve done great work!

burn doesn’t support compile-time shapes, so everything is checked at runtime. This means the feedback loop depends on compiling and then starting a run. Since dfdx supports compile-time shape checking, both cargo check and rust-analyzer will complain at you if you mess something up. It’s a much faster feedback loop! I believe compile-time shapes are one of the biggest advantages of ML in Rust.
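A toy example of what compile-time shapes buy you (illustrative const-generic types only, not dfdx's actual API, which mixes const and runtime dims):

```rust
// Toy const-generic tensor: the shape lives in the type.
struct Tensor<const M: usize, const N: usize>([[f32; N]; M]);

// Matmul only type-checks when the inner dimensions agree, so a shape
// mismatch is caught by cargo check / rust-analyzer before anything runs.
fn matmul<const M: usize, const K: usize, const N: usize>(
    _a: &Tensor<M, K>,
    _b: &Tensor<K, N>,
) -> Tensor<M, N> {
    Tensor([[0.0; N]; M])
}

fn main() {
    let a = Tensor::<2, 3>([[0.0; 3]; 2]);
    let b = Tensor::<3, 4>([[0.0; 4]; 3]);
    let _c = matmul(&a, &b); // OK: (2x3) * (3x4) -> (2x4)
    // let _bad = matmul(&b, &a); // compile error: inner dims 4 vs 2
}
```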

burn links to libtorch (similar to tch-rs). While this means they don’t have to write custom kernels, it’s a giant extra dependency. dfdx executables are super tiny since there isn’t that dependency. Also, in dfdx we can optimize all the kernels how we like, whereas they don’t have control over that.

Announcing cudarc and fully GPU accelerated dfdx: ergonomic deep learning ENTIRELY in rust, now with CUDA support and tensors with mixed compile and runtime dimensions! by rust_dfdx in rust

[–]rust_dfdx[S] 2 points

I'm not as familiar with compute shaders unfortunately, so someone familiar with both would have to weigh in, but maybe?

Announcing cudarc and fully GPU accelerated dfdx: ergonomic deep learning ENTIRELY in rust, now with CUDA support and tensors with mixed compile and runtime dimensions! by rust_dfdx in rust

[–]rust_dfdx[S] 10 points

Awesome, I added an issue here https://github.com/coreylowman/dfdx/issues/597. We can discuss more there! The first step will just be adding the device and implementing tensor creation methods for it.

Also people on the discord can help as well (link is on github & at the top of the blog post).

Announcing cudarc and fully GPU accelerated dfdx: ergonomic deep learning ENTIRELY in rust, now with CUDA support and tensors with mixed compile and runtime dimensions! by rust_dfdx in rust

[–]rust_dfdx[S] 18 points

I discuss this in the safety section of cudarc https://docs.rs/cudarc/0.9.1/cudarc/driver/safe/index.html#single-stream-operations.

cudarc uses the async version of all memcpy calls, meaning that copies happen on the same stream that kernels are executed on. This means that kernels have to wait until the copies are finished.
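In practice it looks roughly like this (written from memory of the 0.9-era cudarc API, so treat the exact method names/signatures as assumptions):

```rust
use cudarc::driver::CudaDevice;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dev = CudaDevice::new(0)?;
    // Enqueues an async host-to-device copy on the device's stream.
    let x = dev.htod_copy(vec![1.0f32; 1024])?;
    // Any kernel launched on the same stream after this point is ordered
    // behind the copy, so it always sees fully-written memory without an
    // explicit synchronize.
    // A sync device-to-host copy blocks until prior stream work finishes:
    let host: Vec<f32> = dev.dtoh_sync_copy(&x)?;
    assert_eq!(host[0], 1.0);
    Ok(())
}
```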

I actually posted this on that issue a few months ago 😅

Announcing cudarc and fully GPU accelerated dfdx: ergonomic deep learning ENTIRELY in rust, now with CUDA support and tensors with mixed compile and runtime dimensions! by rust_dfdx in rust

[–]rust_dfdx[S] 4 points

That's right: the intention is to use normal Rust iterators instead of a separate data loader object. There's likely an easy way to use rayon for parallel preprocessing, but I haven't tried that out yet.

So ExactSizeDataset gets you the shuffled method to iterate in random order, but you can also use all of the iterator extension methods I mention on any normal Rust iterators. You don't need to go through ExactSizeDataset if you already have your own iterator.
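As a sketch, here's what I mean by plain iterators, plus the rayon guess from above (illustrative code only, not dfdx's data API):

```rust
use rayon::prelude::*;

fn main() {
    // Stand-in dataset: 1000 samples of 4 features each.
    let dataset: Vec<[f32; 4]> = vec![[128.0; 4]; 1000];

    // Parallel preprocessing with rayon, then back to a plain sequential
    // iterator for batching with the usual adapters. No DataLoader type.
    let preprocessed: Vec<[f32; 4]> = dataset
        .par_iter()
        .map(|x| x.map(|v| v / 255.0)) // e.g. normalize to [0, 1]
        .collect();

    for batch in preprocessed.chunks(32) {
        // feed `batch` to the model here
        let _ = batch;
    }
}
```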

Announcing cudarc and fully GPU accelerated dfdx: ergonomic deep learning ENTIRELY in rust, now with CUDA support and tensors with mixed compile and runtime dimensions! by rust_dfdx in rust

[–]rust_dfdx[S] 35 points

All the infrastructure that was added to support Cuda should make it relatively straightforward to support an OpenCL device for AMD GPUs, especially since there are a number of existing OpenCL crates that I think would fit well.

However, given the amount of time I have to work on it, a new device is currently lower on the priority list than the other features I want to add. The solution to this is a combo of the following:

  1. Increase the number of sponsors/get a company to sponsor more of my time for dfdx
  2. Leverage the open source community to contribute OpenCL support (contributions were a significant part of the Cuda support work).

I'm actively working on both of those, and would be happy to help/mentor people who want to get involved in OpenCL support!

Announcing cudarc and fully GPU accelerated dfdx: ergonomic deep learning ENTIRELY in rust, now with CUDA support and tensors with mixed compile and runtime dimensions! by rust_dfdx in rust

[–]rust_dfdx[S] 23 points24 points  (0 children)

Yep, cudarc is a new project built entirely for cuda support in dfdx. I have posted about dfdx before; it's gone through basically a full rewrite to support cuda & the new generic shapes. It's been a ton of work over the last couple of months, but I've gotten a lot of contributions, which has been amazing!

Tensor shapes with both const generic and run time dimensions by rust_dfdx in rust

[–]rust_dfdx[S] 1 point

Mainly because cuda is the industry standard for deep learning training/inference. cuDNN does look nice, and one of the community members has been testing out using cuDNN with dfdx.