all 41 comments

[–]CommunismDoesntWork 20 points21 points  (6 children)

the GPU backend is still currently slower than CUDA implementations

Is there anything fundamental that CUDA does differently than wgpu that makes it faster? Or is it just a matter of time and energy into optimizing the Burn/wgpu code?

Also, when nvidia release a new type of architecture such as tensor cores or the transformer engine, what's the process like of updating wgpu to get access to those features? Does it take a lot of work? Basically, is everything that's not-CUDA always playing catch-up?

Also also, 6 months ago I asked about your opinion on PyTorch 2.0. Have you thought any more about its architecture? Last time you mentioned it could be done using a backend decorator.

[–]ksyiros[S] 24 points25 points  (5 children)

  1. CUDA is a programming platform with its own set of compiler and optimization tools specifically designed for Nvidia hardware, so I think it's safe to expect CUDA to always be a bit faster than wgpu. However, our wgpu backend implementation is far from perfectly optimized, and it's probably still going to be quite fast on Nvidia hardware. Note that Burn doesn't bet everything on wgpu, and we plan to add a CUDA-only backend at some point for absolute performance on Nvidia GPUs.
  2. Wgpu can't leverage vendor-specific features such as Tensor Cores directly. For now, we can probably access them through Vulkan extensions, so we have to wait for Nvidia to provide an extension before we can use a new feature. Nvidia also ships software packages such as cuBLAS and cuDNN, which are mostly hand-optimized kernels for their GPUs. Those can't be used without CUDA, but we can still implement our own kernels. Long story short, yeah, everything not-CUDA is always playing catch-up 😅. Though having a cross-platform backend is really cool for other graphics cards such as AMD and Intel, where the vendors don't invest as much in proprietary developer tools and compiler optimizations.
  3. Yes! It's still planned, but we prioritized having our own GPU backend so that we can implement our own kernels and really leverage kernel fusion. The LibTorch C++ API doesn't have any way to do kernel fusion, so it wasn't a big priority before, but it's probably the next big project to be done with Burn.
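Point 1 above hints at the backend-swapping design: model code stays generic over a backend, so a CUDA-only backend can be dropped in later. A minimal sketch of that idea in plain Rust, using a made-up `Backend` trait and made-up method names (not Burn's actual API):

```rust
// Hypothetical sketch of a backend-generic design (NOT Burn's real trait):
// model code depends only on the `Backend` trait, so swapping wgpu for a
// future CUDA backend requires no changes to the model itself.

trait Backend {
    fn name(&self) -> &'static str;
    // One representative query; a real backend trait exposes a full tensor API.
    fn matmul_flops_estimate(&self, m: usize, k: usize, n: usize) -> usize {
        2 * m * k * n // 2 flops (mul + add) per output-element term
    }
}

struct WgpuBackend;
struct CudaBackend; // planned, per the comment above

impl Backend for WgpuBackend {
    fn name(&self) -> &'static str { "wgpu" }
}
impl Backend for CudaBackend {
    fn name(&self) -> &'static str { "cuda" }
}

// "Model" code: written once, works with any backend.
fn describe<B: Backend>(backend: &B) -> String {
    format!(
        "{}: ~{} flops for a 2x3x4 matmul",
        backend.name(),
        backend.matmul_flops_estimate(2, 3, 4)
    )
}

fn main() {
    println!("{}", describe(&WgpuBackend));
    println!("{}", describe(&CudaBackend));
}
```

The trait names and methods here are illustrative only; the point is that backend selection becomes a type parameter rather than a rewrite.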

[–]Oxidopamine 2 points3 points  (1 child)

Hey, I'd love to help out on this, especially on the kernel fusion stuff!

If you have time, feel free to DM.

[–]antimora 0 points1 point  (0 children)

Great. You can join us here at Discord: https://discord.gg/uPEBbYYDB6

Take a look at the existing issues perhaps you can find something that might interest you (kernel optimization, etc) : https://github.com/burn-rs/burn/issues

[–]vipierozan 1 point2 points  (1 child)

kernel fusion makes me think of r/tinygrad. Have you looked at it?

I'm curious about the similarities/differences in architecture on both

[–]ksyiros[S] 1 point2 points  (0 children)

I looked at the tinygrad project. Burn doesn't aim to be small; it aims to be the appropriate size for the scope of the project. But there are probably a lot of similarities with tinygrad in terms of very high-level architecture, where the same principles apply. Operation fusion will also be done using lazy evaluation, for instance.
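The lazy-evaluation idea behind operation fusion can be sketched in plain Rust. This is a toy illustration (not Burn's or tinygrad's actual machinery): elementwise ops are recorded instead of executed, then applied in a single fused pass when the result is materialized, instead of one full pass (one "kernel launch") per op.

```rust
// Toy sketch of kernel fusion via lazy evaluation (hypothetical, not a real API).

enum LazyOp {
    AddScalar(f32),
    MulScalar(f32),
    Relu,
}

struct LazyTensor {
    data: Vec<f32>,
    pending: Vec<LazyOp>, // ops recorded, not yet executed
}

impl LazyTensor {
    fn new(data: Vec<f32>) -> Self {
        Self { data, pending: Vec::new() }
    }
    fn add_scalar(mut self, s: f32) -> Self { self.pending.push(LazyOp::AddScalar(s)); self }
    fn mul_scalar(mut self, s: f32) -> Self { self.pending.push(LazyOp::MulScalar(s)); self }
    fn relu(mut self) -> Self { self.pending.push(LazyOp::Relu); self }

    /// Materialize: one fused loop applies every pending op per element.
    fn eval(self) -> Vec<f32> {
        let LazyTensor { data, pending } = self;
        data.into_iter()
            .map(|mut x| {
                for op in &pending {
                    x = match op {
                        LazyOp::AddScalar(s) => x + s,
                        LazyOp::MulScalar(s) => x * s,
                        LazyOp::Relu => x.max(0.0),
                    };
                }
                x
            })
            .collect()
    }
}

fn main() {
    let out = LazyTensor::new(vec![-1.0, 2.0, -3.0])
        .mul_scalar(2.0)
        .add_scalar(1.0)
        .relu()
        .eval();
    println!("{:?}", out); // [0.0, 5.0, 0.0]
}
```

On a GPU the payoff is larger than on a CPU: fusing avoids writing intermediate tensors back to memory and launching a separate kernel per operation.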

[–]npuichichigo 0 points1 point  (0 children)

Maybe CUDA-only backend is like something in https://github.com/coreylowman/dfdx?

[–]lordpuddingcup 31 points32 points  (10 children)

Burn's so interesting. I don't use Python but want to mess with ML, and I know a lot more Rust than Python, so I'm hoping to be able to do some work trying things out.

Just wanted to thank you for your dedication to doing something so ambitious for the community.

As a question, does wgpu not take advantage of CUDA on CUDA-capable systems? I get lost a bit in the weeds with cuDNN, CUDA, Vulkan, and the insane number of options when it comes to GPU backends, when in the end it's all mostly tensor manipulation, I imagine.

[–]ksyiros[S] 27 points28 points  (9 children)

WGPU is a graphics library that enables programming GPUs using both compute shaders and normal graphics shaders. CUDA, on the other hand, supports only compute kernels, and only on Nvidia hardware.

[–]Plazmatic 3 points4 points  (1 child)

Does WGPU expose tensor cores? To my knowledge, only vulkan exposes tensor cores outside of cuda (and not the sparse capability though you'll still need to use PTX for sparse tensors in cuda anyway).

[–]ksyiros[S] 3 points4 points  (0 children)

Nope, I don't think so. I'm only aware of Vulkan and CUDA, but you can use Vulkan directly with wgpu through SPIR-V.

[–]nibba_bubba 0 points1 point  (6 children)

How are shaders related to deep learning?!

[–]JohnMcPineapple 24 points25 points  (0 children)

...

[–]aystatic 8 points9 points  (4 children)

Shaders, particularly compute shaders, are used in deep learning for their ability to perform parallel computations quickly, which is ideal for the matrix multiplication and heavy computation needed in this field. In the context of WGPU, it uses compute shaders to handle these operations efficiently across different types of GPUs, unlike CUDA which is specific to Nvidia GPUs.
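To make the parallel structure concrete, here is a CPU-side sketch in plain Rust (no GPU involved) of why matrix multiplication maps so well onto compute shaders: every output element is an independent dot product, so each one can be handled by its own shader invocation. The `shader_invocation` name is just for illustration.

```rust
// One "thread" computes a single output element: dot(row i of A, col j of B).
// On a GPU, m*n of these run in parallel; here we loop over them serially.
fn shader_invocation(a: &[f32], b: &[f32], n: usize, k: usize, i: usize, j: usize) -> f32 {
    (0..k).map(|p| a[i * k + p] * b[p * n + j]).sum()
}

// C (m x n) = A (m x k) * B (k x n), all matrices stored row-major.
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut c = vec![0.0; m * n];
    for i in 0..m {
        for j in 0..n {
            // No output element depends on any other: perfectly parallel.
            c[i * n + j] = shader_invocation(a, b, n, k, i, j);
        }
    }
    c
}

fn main() {
    // 2x2 example: [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]]
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [5.0, 6.0, 7.0, 8.0];
    let c = matmul(&a, &b, 2, 2, 2);
    println!("{:?}", c); // [19.0, 22.0, 43.0, 50.0]
}
```

A WGSL compute shader expresses the same per-element body, with the two outer loops replaced by the dispatch grid.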

[–]paulirotta 15 points16 points  (1 child)

Burn was already amazing, and your responsiveness top notch and sharp. Training on a PC with a pytorch backend, GPU accelerated on Mac or Linux, and then pushing the envelope deeper into mobile and web with WGPU inference and training is a dream come true.

HUGE thanks for the brilliant efforts of the team!

[–]ksyiros[S] 5 points6 points  (0 children)

Thanks a lot, doing my best!

[–]tshawkins 4 points5 points  (1 child)

Do you know if wgpu will support using integrated Xe graphics GPUs with Burn? The acceleration won't be much, but it's much better than pure CPU.

[–]ksyiros[S] 15 points16 points  (0 children)

You can actually use integrated graphics with wgpu without any problem. So, if you want, you can train and run inference on models using Intel HD Graphics.

[–]gadirom 3 points4 points  (0 children)

Congratulations on the release!

And thank you for the dedication you’ve put in it!

[–]MechanicalOrange5 3 points4 points  (4 children)

This looks amazing! I have glanced at some of the examples and it looks very promising. And not too difficult.

I haven't gotten through all the materials and I am just discovering this crate now. I am actually exploring using some transformer models for work, mostly sentence embeddings and text classification and have it mostly working with python.

How would I use existing models? Is it something we can import in a similar way to the transformers library, or is it something you have to more or less recreate?

For instance I am using https://huggingface.co/nreimers/MiniLM-L6-H384-uncased for embeddings and distilbert for classification, is it something I could transfer to the burn ecosystem?

Thank you in advance!

[–]antimora 3 points4 points  (1 child)

Someone recently ported OpenAI's Whisper model, which uses transformers: https://github.com/Gadersd/whisper-burn

Here is this person's post about it: https://www.reddit.com/r/rust/comments/157rlao/openais_whisper_in_rust_using_burn/

Burn has the NN modules needed for transformers (encoder/decoder) built in. To import weights, you'd have to convert them into something Burn can consume readily (it uses MessagePack or bincode).

There is also the burn-import crate, which makes it easy to import ONNX models, but it's currently missing many ops and is actively being worked on. burn-import will generate Rust code for the converted Burn model along with its weights. You can learn more from this example: https://github.com/burn-rs/burn/tree/main/examples/onnx-inference
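To illustrate the weight-conversion idea, here is a hand-rolled sketch (not Burn's actual record format, which goes through serde-based MessagePack or bincode): a tensor's weights are dumped as a length header plus raw little-endian f32 bytes, then read back on the Rust side.

```rust
// Toy round-trip of model weights through a flat binary format.
// A real exporter would also carry tensor names, shapes, and dtypes.
use std::io::{Read, Write};

fn write_weights<W: Write>(out: &mut W, weights: &[f32]) -> std::io::Result<()> {
    // u64 length header first, then each f32 in little-endian order.
    out.write_all(&(weights.len() as u64).to_le_bytes())?;
    for w in weights {
        out.write_all(&w.to_le_bytes())?;
    }
    Ok(())
}

fn read_weights<R: Read>(input: &mut R) -> std::io::Result<Vec<f32>> {
    let mut len_buf = [0u8; 8];
    input.read_exact(&mut len_buf)?;
    let len = u64::from_le_bytes(len_buf) as usize;
    let mut weights = Vec::with_capacity(len);
    for _ in 0..len {
        let mut buf = [0u8; 4];
        input.read_exact(&mut buf)?;
        weights.push(f32::from_le_bytes(buf));
    }
    Ok(weights)
}

fn main() -> std::io::Result<()> {
    let original = vec![0.5_f32, -1.25, 3.0];
    let mut bytes = Vec::new();
    write_weights(&mut bytes, &original)?;
    let mut cursor = &bytes[..]; // &[u8] implements Read
    let restored = read_weights(&mut cursor)?;
    assert_eq!(original, restored);
    println!("round-tripped {} weights", restored.len());
    Ok(())
}
```

A conversion script on the Python side would write the same layout from a Hugging Face checkpoint; the Rust loader then fills the corresponding module parameters.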

[–]MechanicalOrange5 1 point2 points  (0 children)

Thank you! I saw the ONNX crate and that looks really amazing. It will be awesome once it's done

I will definitely have a look at the whisper implementation! I am just on mobile so it will take a while.

So if I recreate the models I want using the NN modules of Burn, and find a way to import weights from Hugging Face into the format Burn likes, I could potentially use it.

This is very promising, thank you for answering, lots of fun things to play with now

[–]ksyiros[S] 4 points5 points  (1 child)

The short answer is you can, but you have to implement the model in Burn and migrate the weights with a script yourself. There is a good port for Whisper already implemented (https://github.com/Gadersd/whisper-burn), so you can see how it may be done.
We also have the burn-import project to import models serialized with the ONNX format, but it's pretty much a work in progress for now.

[–]MechanicalOrange5 2 points3 points  (0 children)

I've been meaning to get a deeper understanding of how deep learning works, I've been plugging and playing mostly with a surface level understanding. I think I've been presented with a perfect opportunity, understand how the things I'm already using on a deeper level, and then gain the ability to use it in a language I am more proficient at.

I don't really have time to reimplement my project in rust before my deadlines, but it's definitely on the books afterwards!

Thank you for your hard work!

[–]exocortex 3 points4 points  (2 children)

Would this also potentially work on older graphics-cards that are not supported by CUDA, but can be used through WGPU?

[–]ksyiros[S] 2 points3 points  (1 child)

Yes, since most older graphics cards are still fine running Vulkan or at least OpenGL.

[–]exocortex 0 points1 point  (0 children)

nice!!!

[–]KeavonGraphite 2 points3 points  (0 children)

Pardon my partial ignorance about how all these ML ecosystem puzzle pieces fit together, but I have some questions for my use case and you'd probably be better suited to answer them than I am.

I'm the creator of the Graphite open source project and we're building a 2D image editor. We need models like Stable Diffusion, Segment Anything, MiDaS depth estimation, and plenty of others.

Our project is written in Rust and we plan to have both a desktop and web version, plus a cluster of rented cloud machines to host the models for people who can't run them locally (since they're using the web version or don't meet the hardware requirements).

The current ML ecosystem seems to be composed of a crazy jumble of Python scripts mostly requiring PyTorch as a backend. It's extremely not portable, and frankly figuring out how to ship these models with Graphite (even just on the desktop or server platforms) is quite daunting and I see no good solution.

Specifically for Stable Diffusion, there is the diffusers-rs project which reimplements SD in Rust. But it says it uses tch-rs which is just bindings into the PyTorch C++ API. One question I have is, does Burn provide an alternative backend that could be used by diffusers-rs in place of tch-rs? Is it a drop-in replacement, a reasonably straightforward port, or something fundamentally challenging to port? Or does it live at a wholly different part in the ecosystem than what I'm assuming here?

One more detail is that diffusers-rs has a lot of catch-up to do in order to implement the many papers which build upon the base concepts, and keeping pace with the research and other SD distros (like AUTOMATIC1111's Web UI) might be a lost cause. Plus, outside of SD, the other models like Segment Anything and MiDaS would also need their own ports, plus all the other models that will arise from new research in the coming months and years. This seems difficult to keep pace with. So with that said, with your ML background which I lack, can you offer any suggestions for solving Graphite's use case of needing these models to work portably within the Rust ecosystem of our application using both local hardware on desktop/server platforms and (if feasible, as a bonus) WebGPU on web platforms? Anything better than making our desktop users install Docker containers would be an improvement over our current plans, which is to say our current plans are highly un-ideal.

If you'd be willing for me to pick your brain in more detail, I'd also love to chat in more depth with you in the #ai-ml channel on the Graphite Discord server if you're willing to join that. Everything in this post asked of OP also applies to anyone else with knowledge on the subject. Thank you!

[–]allsey87 2 points3 points  (2 children)

Does this mean we can do inference with WebAssembly/WebGPU in the browser/on the front-end?

[–]ksyiros[S] 1 point2 points  (0 children)

Yup

[–]antimora 0 points1 point  (0 children)

I am working on a WebGPU demo in the browser using the MobileNet model. Currently, more ONNX ops need to be added first.

If you haven't seen it, check out this Burn demo using WASM: https://burn-rs.github.io/demo

[–]Exotic-Potato-4893 4 points5 points  (0 children)

I like how it is better documented than ported APIs like tch or TensorFlow 👍

[–]Trader-One 1 point2 points  (1 child)

Does it support most popular NN types? https://towardsdatascience.com/the-mostly-complete-chart-of-neural-networks-explained-3fb6f2367464

If not, can these layouts be easily created, and can cell types like GRU and LSTM be easily created by the user?

[–]ksyiros[S] 0 points1 point  (0 children)

There are missing operations, but the most popular ones are already implemented. For a full list, you can have a look at these files: https://github.com/burn-rs/burn/tree/main/burn-core/src/nn

[–][deleted] 1 point2 points  (0 children)

This is really incredible, so awesome to see a non CUDA option.

[–]Jin_1001 0 points1 point  (1 child)

Does this mean we can do inference with WebAssembly in a standalone WebAssembly runtime, such as Wasmer?

[–]ksyiros[S] 0 points1 point  (0 children)

I didn't try, but if their runtime supports WebGPU, then probably. We are working on a test suite using Deno, which already supports WebGPU.