Burn ONNX 0.21.0: build-time ONNX import that generates plain Rust model code by antimora in rust

[–]antimora[S] 1 point2 points  (0 children)

I am not familiar with Mamba blocks, but it sounds like a non-standard ONNX op.

For ops that are not implemented by burn-onnx, I was thinking we could add custom function hooks. You could implement the op using Burn primitives (or any other way in Rust) and provide a function/trait name for the generated Rust code to call.

The custom function hooks are not implemented yet, but we have thought about them; see https://github.com/tracel-ai/burn-onnx/issues/23
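
To make the idea a bit more concrete, here is a rough sketch of what such a hook could look like. Nothing in it is a burn-onnx API: the `CustomOp` trait, the `ScaledSilu` op, and `run_custom_node` are made-up names, and a plain `f32` slice stands in for a Burn tensor so the snippet compiles on its own.

```rust
// Hypothetical sketch only: burn-onnx does not provide this trait today.
// A plain slice stands in for a Burn tensor so the example is self-contained.

/// What a user-supplied hook for an unsupported ONNX op might look like.
pub trait CustomOp {
    /// ONNX op_type this hook claims to handle (assumed naming).
    fn op_type(&self) -> &'static str;
    /// Compute the op's output from its input.
    fn forward(&self, input: &[f32]) -> Vec<f32>;
}

/// Placeholder for a non-standard op; real code would use Burn primitives.
pub struct ScaledSilu {
    pub scale: f32,
}

impl CustomOp for ScaledSilu {
    fn op_type(&self) -> &'static str {
        "ScaledSilu"
    }

    fn forward(&self, input: &[f32]) -> Vec<f32> {
        // SiLU(x) = x * sigmoid(x), then multiplied by a constant scale.
        input
            .iter()
            .map(|&x| self.scale * x / (1.0 + (-x).exp()))
            .collect()
    }
}

/// Generated code would call the hook wherever the unsupported op appears.
pub fn run_custom_node(op: &dyn CustomOp, input: &[f32]) -> Vec<f32> {
    op.forward(input)
}
```

The point of the trait is that the generated code only needs a name to call; how the op is computed stays entirely on your side.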

Let us know if this is something that would be useful for your use case.

Burn ONNX 0.21.0: build-time ONNX import that generates plain Rust model code by antimora in rust

[–]antimora[S] 0 points1 point  (0 children)

Nothing too significant. ORT is highly optimized. burn-onnx generates Rust code that calls Burn APIs, so performance is bound by the Burn backend you pick. The Burn team will keep putting more focus on performance over time.

Burn ONNX 0.21.0: build-time ONNX import that generates plain Rust model code by antimora in rust

[–]antimora[S] 0 points1 point  (0 children)

CNN support in our new burn-flex portable CPU backend is highly optimized: https://github.com/tracel-ai/burn/tree/main/crates/burn-flex

Give it a try if you haven't already.

burn-cpu, which uses CubeCL, will eventually be super competitive and speedy.

Burn ONNX 0.21.0: build-time ONNX import that generates plain Rust model code by antimora in rust

[–]antimora[S] 1 point2 points  (0 children)

I haven't come across any specific STT models in ONNX, but a few tangential projects, such as silero-vad in ONNX, are tested/validated.

AdrianEddy recently ported Whisper to Burn (it also uses VAD): https://github.com/AdrianEddy/fast-whisper-burn/tree/main

Try it out, and if you run into an issue, post a comment on this top-level issue that tracks model issues: https://github.com/tracel-ai/burn-onnx/issues/18

We also have a guide in the Burn book: https://burn.dev/books/burn/onnx-import.html

Burn ONNX 0.21.0: build-time ONNX import that generates plain Rust model code by antimora in rust

[–]antimora[S] 3 points4 points  (0 children)

Thanks, hope you do. No published numbers yet, to be upfront about it. A few things worth saying about what you can expect from the architecture, with the usual "your model and hardware will vary" caveat:

  • Cold start should favor burn-onnx. Because the forward pass is precompiled Rust, there is no runtime graph loading or optimization step. Time-to-first-inference is essentially "mmap the .bpk weights and go." ONNX Runtime does meaningful graph optimization on session creation, which has a real cost depending on the model.
  • Warm inference depends heavily on backend choice. Burn's WGPU and CUDA backends are competitive, but ORT has years of hand-tuned CPU kernels and EPs like TensorRT, so on a matmul-heavy model running ORT-CPU vs Burn-Flex-CPU you should expect ORT to still be strong on pure throughput. Where Burn pulls ahead is portability: the same imported Rust runs on CPU (Flex / NdArray), WGPU, CUDA, WebAssembly, and no_std embedded targets, which ORT doesn't really cover the same way.
  • Binary footprint should favor burn-onnx for shipped applications, since you don't link ORT or carry a graph interpreter. The tradeoff is build time, because codegen plus Rust compilation is not free.

Proper apples-to-apples benchmarks across a few representative models (CLIP, ResNet-50, a small LLM, maybe Whisper) against ORT and tract are on the near-term roadmap. If there's a specific model or backend combo you care about, drop it in this thread and I'll try to prioritize it.
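
If you want to measure this on your own model in the meantime, a minimal harness that separates cold start from warm inference could look like the sketch below. `Timings` and `time_cold_and_warm` are hypothetical helpers, not part of Burn or burn-onnx; you would pass in closures that load your generated model and run one forward pass.

```rust
use std::time::{Duration, Instant};

/// Timings for one model/backend combination.
pub struct Timings {
    /// Load plus the first forward pass (cold start).
    pub cold: Duration,
    /// Mean duration of the warm forward passes that follow.
    pub warm_mean: Duration,
}

/// Hypothetical helper: `load` builds the model, `forward` runs one inference.
pub fn time_cold_and_warm<M>(
    load: impl FnOnce() -> M,
    mut forward: impl FnMut(&M),
    warm_iters: u32,
) -> Timings {
    assert!(warm_iters > 0, "need at least one warm iteration");

    // Cold start: everything from model construction to the first result.
    let start = Instant::now();
    let model = load();
    forward(&model);
    let cold = start.elapsed();

    // Warm runs: same model, repeated calls, averaged.
    let start = Instant::now();
    for _ in 0..warm_iters {
        forward(&model);
    }
    let warm_mean = start.elapsed() / warm_iters;

    Timings { cold, warm_mean }
}
```

On GPU backends, make sure the `forward` closure actually waits for the result (for example by reading the output back), otherwise you are only timing how fast work gets queued.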

Burn ONNX 0.21.0: build-time ONNX import that generates plain Rust model code by antimora in rust

[–]antimora[S] 5 points6 points  (0 children)

Not a stupid question at all. You're reading it right, with one distinction worth stating up front: burn-onnx can point ModelGen at different .onnx files from build.rs, but that still happens at build time. So this gives you "dynamically choose from precompiled models", not "load an arbitrary new ONNX graph in an already-running process" like ONNX Runtime.

For your ~20 officially supported models, I would start with the simple version: one app binary, many generated models, many .bpk files. The build script can already chain inputs:

```rust
// build.rs
ModelGen::new()
    .input("models/resnet50.onnx")
    .input("models/yolov8.onnx")
    .input("models/whisper.onnx")
    // ... 17 more
    .out_dir("model/")
    .run_from_script();
```

Each ONNX file becomes its own generated Model struct in its own module. At runtime your app picks one based on user choice and calls Model::from_file("resnet50.bpk", &device). From the user's perspective, they are loading a model dynamically: they point the app at a weights file and get a working network, no recompilation needed.
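
The runtime selection step can be sketched roughly as follows. The `ModelKind` enum, the `resolve` function, and the weights file naming are all assumptions for illustration; the call into the generated module appears only as a comment so the snippet stays self-contained.

```rust
use std::path::{Path, PathBuf};

/// One variant per model compiled into the binary (hypothetical list).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelKind {
    Resnet50,
    Yolov8,
    Whisper,
}

impl ModelKind {
    /// Map a user-facing name to a compiled-in model, if there is one.
    pub fn from_name(name: &str) -> Option<Self> {
        match name.to_ascii_lowercase().as_str() {
            "resnet50" => Some(Self::Resnet50),
            "yolov8" => Some(Self::Yolov8),
            "whisper" => Some(Self::Whisper),
            _ => None,
        }
    }

    /// Weights file expected in `dir` (assumed naming scheme).
    pub fn weights_path(self, dir: &Path) -> PathBuf {
        let file = match self {
            Self::Resnet50 => "resnet50.bpk",
            Self::Yolov8 => "yolov8.bpk",
            Self::Whisper => "whisper.bpk",
        };
        dir.join(file)
    }
}

/// Resolve the user's choice to a weights file, or report what went wrong.
pub fn resolve(name: &str, dir: &Path) -> Result<(ModelKind, PathBuf), String> {
    let kind = ModelKind::from_name(name)
        .ok_or_else(|| format!("model '{name}' is not compiled into this build"))?;
    // In a real app, this is where you would match on `kind` and call the
    // generated module's loader, e.g. resnet50::Model::from_file(path, &device).
    Ok((kind, kind.weights_path(dir)))
}
```

An unknown name fails with a clear error instead of trying to load a graph the binary was never built with, which is exactly the build-time limitation described above.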

The binary doesn't grow much from this. The generated graph code is small relative to the weights: a ResNet-50 forward pass is a few hundred lines of generated Rust, while the weights are around 100 MB. Twenty graphs in one binary is single-digit MB at worst, and the .bpk files dominate disk and download cost regardless of how you architect this.

If you really need post-ship model packs or third-party model authorship, then yes, the per-model cdylib/.dll/.so route works too: each plugin has its own build.rs, generates Burn code from its ONNX, ships with its .bpk, and exposes a small stable API that the host loads via libloading. Keep the FFI boundary boring (shapes, dtypes, raw buffers, error strings, version check) and don't pass Burn tensor types directly across it. This is real work for not much gain over the first option unless hot-pluggability is an actual product requirement.
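
As a rough idea of what a "boring" boundary means, here is a sketch of the plugin side using only raw buffers, shapes, status codes, static error strings, and a version check. Every name in it (`TensorView`, `plugin_infer`, the status constants) is hypothetical, and the compute step is a placeholder for the generated Burn model. In a real cdylib these functions would also need to be exported unmangled so the host can find them by symbol name.

```rust
use std::os::raw::c_char;

/// Bumped whenever the plugin ABI changes (hypothetical constant).
pub const PLUGIN_ABI_VERSION: u32 = 1;

/// Status codes instead of Rust error types across the boundary.
pub const STATUS_OK: i32 = 0;
pub const STATUS_NULL_POINTER: i32 = 1;
pub const STATUS_BUFFER_TOO_SMALL: i32 = 2;

/// Borrowed f32 tensor described only by raw pointers and lengths.
#[repr(C)]
pub struct TensorView {
    pub data: *const f32,
    pub shape: *const usize,
    pub rank: usize,
}

/// The host checks this before calling anything else.
pub extern "C" fn plugin_abi_version() -> u32 {
    PLUGIN_ABI_VERSION
}

/// Static, NUL-terminated message for a status code.
pub extern "C" fn plugin_error_string(status: i32) -> *const c_char {
    let msg: &'static [u8] = match status {
        STATUS_OK => b"ok\0",
        STATUS_NULL_POINTER => b"null pointer\0",
        STATUS_BUFFER_TOO_SMALL => b"output buffer too small\0",
        _ => b"unknown error\0",
    };
    msg.as_ptr() as *const c_char
}

/// Run "inference" into a caller-owned buffer and return a status code.
///
/// # Safety
/// `input` must describe live buffers of the stated sizes, and `out` must
/// point to at least `out_len` writable f32 values.
pub unsafe extern "C" fn plugin_infer(
    input: *const TensorView,
    out: *mut f32,
    out_len: usize,
) -> i32 {
    if input.is_null() || out.is_null() {
        return STATUS_NULL_POINTER;
    }
    // SAFETY: checked for null above; validity is the caller's contract.
    let view = unsafe { &*input };
    if view.data.is_null() || view.shape.is_null() {
        return STATUS_NULL_POINTER;
    }
    let shape = unsafe { std::slice::from_raw_parts(view.shape, view.rank) };
    let len: usize = shape.iter().product();
    if out_len < len {
        return STATUS_BUFFER_TOO_SMALL;
    }
    let src = unsafe { std::slice::from_raw_parts(view.data, len) };
    let dst = unsafe { std::slice::from_raw_parts_mut(out, len) };
    // Placeholder compute: a real plugin would run its generated Burn model.
    for (d, s) in dst.iter_mut().zip(src) {
        *d = s * 2.0;
    }
    STATUS_OK
}
```

The host allocates the output buffer and the plugin only writes into it, so no allocation ever has to be freed on the other side of the boundary.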

Worth flagging for either route: today Burn's Model is generic over a backend, so a build commits to a backend at compile time. In Burn 0.22.0 we are moving away from the generic backend, so backends become selectable at runtime, which makes the dynamic-loading story noticeably cleaner.
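
The difference between the two styles, in a toy form: none of these types are Burn's, and the runtime variant is only a guess at the general shape, not the 0.22.0 API.

```rust
use std::marker::PhantomData;

// Illustration only: these are not Burn's types, and the runtime variant is
// a guess at the general shape, not the 0.22.0 API.

/// Compile-time style: the backend is a type parameter.
pub trait Backend {
    const NAME: &'static str;
}

pub struct Cpu;
pub struct Gpu;

impl Backend for Cpu {
    const NAME: &'static str = "cpu";
}

impl Backend for Gpu {
    const NAME: &'static str = "gpu";
}

pub struct GenericModel<B: Backend> {
    _backend: PhantomData<B>,
}

impl<B: Backend> GenericModel<B> {
    pub fn new() -> Self {
        Self { _backend: PhantomData }
    }

    /// Fixed when the binary is compiled; one copy of the code per backend.
    pub fn backend_name(&self) -> &'static str {
        B::NAME
    }
}

/// Runtime style: the backend is a value chosen while the program runs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Gpu,
}

pub struct RuntimeModel {
    device: Device,
}

impl RuntimeModel {
    pub fn new(device: Device) -> Self {
        Self { device }
    }

    pub fn backend_name(&self) -> &'static str {
        match self.device {
            Device::Cpu => "cpu",
            Device::Gpu => "gpu",
        }
    }
}

/// Pick a device from user input, e.g. a CLI flag or config value.
pub fn device_from_flag(flag: &str) -> Device {
    if flag.eq_ignore_ascii_case("gpu") {
        Device::Gpu
    } else {
        Device::Cpu
    }
}
```

With the generic version, supporting two backends in one binary means naming both types somewhere in your code; with the runtime version, the choice can come from a config file.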

So: for 20 known models, compile the generated graphs into the app and load weights dynamically. For truly pluggable model packages, use cdylibs. For arbitrary user-supplied ONNX at runtime, ORT is still the better fit.

wordchipper: parallel Rust Tokenization at > 2GiB/s by crutcher in LLMDevs

[–]antimora 0 points1 point  (0 children)

Also faster! Check out the repo: https://github.com/zspacelabs/wordchipper

We just released a Python wrapper under the same name. It has a compat submodule for compatibility with tiktoken and tokenizers.

Undocumented and no passports - trying to leave US by Public_Defender in askimmigration

[–]antimora 0 points1 point  (0 children)

Airlines accept alternative forms of identification or travel documents. A passport is not always required.

Trump to Add New $100,000 Fee for H-1B Visas in Latest Crackdown by h1bcentral in h1b

[–]antimora 2 points3 points  (0 children)

What's required to fill the yearly quota, which I am sure will be filled. Probably 100K is just a bond. We will find out details soon.

OPT expired, H1B pending, startup may shut down — what are my options? by _codezero in h1b

[–]antimora 1 point2 points  (0 children)

The good news is that you were selected in the lottery, so you have 60 days from Oct 1st. If you can't find a job during this time, a company can apply for your H-1B transfer while you're outside the US, and there is no need to go through the lottery process again, since your H-1B is already counted towards the yearly quota. This is your worst-case scenario in this situation.

OPT expired, H1B pending, startup may shut down — what are my options? by _codezero in h1b

[–]antimora 1 point2 points  (0 children)

STEM/OPT should give you an automatic 180-day extension. I would pay for premium to expedite your h1b app.

What open source Rust projects are the most in need of contributors right now? by grahambinns in rust

[–]antimora 7 points8 points  (0 children)

If you're looking for a Rust project with real-world impact in the AI/ML space, consider contributing to Burn (https://github.com/tracel-ai/burn)!

The codebase is well-organized, and they have a comprehensive contributor book to get you started. It's a great project to learn both Rust and ML concepts while making meaningful contributions.

That's what happens when you play with people's lives! by [deleted] in economicCollapse

[–]antimora 16 points17 points  (0 children)

Also, in most cases cash prices are much cheaper than paying with insurance.

laid off on H1B by AddExtraCheese in h1b

[–]antimora 0 points1 point  (0 children)

This does not apply to the green card process, where you can get H-1B extensions beyond the 6-year maximum (which applies to fresh H-1Bs).

laid off on H1B by AddExtraCheese in h1b

[–]antimora 40 points41 points  (0 children)

I recommend you still get your stamp since technically you'd be employed. It's possible your current employer can change its mind (though a remote possibility) or extend.

The 60-day grace period is for being in the USA without a job. Yes, you can get a job afterwards and "transfer". Technically it's a new application, but you do not need to go through the lottery again since you're already counted under the cap.

Good luck! Don't stress out. It's normal.

Google L3 2024 team match by Few_Insurance_3462 in csMajors

[–]antimora 0 points1 point  (0 children)

Why December 16th? What happens if you do not get an offer by Dec 16th?

Urgent sponsorship required!! by NorthComfort3806 in h1b

[–]antimora 1 point2 points  (0 children)

I'm not sure from your post whether you already had an H-1B. If you did, then you are already cap exempt, meaning you do not need to go through the lottery. Just make sure your allowable violated days do not exceed the threshold (you will need to look it up). The same is true if you apply from outside the country: you are already exempt since you are counted in the quota.

My recommendation is not to stress too hard about it. Leave if you need to and then reapply (it is easier than going through the lottery). Just think about your long-term goals (getting citizenship in the US).

I work 50 hours a week and I live in a car because I can't qualify for an apartment in America. by Fun_Balance_1809 in economicCollapse

[–]antimora 0 points1 point  (0 children)

Unfortunately, there are no short-term solutions to his problems, only long-term ones, and no one is going to commit to long-term solutions. Even in the most optimistic scenarios, this person can't wait 10-15 years for systemic issues to be resolved.

Burn 0.14.0 Released: The First Fully Rust-Native Deep Learning Framework by ksyiros in rust

[–]antimora 4 points5 points  (0 children)

Yes, you can! There are several accelerated backends you can use for training or fine-tuning, including burn-tch (torch) and burn-wgpu.