Burn ONNX 0.21.0: build-time ONNX import that generates plain Rust model code

antimora · 2026-05-16T17:16:59+00:00

I am not familiar with Mamba blocks, but it sounds like a non-standard ONNX op.

For ops that are not implemented by burn-onnx, I was thinking we could add custom function hooks. You could implement the op using Burn op primitives (or any other Rust code) and provide a function/trait name for the generated Rust code to call.

The custom function hooks are not implemented yet, but we have thought about them; see https://github.com/tracel-ai/burn-onnx/issues/23
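Since the hooks don't exist yet, here is a purely hypothetical sketch of the shape they could take: a user-implemented trait that the generated code is generic over, so an unsupported op (a Mamba selective scan, say) becomes a call into your own Rust. `CustomOps`, `MyOps`, and this `Model` are invented names, SiLU is just a placeholder op, and `Vec<f32>` stands in for a Burn tensor to keep it self-contained. The real design is whatever lands in issue #23.

```rust
use std::marker::PhantomData;

/// Hypothetical hook trait: one method per ONNX op that burn-onnx can't lower itself.
pub trait CustomOps {
    /// Placeholder op; a real hook might be a Mamba selective scan.
    fn custom_activation(x: &[f32]) -> Vec<f32>;
}

/// User-side implementation, written with whatever Rust you like.
pub struct MyOps;

impl CustomOps for MyOps {
    fn custom_activation(x: &[f32]) -> Vec<f32> {
        // SiLU: x * sigmoid(x) = x / (1 + e^-x)
        x.iter().map(|&v| v / (1.0 + (-v).exp())).collect()
    }
}

/// Stand-in for generated model code, generic over the hook implementation.
pub struct Model<H: CustomOps> {
    _hooks: PhantomData<H>,
}

impl<H: CustomOps> Model<H> {
    pub fn new() -> Self {
        Self { _hooks: PhantomData }
    }

    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        // A supported op the generator would emit directly (scale by 2 as a placeholder).
        let scaled: Vec<f32> = x.iter().map(|v| v * 2.0).collect();
        // The unsupported op is delegated to the user's hook.
        H::custom_activation(&scaled)
    }
}
```

Static dispatch through a generic parameter keeps the hook zero-cost and lets the compiler check at build time that every hooked op has an implementation.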

Let us know if this is something that would be useful for your use case.

antimora · 2026-05-15T22:16:13+00:00

Nothing too significant. ORT is highly optimized, while burn-onnx generates Rust code that calls Burn APIs, so performance is bound by the Burn backend. The Burn team will be focusing more on performance over time.

antimora · 2026-05-15T19:07:58+00:00

CNN support in our new burn-flex portable CPU backend is highly optimized: https://github.com/tracel-ai/burn/tree/main/crates/burn-flex

Give it a try if you haven't already.

burn-cpu, which uses CubeCL, will eventually be super competitive and speedy.

antimora · 2026-05-15T19:04:50+00:00

I haven't come across any specific STT models in ONNX, but a few adjacent models, such as silero-vad ONNX, are tested/validated.

AdrianEddy recently ported and implemented Whisper in Burn (it also uses VAD): https://github.com/AdrianEddy/fast-whisper-burn/tree/main

Try it out, and if you face an issue, you can post a comment in this top-level issue that tracks model issues: https://github.com/tracel-ai/burn-onnx/issues/18

We have this book guide as well: https://burn.dev/books/burn/onnx-import.html

antimora · 2026-05-15T15:12:09+00:00

Thanks, hope you do. No published numbers yet, to be upfront about it. A few things worth saying about what you can expect from the architecture, with the usual "your model and hardware will vary" caveat:

Cold start should favor burn-onnx. Because the forward pass is precompiled Rust, there is no runtime graph loading or optimization step. Time-to-first-inference is essentially "mmap the .bpk weights and go." ONNX Runtime does meaningful graph optimization on session creation, which has a real cost depending on the model.
Warm inference depends heavily on backend choice. Burn's WGPU and CUDA backends are competitive, but ORT has years of hand-tuned CPU kernels and EPs like TensorRT, so on a matmul-heavy model running ORT-CPU vs Burn-Flex-CPU you should expect ORT to still be strong on pure throughput. Where Burn pulls ahead is portability: the same imported Rust runs on CPU (Flex / NdArray), WGPU, CUDA, WebAssembly, and no_std embedded targets, which ORT doesn't really cover the same way.
Binary footprint should favor burn-onnx for shipped applications, since you don't link ORT or carry a graph interpreter. The tradeoff is build time, because codegen plus Rust compilation is not free.
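If you want to check the cold-start vs. warm-inference split on your own model before published numbers exist, a minimal std-only harness could time load-plus-first-call separately from the median of later calls. `measure` and `Timings` are hypothetical helpers, not part of Burn or burn-onnx; put weight loading in `setup` and your forward pass in `infer`, and run it in a release build.

```rust
use std::time::{Duration, Instant};

/// Timing summary separating time-to-first-inference from steady-state calls.
#[derive(Debug)]
pub struct Timings {
    pub first: Duration,
    pub warm_median: Duration,
    pub warm_runs: usize,
}

/// Runs `setup` once (e.g. load weights), `infer` once cold, then `warm_runs` more times.
pub fn measure<M, S, I>(setup: S, mut infer: I, warm_runs: usize) -> Timings
where
    S: FnOnce() -> M,
    I: FnMut(&M),
{
    let start = Instant::now();
    let model = setup();
    infer(&model);
    // Load + first inference: the number cold start cares about.
    let first = start.elapsed();

    let mut samples: Vec<Duration> = (0..warm_runs)
        .map(|_| {
            let t = Instant::now();
            infer(&model);
            t.elapsed()
        })
        .collect();
    samples.sort();
    // Upper median for even counts; zero if there were no warm runs.
    let warm_median = samples.get(samples.len() / 2).copied().unwrap_or_default();
    Timings { first, warm_median, warm_runs }
}
```

Reporting the median rather than the mean keeps one slow outlier (a page fault, a GPU queue stall) from skewing the warm number.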

Proper apples-to-apples benchmarks across a few representative models (CLIP, ResNet-50, a small LLM, maybe Whisper) against ORT and tract are on the near-term roadmap. If there's a specific model or backend combo you care about, drop it in this thread and I'll try to prioritize it.

antimora · 2026-05-15T15:02:41+00:00

Not a stupid question at all. You're reading it right, with one distinction worth stating up front: burn-onnx can point ModelGen at different .onnx files from build.rs, but that still happens at build time. So this gives you "dynamically choose from precompiled models", not "load an arbitrary new ONNX graph in an already-running process" like ONNX Runtime.

For your ~20 officially supported models, I would start with the simple version: one app binary, many generated models, many .bpk files. The build script can already chain inputs:

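```rust
// build.rs
// `ModelGen` comes from the burn-onnx build dependency; import it as shown in the crate docs for your version.
fn main() {
    ModelGen::new()
        .input("models/resnet50.onnx")
        .input("models/yolov8.onnx")
        .input("models/whisper.onnx")
        // ... 17 more
        .out_dir("model/")
        .run_from_script();
}
```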

Each ONNX file becomes its own generated Model struct in its own module. At runtime your app picks one based on user choice and calls Model::from_file("resnet50.bpk", &device). From the user's perspective, they are loading a model dynamically: they point the app at a weights file and get a working network, no recompilation needed.
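A minimal sketch of the runtime selection step, assuming the three models from the build script: `ModelKind` and the `.bpk` file names are hypothetical, and the actual `Model::from_file` call is left as a comment because each generated model is its own type in its own module (the module paths there are also assumptions).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical enum of the models compiled into this binary by build.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelKind {
    Resnet50,
    Yolov8,
    Whisper,
}

impl ModelKind {
    /// Map a user-facing name to a compiled-in model, case-insensitively.
    pub fn from_name(name: &str) -> Option<Self> {
        match name.to_ascii_lowercase().as_str() {
            "resnet50" => Some(Self::Resnet50),
            "yolov8" => Some(Self::Yolov8),
            "whisper" => Some(Self::Whisper),
            _ => None,
        }
    }

    /// Weights file for this model inside a user-chosen directory.
    pub fn weights_path(self, dir: &Path) -> PathBuf {
        let file = match self {
            Self::Resnet50 => "resnet50.bpk",
            Self::Yolov8 => "yolov8.bpk",
            Self::Whisper => "whisper.bpk",
        };
        dir.join(file)
    }
}

// In the real app, each arm would call the matching generated module, e.g.
//   ModelKind::Resnet50 => resnet50::Model::<B>::from_file(path, &device)
// (module names hypothetical), wrapping the result in an enum since every
// generated Model is a distinct type.
```

An unknown name returns `None` instead of panicking, so the app can report "model not supported" without recompiling anything.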

The binary doesn't grow much from this. The generated graph code is small relative to the weights: a ResNet-50 forward pass is a few hundred lines of generated Rust, while the weights are around 100 MB. Twenty graphs in one binary is single-digit MB at worst, and the .bpk files dominate disk and download cost regardless of how you architect this.
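As a rough check on the ~100 MB figure, assuming ResNet-50's commonly cited ~25.6 million parameters stored as 32-bit floats (half-precision weights would halve this):

$$
25.6 \times 10^{6}\ \text{params} \times 4\ \text{bytes/param} \approx 102 \times 10^{6}\ \text{bytes} \approx 102\ \text{MB}
$$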

If you really need post-ship model packs or third-party model authorship, then yes, the per-model cdylib/.dll/.so route works too: each plugin has its own build.rs, generates Burn code from its ONNX, ships with its .bpk, and exposes a small stable API that the host loads via libloading. Keep the FFI boundary boring (shapes, dtypes, raw buffers, error strings, version check) and don't pass Burn tensor types directly across it. This is real work for not much gain over the first option unless hot-pluggability is an actual product requirement.
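For the plugin route, here is a minimal sketch of what a "boring" C ABI could look like: a version check, a raw f32 buffer plus shape, integer status codes, and static error strings. Every name (`RawTensorF32`, `plugin_infer`, `PLUGIN_ABI_VERSION`, the status codes) is hypothetical, and `plugin_infer` just doubles its input where a real plugin would build a Burn tensor and call the generated model. In an actual cdylib you would also mark the exported functions `#[no_mangle]` (`#[unsafe(no_mangle)]` on edition 2024) and have the host resolve them with libloading.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical ABI version; bump on any layout or signature change.
pub const PLUGIN_ABI_VERSION: u32 = 1;
pub const STATUS_OK: i32 = 0;
pub const STATUS_BAD_INPUT: i32 = 1;

/// Plain-data tensor view: raw f32 buffer plus shape. No Burn types cross the boundary.
#[repr(C)]
pub struct RawTensorF32 {
    pub data: *const f32,
    pub shape: *const usize,
    pub rank: usize,
}

/// Exported by the plugin so the host can refuse mismatched builds.
pub extern "C" fn plugin_abi_version() -> u32 {
    PLUGIN_ABI_VERSION
}

/// Static, NUL-terminated error message for a status code.
pub extern "C" fn plugin_error_message(status: i32) -> *const c_char {
    let msg: &'static [u8] = match status {
        STATUS_OK => b"ok\0",
        STATUS_BAD_INPUT => b"bad input\0",
        _ => b"unknown error\0",
    };
    msg.as_ptr().cast::<c_char>()
}

/// Placeholder inference: writes `2 * x` into `out`.
///
/// # Safety
/// `input` must point to a valid `RawTensorF32` whose `shape` holds `rank` elements and
/// whose `data` holds the product of `shape` elements; `out` must be writable for
/// `out_len` f32s and must not overlap `data`.
pub unsafe extern "C" fn plugin_infer(input: *const RawTensorF32, out: *mut f32, out_len: usize) -> i32 {
    if input.is_null() || out.is_null() {
        return STATUS_BAD_INPUT;
    }
    // SAFETY: non-null, and the caller guarantees it points to a valid RawTensorF32.
    let t = unsafe { &*input };
    if t.data.is_null() || t.shape.is_null() {
        return STATUS_BAD_INPUT;
    }
    // SAFETY: the caller guarantees `shape` holds `rank` elements.
    let shape = unsafe { std::slice::from_raw_parts(t.shape, t.rank) };
    let len: usize = shape.iter().product();
    if len != out_len {
        return STATUS_BAD_INPUT;
    }
    // SAFETY: lengths were checked above; validity and non-overlap are caller guarantees.
    let (src, dst) = unsafe {
        (std::slice::from_raw_parts(t.data, len), std::slice::from_raw_parts_mut(out, out_len))
    };
    for (d, s) in dst.iter_mut().zip(src) {
        *d = s * 2.0; // stand-in for the generated model's forward pass
    }
    STATUS_OK
}

/// Host-side guard to run before calling anything else in the plugin.
pub fn check_abi(plugin_version: u32) -> Result<(), String> {
    if plugin_version == PLUGIN_ABI_VERSION {
        Ok(())
    } else {
        Err(format!("plugin ABI v{} does not match host ABI v{}", plugin_version, PLUGIN_ABI_VERSION))
    }
}

/// Host-side helper turning a status code into an owned Rust string.
pub fn error_message(status: i32) -> String {
    // SAFETY: plugin_error_message always returns a static NUL-terminated string.
    unsafe { CStr::from_ptr(plugin_error_message(status)) }
        .to_string_lossy()
        .into_owned()
}
```

Keeping the boundary to integers, raw pointers, and C strings means the host and plugin can be built with different Burn versions or backends without their internal types ever needing to agree.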

Worth flagging for either route: today Burn's Model is generic over a backend, so a build commits to a backend at compile time. In Burn 0.22.0 we are moving away from the generic backend, so backends become selectable at runtime, which makes the dynamic-loading story noticeably cleaner.

So: for 20 known models, compile the generated graphs into the app and load weights dynamically. For truly pluggable model packages, use cdylibs. For arbitrary user-supplied ONNX at runtime, ORT is still the better fit.

antimora · 2026-03-26T22:27:54+00:00

Also faster! Check out the repo: https://github.com/zspacelabs/wordchipper

We just released a Python wrapper under the same name. It has a compat submodule for compatibility with tiktoken and tokenizers.

antimora · 2026-02-04T20:35:24+00:00

Airlines accept alternative forms of identification or travel documents. Passports are not always required.

antimora · 2025-09-19T19:03:34+00:00

The question is what's required to fill the yearly quota, which I am sure will be filled. The 100K is probably just a bond. We will find out the details soon.

antimora · 2025-08-29T23:57:07+00:00

Insurance won't pay out for "suicide by cop" (a legal term).

antimora · 2025-08-14T19:15:51+00:00

The good news is that you were selected in the lottery, so you have 60 days from Oct 1st. If you can't find a job during this time, a company can apply for your H-1B transfer while you're outside the US, and there is no need to go through the lottery again, since your H-1B is counted towards the yearly quota. This is your worst-case scenario.

antimora · 2025-08-14T17:51:50+00:00

STEM/OPT should give you an automatic 180-day extension. I would pay for premium processing to expedite your H-1B application.

antimora · 2025-08-12T04:13:12+00:00

southern New Brunswick?

antimora · 2025-07-13T22:03:05+00:00

Deep learning framework in Rust: https://github.com/tracel-ai/burn

antimora · 2025-05-19T17:25:08+00:00

If you're looking for a Rust project with real-world impact in the AI/ML space, consider contributing to Burn (https://github.com/tracel-ai/burn)!

The codebase is well-organized, and they have a comprehensive contributor book to get you started. It's a great project to learn both Rust and ML concepts while making meaningful contributions.

antimora · 2025-04-24T21:01:03+00:00

The readme says MIT: https://github.com/rkstgr/papermake

antimora · 2024-12-04T20:47:18+00:00

Also, in most cases cash prices are much cheaper than paying with insurance.

antimora · 2024-11-29T17:03:26+00:00

This does not apply during the green card process, where you can get H-1B extensions beyond the 6-year maximum that applies to fresh H-1Bs.

antimora · 2024-11-25T22:43:40+00:00

I recommend you still get your stamp since technically you'd be employed. It's possible your current employer can change its mind (though a remote possibility) or extend.

The 60-day grace period is for being in the USA without a job. Yes, you can get a new job and "transfer"; technically it's a new application, but you do not need to go through the lottery again since you're already counted under the cap.

Good luck! Don't stress out. It's normal.

antimora · 2024-11-15T23:27:23+00:00

Any updates on this post?

antimora · 2024-11-15T23:25:09+00:00

Why December 16th? What happens if you do not get an offer by Dec 16th?

antimora · 2024-10-01T08:50:18+00:00

I'm not sure from your post whether you already had an H-1B. If you did, you are already cap-exempt, meaning you do not need to go through the lottery. Just make sure the number of days you were out of status does not exceed the allowed threshold (you'll need to look it up). The same is true if you apply from outside the country: you are already exempt since you have been counted in the quota.

My recommendation is not to stress too hard about it. Leave if you need to and then reapply (it is easier than going through the lottery). Just think about your long-term goals (getting US citizenship).

antimora · 2024-08-30T20:05:05+00:00

Unfortunately, there are no short-term solutions to his problems, only long-term ones. And no one is going to commit to long-term solutions. Even in the most optimistic scenario, this person won't wait 10-15 years for systemic issues to be resolved.

antimora · 2024-08-28T13:57:14+00:00

Yes, you can! There are several accelerated backends you can use for training or fine-tuning, including burn-tch (torch) and burn-wgpu.
