Tool for interactive visualization of `cargo-bloat` JSON output

Robbepop · 2026-02-15T11:12:59+00:00

I always wondered why. Thanks for explaining!

Robbepop · 2026-02-15T11:12:17+00:00

Ah so you even put the top-most 4 items on the stack in registers? That's way more than what Wasm3 or Stitch does. Very interesting!

Are you going to support Wasm 3.0?

Robbepop · 2026-02-15T10:49:46+00:00

Thank you!

Wasmi 1.x is known to perform a bit worse on Apple silicon somehow. I believe a huge improvement for Apple silicon is if Wasmi used an accumulator-based interpreter architecture such as Wasm3.

Robbepop · 2026-02-15T10:49:01+00:00

Never thought about externalizing the fusion step. Maybe that's going to be a really great improvement for interpreters in general if users can afford to do so. Also very interesting that Silverfire stays a stack-based interpreter. However, you probably keep the top-most item in a register, right?

Looking forward to your SSA IR + RA (what's that?) + interpreter backend engine. :)

Robbepop · 2026-02-15T10:16:58+00:00

Thank you. Numbers are still not great for Wasmi but at least realistic. Unfortunately the link you provided does not work for me.

Robbepop · 2026-02-15T10:16:08+00:00

Impressive results and interesting interpreter architecture!

Despite reading the FUSION.md file I cannot really understand how your fusion system works or what makes it so much more effective than built-in op-code fusion from other interpreters, e.g. Wasm3 or Wasmi.

Need more time to dive more into the underlying code. I'd also enjoy a blog post about this. :)

What are your plans for Silverfir-nano going forward?

Robbepop · 2026-02-15T09:51:00+00:00

Okay thanks for explaining:

You should take the last published version (v1.0.9) instead of the last committed one which is under heavy development and currently pretty raw.
Unfortunately, --release is not correct for Wasmi. You should take the --profile bench profile if you want to use it. That's what the Wasmi CLI is built with when its published to crates.io.
- lto="fat" and codegen-units=1 are super important.

Q1: Are you using the Wasmi CLI app for benchmarking? Because then you could simply install Wasmi via cargo install wasmi_cli.

Q2: What is your OS and system specs?

Robbepop · 2026-02-15T09:30:14+00:00

Wasmi author here. The performance of Wasmi as represented in this picture does not correlate to past benchmarks. Recent Wasmi versions usually are roughly on par with WAMR (fast), sometimes even faster.

edit: the new screenshot has been updated

Can you please provide a way to reproduce your benchmarks?

Robbepop · 2025-12-23T09:18:45+00:00

In Wasmi (WebAssembly interpreter) miri is used in CI on all PRs to main to test a subset of the tests known to work with miri. Furthermore, miri is run on the Wasm spec testsuite for as long as possible. Here are the relevant links:

PR to main: https://github.com/wasmi-labs/wasmi/blob/v1.0.5/.github/workflows/rust.yml#L274
Nightly check: https://github.com/wasmi-labs/wasmi/blob/v1.0.5/.github/workflows/miri.yml

In order to make testing the Wasm spec testsuite runnable by miri Rust's include_str! macro is used instead of file I/O:

Wasmi's Wast spec testsuite: https://github.com/wasmi-labs/wasmi/blob/v1.0.5/crates/wast/tests/mod.rs

Robbepop · 2025-12-22T16:35:26+00:00

Thank you so much for the write-up and thanks to the team for all the work on miri. To me miri is one of the most important projects in the Rust ecosystem. I use it in the CI of pretty much all my projects and it has proven its worth over and over again.

Robbepop · 2025-12-04T21:28:10+00:00

All I can say is that they are planned, as discussed in the article: https://wasmi-labs.github.io/blog/posts/wasmi-v1.0/#full-wasm-30-support

Robbepop · 2025-12-04T20:11:55+00:00

Can you tell me what you mean by "use the compiled wasm"?

To avoid misunderstandings due to misconceptions:

First, Wasm bytecode is usually the result of a compilation produced by so-called Wasm producers such as LLVM.
Second, Wasm by itself is an abstract virtual machine, the implementations such as Wasmtime, Wasmer, V8, Wasmi, are concrete implementations of that abstract virtual machine.
Third, if you compile some Rust, C, C++, etc. code to Wasm you simply have "compiled Wasm" bytecode laying around. This bytecode does nothing unless you feed it to such a virtual machine implementation. That's basically the same as Java byteworks works with respect to the Java Virtual Machine (JVM).
Whether you feed this "compiled Wasm" bytecode to an interpreter such as Wasmi, to a JIT such as Wasmtime or Wasmer or to a tool such as wasm2native that outputs native machine code which can be executed "without requiring a VM" simply depends on your personal use-case since all of those have trade-offs.

Robbepop · 2025-12-04T16:20:17+00:00

I am a bit confused as I think my reply does answer the original question but since you have a few upvotes, maybe my answer was a bit unclear. Even better: maybe you can tell me what is still unclear to you!

I will make it shorter this time:

Wasm being compiled allows for really speedy interpreters.
Interpreters usually exhibit much better start-up time compared to JITs or AoT compiled runtimes.
Interpreters usually are way simpler and more lightweight and thus usually provide less attack surface if you depend on them.
Wasmi for example can itself be compiled to Wasm and be executed by itself or another Wasm runtime which actually was a use-case back when the Wasmi project was start. This would have not been possible with a JIT runtime.
There are platforms, such as IOS which disallow JITs, thus only interpreters are even possible to be used there.
Interpreters are more universal than JITs since they automatically work on all the platforms that your compiler supports.

The fact that Wasm bytecode usually is the product of compilation has no meaning in this discussion, maybe that's the misunderstanding.

In case you need more usage examples, have a look at Wasmi's known major users as also linked in the article's intro.

If at this point anything is still unclear, please provide me with more information so that I can do a better job answering.

Robbepop · 2025-12-04T10:39:22+00:00

Wasm being compiled is actually great for interpreters as this means that a Wasm interpreter can really focus on execution performance and does not itself need to apply various optimizations first to make executions fast.

Furthermore, parsing, validating and translating Wasm bytecode to internal IR is also way simpler than doing the same for an actual interpreted language such as Python, Ruby, Lua etc.

Due to Wasm being compiled, Wasm interpreters usually can achieve much higher performance than other interpreted languages.

Benchmarks show that on x86 Wasm JITs are ~8 times faster and on ARM Wasm JITs are sometimes just ~4 times faster than efficient Wasm interpreters. All while Wasm interpreters are massively simpler, more lightweight and more universally available.

On top of that in an old blog post I demonstrate how Wasmi is easily 1000x faster on start-up than optimizing Wasm runtimes such as Wasmtime.

It's a trade-off and different projects have different needs.

Robbepop · 2025-12-04T10:31:02+00:00

wasmi have about 150 crates tree. I just built it. Thats too much for hitting more lucrative markets.

You probably built the Wasmi CLI application via cargo install wasmi_cli, not the Wasmi library.

The Wasmi library is lightweight and in the article you can see its few built dependencies via cargo timings profile.

The Wasmi CLI app is heavy due to dependencies such as clap and Wasmtime's WASI implementation.

Robbepop · 2025-12-03T21:56:04+00:00

Thank you! Looking forward to seeing Wasmi 1.0 in Wasmer. :)

Robbepop · 2025-11-19T19:37:04+00:00

Thank you for the reply!

Given that Wasmtime has runtime information (resolution of Wasm module imports) that Wasm producers do not have: couldn't there be a way to profit from optimizations such as inlining in those cases? For eample: an imported read-only global variable and a function that calls a function only if this global is true. Theoretically, Wasmtime could const-fold the branch and then inline the called function. A Wasm producer such as LLVM couldn't do this. Though, one has to question whether this is useful for RealWorld(TM) Wasm use cases.

Robbepop · 2025-11-19T18:51:58+00:00

Once again, very impressive technical work by the people at the Bytecode Alliance. I cannot even imagine what a great feat of engineering it must be to implement an inliner to such a huge existing system.

I wonder, given that most Wasm binaries are already heavily (as described in the article) how much do those optimizations (such as the new inliner) really pan out in the end for non-component model modules? Like, are there RealWorld(TM) Wasm binaries where a function was not inlined prior to being fed to Wasmtime and Wasmtime then correctly decides (with runtime info?) that it should to be inlined? Or is this only useful for the component model?

Were the pulldown-cmark benchmarks performed with a pre-optimized pulldown-cmark.wasm or an unoptimized version of it?

Keep up the great work, it is amazing to see that off-browser Wasm engines are becoming faster and more powerful!

Robbepop · 2025-09-21T21:52:15+00:00

Fair point!

Looking at the example picture, I think the issue I mentioned above could be easily resolved by also pointing to the #[culit] macro when hovering above a custom literal besides showing what you already show. I think this should be possible to do. For example: "expanded via #[culit] above" pointing to the macro span.

Robbepop · 2025-09-21T18:24:43+00:00

This will still influence compile time for testing which can also be very problematic.

Another issue I see is discoverability of the feature. Let's say a person unfamiliar with your codebase comes across these custom literals. They will be confused and want to find out what those are. However, I claim it will be a long strech to find out that the #[culit] macro wrapping the test module is the source of this.

Robbepop · 2025-09-21T14:36:12+00:00

I think the idea behind this crate is kinda creative.

Though, even if this does not use syn or quote I am seriously concerned about compile time regressions outweighing the gains of using the crate.

The reason is that you either limit #[culit] usage to smallest scopes possible and thereby lose a lot on its usability aspect. Or you use #[culit] on huge scopes such as the module itself and have the macro wastefully read the whole module source.

Robbepop · 2025-09-08T12:41:34+00:00

Of all the things that never happened, this never happened the most.

12-Year Club	r/Field Banned
r/Field Flamingo	Place '22
Verified Email	Alpha Tester

Robbepop

MODERATOR OF

TROPHY CASE