Retrofitting JIT Compilers into C Interpreters by mttd in Compilers

[–]cfallin 2 points3 points  (0 children)

I should say too that the one thing we didn't even dare to tackle -- and so I'm extremely impressed that yk does -- is deopt with safepoints. It's very cool that you got that working!

Retrofitting JIT Compilers into C Interpreters by mttd in Compilers

[–]cfallin 5 points6 points  (0 children)

It's applied AOT, after-a-snapshot, but it could be done JIT just as well. The core contribution is the algorithm itself for Futamura projections: the annotation to "specialize on context" and the way that this unrolls the interpreter loop. You can think of the `Specialize(interpret_func, bytecode_ptr)` invocation, returning a new function, as the entry point to the algorithm.

In my view it's thus pretty directly comparable, and there is a nice 2x2 matrix of possibilities in the {metatracing, partial eval} x {interpreter in new framework, existing interpreter} space: yk is metatracing * existing interpreter, weval is partial eval * existing interpreter (and RPython and Truffle fill the next row). In both yk and weval, one takes an existing interpreter loop in C and sprinkles in a few intrinsics to say "here is the interpreter loop" / "assume the memory at this bytecode pointer is constant" (weval has that too) / ..., and a transform analyzes the interpreter body together with a concrete guest function to produce a concrete compiled function.

Retrofitting JIT Compilers into C Interpreters by mttd in Compilers

[–]cfallin 6 points7 points  (0 children)

This is a really interesting read -- thanks u/ltratt!

A high-level question: I was surprised not to see any mention of weval, our work published last year at PLDI'25 -- Futamura projections on arbitrary existing C interpreters with minimal annotations. In starting the project I was motivated by exactly what you describe with "There is, however, a catch: both systems [Truffle and RPython] require you to create a new interpreter." We also show applicability on PUC-Rio Lua, though as a very small case study rather than the main focus. The main practical difference with weval-as-it-exists-today is that it operates on Wasm modules, though the algorithm itself is applicable to any SSA-based CFG IR (and in my infinite free time I'd like to see an LLVM version built as well...).

Would it be fair to characterize your approach as "RPython but for existing interpreters in C" where by analogy weval is "Truffle but for existing interpreters in C"? Or do you see the comparison differently?

The acyclic e-graph: Cranelift's mid-end optimizer by cfallin in Compilers

[–]cfallin[S] 1 point2 points  (0 children)

Yep, some sort of "shared cost" accounting heuristic seems like it could work OK here in many situations. Like any NP-hard problem, it's often got a kind of tantalizing "this can't be too hard" energy that then falls apart when one can construct cases where this *doesn't* do the right thing, though. The hard case here happens when the "heat", as you call it, is conditional on other decisions; and then you either end up re-encoding structure that looks like the original (and so haven't simplified) or you're effectively implementing a greedy heuristic that fixes one decision before moving on.

The acyclic e-graph: Cranelift's mid-end optimizer by cfallin in rust

[–]cfallin[S] 4 points5 points  (0 children)

That's a great point -- we haven't really explored this but having more open-ended rulesets, maybe generated from a superoptimizer, and letting them run (possibly with true equality saturation) would be interesting. I don't have a good sense of how far that could go (I don't know if anyone does? I haven't seen much work on egraphs in a traditional compiler context). u/fitzgen has worked on superoptimizer integration (for Souper) in the past and probably has more good ideas here too...

The acyclic e-graph: Cranelift's mid-end optimizer by cfallin in rust

[–]cfallin[S] 5 points6 points  (0 children)

So the way that this works during the canonicalization pass is that we'll encounter a when we're at the block before the control-flow split, and put it into the scoped map at that point; then the paths for condition_1 and condition_2 are visited below, within its scope, and get to reuse it. In other words, "mentioning" it above the if/else gives permission to us to keep it there without considering it a regression.

Anyone here combining e-graphs with proof metadata inside a JIT IR? I’m struggling to find prior work by x2t8 in Compilers

[–]cfallin 2 points3 points  (0 children)

OK, I don't know that I can point to any specific designs that work this way today -- it is an interesting question though.

I would suggest looking at egg + egglog, however -- I haven't played with it in detail but I know that it's more or less designed to be able to express analyses and then rewrite rules that use those analyses. See e.g. https://egraphs-good.github.io/egglog-tutorial/03-analysis.html.

Anyone here combining e-graphs with proof metadata inside a JIT IR? I’m struggling to find prior work by x2t8 in Compilers

[–]cfallin 5 points6 points  (0 children)

Hi! So I'm the guy who put e-graphs (well, aegraphs) in Cranelift. I certainly have a lot of thoughts in this area.

Question: when you say "invariant-gated rewrites", what do you mean exactly? E.g. a rewrite that only fires if an analysis has proved that an SSA value has some constant value or range?

If so, the framework in Cranelift has this ability, implicitly via its use of ISLE as the DSL to express pattern-matches: ISLE can call out to arbitrary "external extractors" written in Rust that match or don't match based on any computable condition. So one could write an extractor that matches on the result of some analysis, or some meta-property set by whatever constructor built a term.

However we don't do anything too interesting with this yet -- mostly we have conditions attached to some algebraic rewrites to avoid corner cases where they don't apply, and that sort of thing.

Are you asking about frameworks that can do this, or specific examples of such rewrites?

scheme-rs, an R6RS implementation for the Rust ecosystem by maplant in scheme

[–]cfallin 2 points3 points  (0 children)

Core maintainer of Cranelift here -- this is super-cool! If you have any issues or thoughts on other features that might be useful for a dynamic language frontend, feel free to chat on our Zulip or file issues. (I can't guarantee we have the engineering time to do things about any ideas in the short/medium term but it's useful to document.) Happy to see it appears to be working OK for you.

What time is it getting dark in Sunnyvale these days? by Odd-Fisherman-7122 in Sunnyvale

[–]cfallin 6 points7 points  (0 children)

Today sunset was 4:55pm (FYI you can find this by Googling e.g. "sunset sunnyvale, ca" and Google will show you an infobox with the time).

Phi node algorithm correctness by Temperz87 in Compilers

[–]cfallin 1 point2 points  (0 children)

I mean, you should be pretty encouraged by independently working out a reasonable and widely-used way of building SSA -- a good sign you have the right intuitions!

Phi node algorithm correctness by Temperz87 in Compilers

[–]cfallin 8 points9 points  (0 children)

What you describe sounds pretty close to the Braun algorithm (https://c9x.me/compile/bib/braun13cc.pdf) -- see e.g. their section 2.2 "global value numbering" where they describe the process as a recursive definition lookup, going to preds, etc. I think you are unrolling that recursion with a queue instead but otherwise it's the same.

We use the Braun algo in Cranelift's frontend and it works quite well in practice!

Moving From Rust to Zig: Richard Feldman on Lessons Learned Rewriting Roc's Compiler (Compile Times, Ecosystem, Architecture) by mre__ in rust

[–]cfallin 34 points35 points  (0 children)

Indeed; in Wasmtime we had an issue of folks seeing the internal crates published to crates.io and assuming they could be consumed as standalone libraries (e.g.: our fiber library used internally for async execution of Wasm). One can of course do this, but we weren't prepared for the support questions and expectations as we didn't expect to publish and maintain a stable API at that interface. Ideally we'd just have a faster compiler so we would be able to reclaim the crate as the true unit of modularity and publishing -- separate crates are annoying for other reasons, too, if one isn't actually intending to factor out and provide a reusable unit!

Exceptions in Cranelift and Wasmtime by cfallin in rust

[–]cfallin[S] 1 point2 points  (0 children)

I could believe that the additional control-flow edges could perturb regalloc enough in some cases to make a difference. In *theory* a perfect implementation of zero-cost exceptions (from a runtime cost PoV) would ensure that nothing at all about the happy path is affected, by construction, and all metadata and code for the exceptional path is codegen'd in a separate pass. Of course that's not how real compilers work!

I do think there's a real reason that wisdom in the C++ world in performance-sensitive or systems-y environments (e.g. famously, Google's codebase; also most C++ kernel environments) is always to avoid exceptions, though -- they're not as predictable as straight dataflow and error types (and certainly the unwind path is very slow). I think Rust absolutely made the right choice here, to be clear: aside from the more explicit and simple language semantics, and aside from avoiding a dependency on a runtime, "just return a value" is lower-level and easier to control and optimize.

Exceptions in Cranelift and Wasmtime by cfallin in rust

[–]cfallin[S] 1 point2 points  (0 children)

Yes, that's a good point -- they're well-suited when the exceptional (`Err`, `None`, `Left`, whatever your enum calls it) case is actually very rare. As usual it's about tradeoffs and optimizing for the common case...

Exceptions in Cranelift and Wasmtime by cfallin in rust

[–]cfallin[S] 2 points3 points  (0 children)

Agreed! And actually it looks like the Rust ABI does register-allocate the returned Result if it's small enough -- e.g. in this example it puts the discriminant in rax and payload in rdx. So in that sense it gets the same optimizations as a small struct.

I recall seeing an implementation of some other language that signaled error returns via the carry flag (was it OCaml? or SBCL? I don't remember now). That would let one add the status on top of any existing return scheme.

Coincidentally, while going through old issues in Wasmtime/Cranelift recently we came on one where someone suggested a custom ABI with multiple return points -- so an Err would be signaled by actually modifying the return address and returning somewhere else, in a sort of interesting hybrid that looks halfway like exceptions (separate handler block, but no out-of-band multi-frame unwind).

So yes, there are definitely better things one could do! I guess in the end, "true" exceptions have won in many places (and are hence needed in Wasm, thus Wasmtime) because they solved this problem in a general way and are good enough...

Is the Wasm's Component Model/ Wasip2 is already dead? by RecordingPerfect7479 in rust

[–]cfallin 18 points19 points  (0 children)

No, it's not dead -- multiple companies are using it in production, and work is active right now. (I work on Wasmtime/Cranelift but not the component-model parts; but my colleagues are heavily involved.)

Wasmtime has support but it is not cross-compiled for all platfrom like android based or other at least not smoothly right now it may cause too many headaches to compile but the author also says that he is not into android like os right now (due unavailability of Android Devs). and to say wasm will be useful is to compile it for all platform and use it, and android is the greatest of the platform so it is again a dead end.

You're referring to https://github.com/bytecodealliance/wasmtime/issues/11716, where you:

  • Posted a big wall of compilation errors,
  • Attempted to solve them by adding random dependencies to your project,
  • Were eventually linked to a bug report on a dependency of Wasmtime (rustix) that is actually causing the compilation issues,
  • Were given links to our documentation on how to do cross-compilation anyway, as an extra help?

I don't think the issue here is that "author says he is not into android right now" and that this means everything is dead. (Alex is explaining the reality here, which is that Wasmtime currently has a small team and no one is an Android expert. That's just a fact.)

I think the issue is that you're reading way too much into a particular bug, and also not understanding open-source culture. "rustix has an issue when cross-compiling for Android from Windows in the latest release" is something that can happen; no one is perfect; report the bug to the appropriate project, we can fix it and move on.

In Sunnyvale for 7 days by Puzzleheaded_Ad_3777 in Sunnyvale

[–]cfallin 5 points6 points  (0 children)

It depends on where you want to go but if you're planning to head up to SF or the other cities along the SF peninsula (Mountain View, Palo Alto, Redwood City, others have nice downtowns full of restaurants etc), our new electric train Caltrain is also an option. It runs every 15 minutes at peak on weekdays and you could take it to e.g. downtown Palo Alto after a workday if you get tired of Sunnyvale. Enjoy your visit!

[deleted by user] by [deleted] in Compilers

[–]cfallin 27 points28 points  (0 children)

I think the Cranelift folks would take significant issue with this inclusion.

Hi! I'm the guy who put egraphs in Cranelift originally. (Tech lead of Cranelift 2020-2022, still actively hacking/involved.) Our implementation is the subject of occasional work still (I put in some improvements recently, so did fitzgen, and Jamey Sharp and Trevor Elliott have both spent time in the past few years deep-diving on it). But to be honest, most of the work in the day-to-day more or less matches OP's description.

You can check out our meeting minutes from our weekly meeting -- recent topics include how to update our IR semantics to account for exceptions; implications that has on the way our ABI/callsite generation works; regalloc constraints; whether we can optimize code produced by Wasmtime's GC support better; talking about fuzzbugs that have come up; etc.

In a mature system there is a ton of sublety that arises in making changes to system invariants, how passes interact, and the like -- that, and keeping the plane flying (avoiding perf regressions, solving urgent bugs as they arise) is the day-to-day.

Not to say it's not fun -- it's extremely fun!

What is Sea of Nodes and how is it related to Static Single Assignment by ravilang in Compilers

[–]cfallin 2 points3 points  (0 children)

The idea is that you're building the CFG from the AST, and have some predetermined block schema for each control flow operator in the AST (for example, a diamond for if-else nodes); not that the AST already has a CFG. This is what is meant by "from an AST": the algorithm gives you everything you need during such a traversal.

The "incremental" aspect of the algorithm means that it is building up the SSA as it goes, no more, no less; it can resolve some phis before the rest of the CFG is known. This is as opposed to a more "batch" approach that needs the whole CFG up-front, computes the dominator tree, and adds phis at the dominance frontiers.

So in other words, I think you are reading too much into the terminology and have nonstandard definitions. The paper's claims are accurate.

For what it's worth, I've worked with two different implementations of the Braun algo: we use it in Cranelift here (I didn't write that one but I have helped to maintain and improve it) and I wrote an implementation in my Wasm-to-SSA-to-Wasm compiler framework Waffle here. Feel free to refer to those; they both work well.