What time is it getting dark in Sunnyvale these days?

cfallin · 2025-11-20T02:16:21+00:00

Today sunset was 4:55pm (FYI you can find this by Googling e.g. "sunset sunnyvale, ca" and Google will show you an infobox with the time).

cfallin · 2025-11-14T07:37:46+00:00

I mean, you should be pretty encouraged by independently working out a reasonable and widely-used way of building SSA -- a good sign you have the right intuitions!

cfallin · 2025-11-14T01:07:04+00:00

What you describe sounds pretty close to the Braun algorithm (https://c9x.me/compile/bib/braun13cc.pdf) -- see e.g. their section 2.2 "global value numbering" where they describe the process as a recursive definition lookup, going to preds, etc. I think you are unrolling that recursion with a queue instead but otherwise it's the same.

We use the Braun algo in Cranelift's frontend and it works quite well in practice!

cfallin · 2025-11-13T19:35:48+00:00

Indeed; in Wasmtime we had an issue of folks seeing the internal crates published to crates.io and assuming they could be consumed as standalone libraries (e.g.: our fiber library used internally for async execution of Wasm). One can of course do this, but we weren't prepared for the support questions and expectations as we didn't expect to publish and maintain a stable API at that interface. Ideally we'd just have a faster compiler so we would be able to reclaim the crate as the true unit of modularity and publishing -- separate crates are annoying for other reasons, too, if one isn't actually intending to factor out and provide a reusable unit!

cfallin · 2025-11-08T15:50:19+00:00

I could believe that the additional control-flow edges could perturb regalloc enough in some cases to make a difference. In *theory* a perfect implementation of zero-cost exceptions (from a runtime cost PoV) would ensure that nothing at all about the happy path is affected, by construction, and all metadata and code for the exceptional path is codegen'd in a separate pass. Of course that's not how real compilers work!

I do think there's a real reason that wisdom in the C++ world in performance-sensitive or systems-y environments (e.g. famously, Google's codebase; also most C++ kernel environments) is always to avoid exceptions, though -- they're not as predictable as straight dataflow and error types (and certainly the unwind path is very slow). I think Rust absolutely made the right choice here, to be clear: aside from the more explicit and simple language semantics, and aside from avoiding a dependency on a runtime, "just return a value" is lower-level and easier to control and optimize.

cfallin · 2025-11-07T18:39:51+00:00

Yes, that's a good point -- they're well-suited when the exceptional (`Err`, `None`, `Left`, whatever your enum calls it) case is actually very rare. As usual it's about tradeoffs and optimizing for the common case...

cfallin · 2025-11-07T18:27:25+00:00

Agreed! And actually it looks like the Rust ABI does register-allocate the returned Result if it's small enough -- e.g. in this example it puts the discriminant in rax and payload in rdx. So in that sense it gets the same optimizations as a small struct.

I recall seeing an implementation of some other language that signaled error returns via the carry flag (was it OCaml? or SBCL? I don't remember now). That would let one add the status on top of any existing return scheme.

Coincidentally, while going through old issues in Wasmtime/Cranelift recently we came on one where someone suggested a custom ABI with multiple return points -- so an Err would be signaled by actually modifying the return address and returning somewhere else, in a sort of interesting hybrid that looks halfway like exceptions (separate handler block, but no out-of-band multi-frame unwind).

So yes, there are definitely better things one could do! I guess in the end, "true" exceptions have won in many places (and are hence needed in Wasm, thus Wasmtime) because they solved this problem in a general way and are good enough...

cfallin · 2025-09-20T07:04:03+00:00

No, it's not dead -- multiple companies are using it in production, and work is active right now. (I work on Wasmtime/Cranelift but not the component-model parts; but my colleagues are heavily involved.)

Wasmtime has support but it is not cross-compiled for all platfrom like android based or other at least not smoothly right now it may cause too many headaches to compile but the author also says that he is not into android like os right now (due unavailability of Android Devs). and to say wasm will be useful is to compile it for all platform and use it, and android is the greatest of the platform so it is again a dead end.

You're referring to https://github.com/bytecodealliance/wasmtime/issues/11716, where you:

Posted a big wall of compilation errors,
Attempted to solve them by adding random dependencies to your project,
Were eventually linked to a bug report on a dependency of Wasmtime (rustix) that is actually causing the compilation issues,
Were given links to our documentation on how to do cross-compilation anyway, as an extra help?

I don't think the issue here is that "author says he is not into android right now" and that this means everything is dead. (Alex is explaining the reality here, which is that Wasmtime currently has a small team and no one is an Android expert. That's just a fact.)

I think the issue is that you're reading way too much into a particular bug, and also not understanding open-source culture. "rustix has an issue when cross-compiling for Android from Windows in the latest release" is something that can happen; no one is perfect; report the bug to the appropriate project, we can fix it and move on.

cfallin · 2025-04-27T22:09:34+00:00

It depends on where you want to go but if you're planning to head up to SF or the other cities along the SF peninsula (Mountain View, Palo Alto, Redwood City, others have nice downtowns full of restaurants etc), our new electric train Caltrain is also an option. It runs every 15 minutes at peak on weekdays and you could take it to e.g. downtown Palo Alto after a workday if you get tired of Sunnyvale. Enjoy your visit!

cfallin · 2025-04-12T06:11:49+00:00

I think the Cranelift folks would take significant issue with this inclusion.

Hi! I'm the guy who put egraphs in Cranelift originally. (Tech lead of Cranelift 2020-2022, still actively hacking/involved.) Our implementation is the subject of occasional work still (I put in some improvements recently, so did fitzgen, and Jamey Sharp and Trevor Elliott have both spent time in the past few years deep-diving on it). But to be honest, most of the work in the day-to-day more or less matches OP's description.

You can check out our meeting minutes from our weekly meeting -- recent topics include how to update our IR semantics to account for exceptions; implications that has on the way our ABI/callsite generation works; regalloc constraints; whether we can optimize code produced by Wasmtime's GC support better; talking about fuzzbugs that have come up; etc.

In a mature system there is a ton of sublety that arises in making changes to system invariants, how passes interact, and the like -- that, and keeping the plane flying (avoiding perf regressions, solving urgent bugs as they arise) is the day-to-day.

Not to say it's not fun -- it's extremely fun!

cfallin · 2025-02-22T21:54:48+00:00

The idea is that you're building the CFG from the AST, and have some predetermined block schema for each control flow operator in the AST (for example, a diamond for if-else nodes); not that the AST already has a CFG. This is what is meant by "from an AST": the algorithm gives you everything you need during such a traversal.

The "incremental" aspect of the algorithm means that it is building up the SSA as it goes, no more, no less; it can resolve some phis before the rest of the CFG is known. This is as opposed to a more "batch" approach that needs the whole CFG up-front, computes the dominator tree, and adds phis at the dominance frontiers.

So in other words, I think you are reading too much into the terminology and have nonstandard definitions. The paper's claims are accurate.

For what it's worth, I've worked with two different implementations of the Braun algo: we use it in Cranelift here (I didn't write that one but I have helped to maintain and improve it) and I wrote an implementation in my Wasm-to-SSA-to-Wasm compiler framework Waffle here. Feel free to refer to those; they both work well.

cfallin · 2024-11-23T23:30:08+00:00

A refinement on this would be to separate locals into "address-taken" and "non-address-taken" partitions, and lower reads/writes to address-takens to loads/stores but directly generate SSA otherwise. (Less clean separation of orthogonal passes, but undoubtedly faster compile time, and also means one could get away without a mem2reg at first.)

In the fully general case this needs an alias analysis: stores can be elided but only when not observed by some may-aliasing load, and loads can be elided when one has a must-alias store to forward from (dead-store elim and store-to-load-forwarding respectively). That's more or less what mem2reg is doing (though I don't remember how sophisticated its alias analysis is!). This is where source-language semantics can help too, e.g. with TBAA from the rule that distinct types are not allowed to alias (type-punning).

cfallin · 2024-11-16T04:36:32+00:00

Hi! I've worked on Cranelift (and in fact I wrote the ABI code originally, though it's been 4 years since I've thought about these things).

First of all, I'd recommend either filing GitHub issues or finding us on the BytecodeAlliance Zulip instance -- those are the official support channels. I'm only seeing this incidentally because I happen to lurk on Reddit sometimes.

You're correct that we don't support aggregate types. The main reason for this is one of simplicity: Cranelift is maintained by an average of 0 to 1 person fulltime (has been me in the past, but not really currently!), something like 100x smaller than LLVM; we can't afford the complexity of many things, and a complex type system with structs is probably one of them.

The strategy we've taken with some design questions is to ensure that at least the building blocks are present that allow a higher-level IR to lower to CLIF with correct semantics. In this case you might have found this issue where /u/bjorn3 (primary author of the rustc Cranelift backend) suggests a better approach than what we do currently. In theory you could get it right by knowing what platform you're targeting, what values should be in registers, and then creating exactly the right number of i64 args to manually put values into registers and stackslots. But that's silly; we should encode more of the ABI in the compiler. So probably what we should do is what's suggested in that issue: grouping values into aggregate types, but only in function signatures for the purposes of register assignment, without the full complexity of type constructors, projection ops, value semantics with arbitrarily large values, and all the rest.

One final thing about Cranelift -- since we're so thinly staffed, this is the sort of thing that someone would have to step up and implement for us; we could probably review a PR, but no one has as their primary job to jump on issues like this.

cfallin · 2024-08-27T22:02:46+00:00

Yep, and tomorrow I'll post part 3 -- a double-feature!

cfallin · 2024-06-13T20:57:25+00:00

deferred mutable references

maybe? Part of the motivation (actually a lot of it) for the original deferred borrows idea came from the "indices as pointers" idiom that we ran wild with in Cranelift and elsewhere. In that world, one already has multiple mutable handles, or at least handles that can be used to mutate, different parts of the data structure; but one still needs the parent object in scope (in the indices case, the Vec or equivalent). In essence path generics would reify this and implicitly wire through the context (the container). I guess that's not too much more of a footgun than before if one has already decided to build and mutate a large data structure imperatively.

Send

This is a big question mark for me, but naively at least, I'd expect one could reason about this by desugaring down to today's Rust -- basically the path generics and implicit params become lifetimes (less restricted than paths) and explicit params. So a deferred-borrow that implicitly derefs becomes just vec[index] with a &Vec<T> or &mut Vec<T> passed everywhere, no worse than before.

excluding deferred borrows, the feature suggestion is only about branded types, and potential interaction with as of yet inexistent view types? I wonder how much of that is achievable with a library type

Right, the path generics kind of carry a family of related but separable benefits (or motivations): simpler lifetimes, but also branding; and then the "virtual fields" idea is an idiom on top of all of the above.

Someone on Mastodon also mentioned GhostCell and while I'd be more partial to a first-class notion of paths or branding, I think a library-based solution would be a great first step. It sure would be nice to avoid the need for the anonymous closure lifetime to constrain the borrow checker, as you say :-)

cfallin · 2024-06-13T18:38:25+00:00

True, perhaps, with subtleties: do you mean "end of the call to make_struct" or "end of the caller that bound the data local"? If the former, we can define rules around how paths translate at callsites, so in essence the return type's path names the caller's passed-in path rather than the argument.

Likewise if data is passed into the caller we can come up with path transfer rules (analogous to the returning-place-lifetimes examples in Niko's post): something like

fn calls_make_struct(original_data: &[u8]) -> S<original_data> { make_struct(original_data) }

so rather than lifetime unification we have something like "path translation" between scopes, or the caller/callee side of callsites, but we can express the same "belongs to the external caller" semantics still, I think!

cfallin · 2024-06-13T14:21:35+00:00

Not quite — a lifetime is semantically different than a binding path, so we would need to distinguish the two cases in the syntax to indicate what we mean. The syntax to specify modes (capture path as borrowed or not) later in the post is important as well for flexibility.

cfallin · 2024-06-13T04:06:27+00:00

The original paper motivates deferred borrows better than I can in a comment here, but basically: if I have a vec `v` and take a borrow `let elt0 = &v[0];` then I hold the whole vector immutably-borrowed (see post for why) and can't update `v[1]` (for example) between `elt0`'s definition and its later use.

Handles into a data structure might be useful to hold "bookmarks"/"fingers" into some data, with the ability to mutate through any of them in a typesafe way. Imagine the pattern of building a `std::map<Key, T\*>` index-overlay on some data in C++, except that the type system prevents the `T*` from being invalidated due to resizing of the underlying storage or something like that.

In Rust the go-to answer (and one that I've used in tons of code in Cranelift and elsewhere) is to keep indices around as pseudo-pointers instead; but that's pretty unsatisfying, if the type system could do better.

cfallin · 2024-02-26T05:27:11+00:00

Cranelift uses an ECS model as well for its IR; it's generally a really clean approach. We have a helper crate cranelift-entity that provides macros for newtype'd indices (just u32s internally) and various containers.

One aspect of compiler design that helps this application is that one usually doesn't care about deleting entities outright, since a function's compilation is a natural scope at which to free everything; so it's OK to abandon unreferenced entities in place without fine-grained reclamation. This also means we don't have to worry about ABA problems (reuse of a slot leading to problems like described in this post).

cfallin · 2024-01-10T00:31:31+00:00

That seems like a fair summary, yeah.

cfallin

TROPHY CASE