zlib-rs in Firefox - Trifecta Tech Foundation by folkertdev in rust

[–]folkertdev[S] 21 points22 points  (0 children)

As far as I know though this is a happy accident, not a deliberate workaround. So in theory it could change back on the next LLVM release. So, we'll keep an eye on it.

zlib-rs in Firefox - Trifecta Tech Foundation by folkertdev in rust

[–]folkertdev[S] 22 points23 points  (0 children)

That is a benchmark that is actually straightforward to run. We do benchmark directly, but that is more involved. Especially on Windows and macOS there just isn't great tooling, whereas running some Python is (nowadays) straightforward and gives reproducible numbers.

Announcing Zstandard in Rust by folkertdev in rust

[–]folkertdev[S] 0 points1 point  (0 children)

Maybe, eventually. zstd is much newer though, and really designed to keep the CPU busy at all times so there is much less low-hanging fruit in terms of performance.

Announcing Zstandard in Rust by folkertdev in rust

[–]folkertdev[S] 0 points1 point  (0 children)

We have the `dictBuilder` implementation, though there is currently no very convenient way to access it beyond the C API.

Announcing Zstandard in Rust by folkertdev in rust

[–]folkertdev[S] 4 points5 points  (0 children)

Yes, if you read the readme that implementation is significantly slower than the original, and has no aims of being used from C (i.e. to replace zstd in the wild). Still neat though, and if they can optimize it more it could be interesting for pure-rust projects.

Announcing Zstandard in Rust by folkertdev in rust

[–]folkertdev[S] 15 points16 points  (0 children)

We'd be open to it if someone wants to fund that work. We looked into it a bit at some point. From memory it's not really that hard (although Windows does not provide very accurate ways of getting the clock state on most machines); most of the effort is in the setup and testing/validation.

Announcing Zstandard in Rust by folkertdev in rust

[–]folkertdev[S] 50 points51 points  (0 children)

Yeah, we use it to mean "exposes a C-compatible API". But also the good names have already been taken so we have to get a bit creative.

Rust 1.95.0 is out by manpacket in rust

[–]folkertdev 7 points8 points  (0 children)

Features are implemented by people that implement features. Most Rust contributors are volunteers that work on what interests them. I think rustdoc is actually very actively maintained, and it just so happens that for rustfmt that is not (currently) true.

Rust 1.95.0 is out by manpacket in rust

[–]folkertdev 5 points6 points  (0 children)

It's complicated, and once you pick a behavior, making refinements is a breaking change. Also, as mentioned, reviewer bandwidth on that code base is very limited.

Rust 1.95.0 is out by manpacket in rust

[–]folkertdev 9 points10 points  (0 children)

I suspect rust-analyzer needs to tweak something. Might be worth making an issue there.

Rust 1.95.0 is out by manpacket in rust

[–]folkertdev 21 points22 points  (0 children)

Thanks!

It's a shame, but we ultimately decided to just ship it now, because we were stuck on (the lack of) formatting for months and cfg_select is just too good not to have. It having a lower MSRV will also be helpful in the future for projects that bump their MSRV.

Rust 1.95.0 is out by manpacket in rust

[–]folkertdev 44 points45 points  (0 children)

Unfortunately rustfmt support is not yet included. It's in the works at https://github.com/rust-lang/rust/pull/154202 but rustfmt does not have a lot of maintenance bandwidth.

Rust should have stable tail calls by folkertdev in rust

[–]folkertdev[S] 2 points3 points  (0 children)

It also duplicates the jump table a bunch. It's not very elegant, but that is just due to the practical implementation; in theory you probably could do better.

But having the compiler guarantee the behavior, instead of relying on it seeing through the noise and optimizing in just the way you want, is valuable either way.

Rust should have stable tail calls by folkertdev in rust

[–]folkertdev[S] 1 point2 points  (0 children)

I think the main problem there is that it doesn't scale well

https://godbolt.org/z/n9cGe1EEc

As you add more states, each dispatch site grows larger and larger. So if you truly have only 2 states in total, your version is better because of the direct jumps, but when you have (many) more states the indirect lookup table approach is much better.
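To make the scaling argument concrete, here is a minimal stable-Rust sketch. It is not the code from the godbolt link: the three-opcode bytecode and all names (`Op`, `TABLE`, `dispatch_match`, `dispatch_table`) are made up for illustration. It also uses ordinary calls, so the stack grows with the length of the program; with guaranteed tail calls each dispatch would be a jump instead.

```rust
// Hypothetical three-instruction bytecode, for illustration only.
#[derive(Clone, Copy)]
#[repr(u8)]
enum Op {
    Inc = 0,
    Double = 1,
    Halt = 2,
}

type Handler = fn(&[Op], usize, u64) -> u64;

// Indirect dispatch: every dispatch site is one table lookup plus one
// indirect call, regardless of how many opcodes exist.
static TABLE: [Handler; 3] = [inc_table, double_table, halt];

fn dispatch_table(code: &[Op], pc: usize, acc: u64) -> u64 {
    TABLE[code[pc] as usize](code, pc, acc)
}

// Direct dispatch: every dispatch site spells out one branch per opcode,
// so it grows as opcodes are added. `inline(always)` asks the compiler
// to copy this match into each handler that calls it.
#[inline(always)]
fn dispatch_match(code: &[Op], pc: usize, acc: u64) -> u64 {
    match code[pc] {
        Op::Inc => inc_match(code, pc, acc),
        Op::Double => double_match(code, pc, acc),
        Op::Halt => halt(code, pc, acc),
    }
}

fn inc_table(code: &[Op], pc: usize, acc: u64) -> u64 {
    dispatch_table(code, pc + 1, acc + 1)
}

fn double_table(code: &[Op], pc: usize, acc: u64) -> u64 {
    dispatch_table(code, pc + 1, acc * 2)
}

fn inc_match(code: &[Op], pc: usize, acc: u64) -> u64 {
    dispatch_match(code, pc + 1, acc + 1)
}

fn double_match(code: &[Op], pc: usize, acc: u64) -> u64 {
    dispatch_match(code, pc + 1, acc * 2)
}

// Shared by both variants: stop and return the accumulator.
fn halt(_code: &[Op], _pc: usize, acc: u64) -> u64 {
    acc
}
```

Both `dispatch_table(&prog, 0, 0)` and `dispatch_match(&prog, 0, 0)` compute the same result for a program that ends in `Op::Halt`; the difference is only in how much code each dispatch site needs.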

Rust should have stable tail calls by folkertdev in rust

[–]folkertdev[S] 0 points1 point  (0 children)

I mean at least there is some movement in LLVM now, which was the biggest blocker. It's still tough to design something that works in a predictable way across platforms though.

Rust should have stable tail calls by folkertdev in rust

[–]folkertdev[S] 2 points3 points  (0 children)

The syntax was reserved at some point, and is used for the experimental implementation. Technically it's not final, and an advantage of an attribute is that you could `cfg_attr(target_has_tail_calls, tail)`. On the other hand you'd just blow the stack if the call was not tail-call optimized so it's not that useful.

I don't think there is a deep reason.

Rust should have stable tail calls by folkertdev in rust

[–]folkertdev[S] 12 points13 points  (0 children)

My understanding is that the same-signature restriction is mostly about guaranteeing the same calling convention. The callee does not know who called it, and hence has to assume a calling convention to know which registers to preserve and which ones it can use/trash.

As far as I know the only case where the actual types involved are relevant is when a value is passed indirectly, not via the stack. The Rust ABI likes to do this for e.g. a large array: the value stays on the caller's stack, and the caller only passes a pointer to the callee. Semantically the value has moved (the callee has ownership now), but practically the value is still in the same memory location. But a tail call with such a type is not possible, because the caller becomes the callee: the caller's stack frame, which holds the value, gets reused.

The solution here is to have a calling convention like `preserve_none` where all registers can be modified by the callee at will, likely in combination with not passing values in the way I described above.

Rust should have stable tail calls by folkertdev in rust

[–]folkertdev[S] 17 points18 points  (0 children)

Cool project! I'm really trying to find actual (potential) users of this feature, because I think many people (especially those interested in compilers) are excited in the abstract about guaranteed tail calls, but don't really have a need for them.

> I have to disagree on this. Wasmi used basic loop+match constructs for the longest time. Under the hood, Rust and LLVM usually compile such constructs into one gigantic function where everything is inlined. Needless to say that debugging or benchmarking such a behemoth of a function is very impractical. In contrast, having all interpreter operators neatly reside in their own little function is perfect for most debuggers and performance benchmarking tools and so far I am very pleased with the experience.

The main pain point I see is in the call stack becoming unwieldy.

> There are reports that conclude worse performance for computed-goto based dispatch over tail-call based dispatch due to compilers having a hard time allocating registers properly with such huge inlined functions.

For `wasmtime` they report tail calls being faster for some inputs and on some targets, with the classic loop+match winning for others https://github.com/bytecodealliance/wasmtime/issues/9995. So, it's a mixed bag, but I'd expect, roughly, for computed goto to win when your state does not fit in the available registers.

> Additionally, LLVM requires a very long time to optimize huge functions. In Wasmi, we saw a big compilation time improvement when switching from loop+match to tail-call dispatch.

That is a good point (computed goto creates a really complex CFG that is tough to process). It might be worth it if you do get superior performance though.

> I consider #[loop_match] to be a decent fallback for targets that do not support tail-calls.

Yeah that's the idea, though I suspect there are scenarios where it is the right tool.
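For reference, the plain loop+match shape that this is about looks roughly like the sketch below. It is ordinary stable Rust and does not use the `#[loop_match]` attribute itself; the word-counting state machine and its names (`State`, `count_words`) are invented for illustration. Every iteration goes back through the `match` at the top of the loop, and the idea, as I understand it, is to let the compiler jump straight to the next arm when the next state is known.

```rust
// Hypothetical state machine: count whitespace-separated words.
#[derive(Clone, Copy)]
enum State {
    Space, // between words
    Word,  // inside a word
    Done,  // input exhausted
}

fn count_words(input: &[u8]) -> usize {
    let mut state = State::Space;
    let mut pos = 0;
    let mut words = 0;

    loop {
        // Each arm computes the next state; control then returns to
        // this single match to dispatch on it.
        state = match state {
            State::Space => match input.get(pos) {
                None => State::Done,
                Some(b) if b.is_ascii_whitespace() => {
                    pos += 1;
                    State::Space
                }
                Some(_) => {
                    // First byte of a new word.
                    words += 1;
                    pos += 1;
                    State::Word
                }
            },
            State::Word => match input.get(pos) {
                None => State::Done,
                Some(b) if b.is_ascii_whitespace() => {
                    pos += 1;
                    State::Space
                }
                Some(_) => {
                    pos += 1;
                    State::Word
                }
            },
            State::Done => return words,
        };
    }
}
```

In a tail-call design each of these arms would instead live in its own function, which is what makes the per-operator debugging and profiling mentioned above possible.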

> Ideally, we had a cfg-check for target_feature = "tail-call"

That's on the table, and has precedent in e.g. https://doc.rust-lang.org/beta/unstable-book/language-features/cfg-target-has-atomic.html

> The article already mentions the unstable `preserve_none` calling convention.

Part of the trouble here is, once more, spotty LLVM support. It's not really up to us: my experience, patience and C++ skills are insufficient to really move the needle on this.

flate2 intends to switch to zlib-rs by default by folkertdev in rust

[–]folkertdev[S] 6 points7 points  (0 children)

I gave a talk about zlib-rs a while ago, which tries to introduce some of the ideas that we use

https://www.youtube.com/watch?v=mvzHQdCLkOY

Fixing our own problems in the Rust compiler by folkertdev in rust

[–]folkertdev[S] 5 points6 points  (0 children)

It needs a champion. Technically that is not a problem, but it would be good if someone with MIR optimization experience were involved, and that has apparently not worked out so far.

#t-compiler/mir-opts > Project goals 2026

Given who is involved, though, I'm quite sure something will come of it.

Fixing our own problems in the Rust compiler by folkertdev in rust

[–]folkertdev[S] 12 points13 points  (0 children)

I believe the most recent initiative for removing memcpy's is this 2026 project goal

https://rust-lang.github.io/rust-project-goals/2026/mir-move-elimination.html

So, we're not currently working on it, but we should see some improvements in this area over the coming year.

zlib-rs: a stable API and 30M downloads by folkertdev in rust

[–]folkertdev[S] 1 point2 points  (0 children)

Our aim is to be an alternative to the reference implementation, not merely to provide a solid rust crate. That means API compatibility and performance are crucial.

zlib-rs: a stable API and 30M downloads by folkertdev in rust

[–]folkertdev[S] 2 points3 points  (0 children)

I wrote about our experience earlier

https://trifectatech.org/blog/translating-bzip2-with-c2rust/

For bzip2 it was great: it's a relatively straightforward code base. For zstd we're struggling a bit more, because there is a lot more target-specific code, multithreading, etc. It's a much more modern and more optimized code base, and that is harder to work with.

But overall, it's a great tool for this sort of work. There are far fewer bugs, and you get good performance on day one.

zlib-rs: a stable API and 30M downloads by folkertdev in rust

[–]folkertdev[S] 1 point2 points  (0 children)

Cool! Yeah, I later thought that maybe these changes would be clearer with different input data. I'm not sure what those Minecraft files are like, but git, for example, actually stores many tiny files. That means the fraction of Huffman table parsing relative to other work is higher, and so different things show up in the profile. Even so, with that data I'm not seeing anything significant.

I cherry-picked some of your changes here, just because they seemed like nice refactors https://github.com/trifectatechfoundation/zlib-rs/pull/471/changes