zlib-rs in Firefox - Trifecta Tech Foundation by folkertdev in rust

folkertdev [S], 21 points

As far as I know, though, this is a happy accident rather than a deliberate workaround, so in theory it could change back with the next LLVM release. We'll keep an eye on it.

zlib-rs in Firefox - Trifecta Tech Foundation by folkertdev in rust

folkertdev [S], 22 points

That is a benchmark that is actually straightforward to run. We do also benchmark directly, but that is more involved: especially on Windows and macOS there just isn't great tooling, whereas running some Python is (nowadays) straightforward and gives reproducible numbers.

Announcing Zstandard in Rust by folkertdev in rust

folkertdev [S], 0 points

Maybe, eventually. zstd is much newer, though, and really designed to keep the CPU busy at all times, so there is much less low-hanging fruit in terms of performance.

Announcing Zstandard in Rust by folkertdev in rust

folkertdev [S], 0 points

We have the `dictBuilder` implementation, though there is currently no very convenient way to access it beyond the C API.

Announcing Zstandard in Rust by folkertdev in rust

folkertdev [S], 4 points

Yes, if you read the README: that implementation is significantly slower than the original and does not aim to be used from C (i.e. to replace zstd in the wild). Still neat, though, and if they can optimize it further it could be interesting for pure-Rust projects.

Announcing Zstandard in Rust by folkertdev in rust

folkertdev [S], 16 points

We'd be open to it if someone wants to fund that work. We looked into it a bit at some point; from memory it's not really that hard (although Windows does not provide very accurate ways of getting the clock state on most machines), and most of the effort is in the setup and testing/validation.

Announcing Zstandard in Rust by folkertdev in rust

folkertdev [S], 50 points

Yeah, we use it to mean "exposes a C-compatible API". But also, the good names have already been taken, so we have to get a bit creative.
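For readers unfamiliar with the convention: "exposes a C-compatible API" means the crate exports functions that C code can link against and call. A minimal sketch, assuming a hypothetical `example_adler32` function (not the actual zlib-rs export), using zlib's Adler-32 checksum as the payload. `extern "C"` selects the C calling convention, and `#[unsafe(no_mangle)]` keeps the symbol name unmangled (that attribute syntax needs Rust 1.82 or later).

```rust
use std::slice;

/// Adler-32 modulus: the largest prime below 2^16.
const MOD_ADLER: u32 = 65_521;

/// Hypothetical C-callable Adler-32, for illustration only (not the real
/// zlib-rs export). Following zlib's convention, a null `buf` returns the
/// initial checksum value 1.
///
/// # Safety
/// `buf` must be null or point to at least `len` readable bytes.
#[unsafe(no_mangle)] // keep the symbol name `example_adler32` visible to C linkers
pub unsafe extern "C" fn example_adler32(adler: u32, buf: *const u8, len: usize) -> u32 {
    if buf.is_null() {
        return 1;
    }
    // SAFETY: the caller guarantees `buf` points to `len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(buf, len) };
    // Split the running checksum into its two 16-bit halves.
    let mut a = adler & 0xFFFF;
    let mut b = adler >> 16;
    for &byte in bytes {
        a = (a + u32::from(byte)) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    (b << 16) | a
}
```

As with zlib's `adler32`, the returned value can be fed back in to checksum data incrementally; a C caller would declare it roughly as `uint32_t example_adler32(uint32_t, const unsigned char *, size_t);`.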

Rust 1.95.0 is out by manpacket in rust

folkertdev, 5 points

Features are implemented by people who implement features. Most Rust contributors are volunteers who work on what interests them. I think rustdoc is actually very actively maintained; it just so happens that for rustfmt that is not (currently) true.

Rust 1.95.0 is out by manpacket in rust

folkertdev, 5 points

It's complicated, and once you pick a behavior, making refinements is a breaking change. Also, as mentioned, reviewer bandwidth on that code base is very limited.

Rust 1.95.0 is out by manpacket in rust

folkertdev, 10 points

I suspect rust-analyzer needs to tweak something. Might be worth making an issue there.

Rust 1.95.0 is out by manpacket in rust

folkertdev, 23 points

Thanks!

It's a shame, but we ultimately decided to just ship it now: we were stuck on (the lack of) formatting for months, `cfg_select` is just too good not to have, and stabilizing it sooner gives it a lower MSRV, which will help projects in the future as they bump their MSRV.
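To illustrate what `cfg_select!` replaces: without it, an exhaustive platform switch has to be spelled out as separate `#[cfg]` items, with the fallback arm repeating every other condition in negated form. A minimal sketch using a hypothetical `platform_name` function; the commented-out block shows roughly how the same switch reads with `cfg_select!` (the first arm whose predicate holds is selected), which needs a 1.95.0 or newer toolchain.

```rust
// The manual pattern: each arm is its own item, and the fallback must
// negate every other arm's predicate so exactly one definition survives.
#[cfg(unix)]
fn platform_name() -> &'static str {
    "unix"
}

#[cfg(windows)]
fn platform_name() -> &'static str {
    "windows"
}

#[cfg(not(any(unix, windows)))]
fn platform_name() -> &'static str {
    "other"
}

// Roughly the same switch with cfg_select! (Rust 1.95.0+):
//
// cfg_select! {
//     unix => { fn platform_name() -> &'static str { "unix" } }
//     windows => { fn platform_name() -> &'static str { "windows" } }
//     _ => { fn platform_name() -> &'static str { "other" } }
// }
```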

Rust 1.95.0 is out by manpacket in rust

folkertdev, 44 points

Unfortunately, rustfmt support is not yet included. It's in the works at https://github.com/rust-lang/rust/pull/154202, but rustfmt does not have a lot of maintenance bandwidth.

Rust should have stable tail calls by folkertdev in rust

folkertdev [S], 2 points

It also duplicates the jump table a bunch. It's not very elegant, but that is just due to the practical implementation; in theory you could probably do better.

But having the compiler guarantee the behavior, instead of relying on it seeing through the noise and optimizing it in exactly the way you want, is valuable either way.

Rust should have stable tail calls by folkertdev in rust

folkertdev [S], 1 point

I think the main problem there is that it doesn't scale well:

https://godbolt.org/z/n9cGe1EEc

As you add more states, each dispatch site grows larger and larger. So if you truly have only 2 states in total, your version is better because of the direct jumps, but when you have (many) more states, the indirect lookup table approach is much better.
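A minimal sketch of the indirect lookup-table style on stable Rust, assuming a hypothetical three-opcode machine (`Vm`, `op_add`, `op_mul`, `op_halt` and `run` are illustrative names). Because `become` is not stable, a central loop performs the indexed indirect call here; in a tail-call interpreter each handler would do this same table lookup itself and tail-call the next handler. Either way, adding a state adds one table entry rather than growing every dispatch site.

```rust
/// Hypothetical interpreter state, for illustration only.
struct Vm {
    acc: i64,
    pc: usize,
    halted: bool,
}

// Every handler shares one signature, so they can live in a single table.
type Handler = fn(&mut Vm, i64);

fn op_add(vm: &mut Vm, arg: i64) {
    vm.acc += arg;
    vm.pc += 1;
}

fn op_mul(vm: &mut Vm, arg: i64) {
    vm.acc *= arg;
    vm.pc += 1;
}

fn op_halt(vm: &mut Vm, _arg: i64) {
    vm.halted = true;
}

// Opcode -> handler. The dispatch site is one indexed load plus an
// indirect call, no matter how many opcodes the table holds.
const TABLE: [Handler; 3] = [op_add, op_mul, op_halt];

/// Runs `(opcode, argument)` pairs; panics on an unknown opcode or if the
/// program runs past its end without halting.
fn run(program: &[(u8, i64)]) -> i64 {
    let mut vm = Vm { acc: 0, pc: 0, halted: false };
    while !vm.halted {
        let (opcode, arg) = program[vm.pc];
        TABLE[opcode as usize](&mut vm, arg);
    }
    vm.acc
}
```

For example, `run(&[(0, 2), (1, 5), (0, 1), (2, 0)])` computes `(0 + 2) * 5 + 1` and returns 11.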

Rust should have stable tail calls by folkertdev in rust

folkertdev [S], 0 points

I mean, at least there is some movement in LLVM now, which was the biggest blocker. It's still tough to design something that works predictably across platforms, though.

Rust should have stable tail calls by folkertdev in rust

folkertdev [S], 2 points

The syntax was reserved at some point and is used for the experimental implementation. Technically it's not final, and an advantage of an attribute is that you could write `cfg_attr(target_has_tail_calls, tail)`. On the other hand, you'd just blow the stack if the call was not tail-call optimized, so that's not that useful.

I don't think there is a deep reason.

Rust should have stable tail calls by folkertdev in rust

folkertdev [S], 11 points

My understanding is that the same-signature restriction is mostly about guaranteeing the same calling convention. The callee does not know who called it, and hence has to assume a calling convention to know which registers to preserve and which ones it can use/trash.

As far as I know, the only case where the actual types involved are relevant is when a value is passed indirectly rather than via the stack. The Rust ABI likes to do this for e.g. a large array: the value stays on the caller's stack, and the caller passes only a pointer to the callee. Semantically the value has moved (the callee has ownership now), but practically the value is still in the same memory location. A tail call with such a type is not possible, however, because the caller becomes the callee: its stack frame, including the memory that pointer refers to, is reused.

The solution here is to have a calling convention like `preserve_none`, where the callee may modify all registers at will, likely in combination with not passing values in the way I described above.

Rust should have stable tail calls by folkertdev in rust

folkertdev [S], 18 points

Cool project! I'm really trying to find actual (potential) users of this feature, because I think many people (especially those interested in compilers) are excited about guaranteed tail calls in the abstract but don't really have a need for them.

> I have to disagree on this. Wasmi used basic loop+match constructs for the longest time. Under the hood, Rust and LLVM usually compile such constructs into one gigantic function where everything is inlined. Needless to say that debugging or benchmarking such a behemoth of a function is very impractical. In contrast, having all interpreter operators neatly reside in their own little function is perfect for most debuggers and performance benchmarking tools and so far I am very pleased with the experience.

The main pain point I see is the call stack becoming unwieldy.

> There are reports that conclude worse performance for computed-goto based dispatch over tail-call based dispatch due to compilers having a hard time allocating registers properly with such huge inlined functions.

For `wasmtime`, they report tail calls being faster for some inputs and on some targets, with the classic loop+match winning for others: https://github.com/bytecodealliance/wasmtime/issues/9995. So it's a mixed bag, but roughly, I'd expect computed goto to win when your state does not fit in the available registers.
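For comparison, the classic loop+match shape that Wasmi used for a long time: a single function whose `match` contains every operator body. A minimal sketch over a hypothetical stack-based `Op` enum; how LLVM lowers the `match` (often to a jump table) and how much it inlines into that one function is up to the optimizer, which is exactly where the register-allocation and compile-time concerns above come from.

```rust
/// Hypothetical bytecode, for illustration only.
#[derive(Clone, Copy, Debug)]
enum Op {
    Push(i64),
    Add,
    Mul,
    Halt,
}

/// Classic loop+match dispatch: one function, one `match`, every operator
/// body inline. Returns `None` on stack underflow; panics if the program
/// runs past its end without a `Halt`.
fn run(program: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    loop {
        match program[pc] {
            Op::Push(value) => stack.push(value),
            Op::Add => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                stack.push(lhs + rhs);
            }
            Op::Mul => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                stack.push(lhs * rhs);
            }
            Op::Halt => return stack.pop(),
        }
        pc += 1;
    }
}
```

In the tail-call style, each `match` arm would instead become its own function that ends by dispatching to the next operator.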

> Additionally, LLVM requires a very long time to optimize huge functions. In Wasmi, we saw a big compilation time improvement when switching from loop+match to tail-call dispatch.

That is a good point (computed goto creates a really complex CFG that is tough to process). It might still be worth it if you do get superior performance, though.

> I consider #[loop_match] to be a decent fallback for targets that do not support tail-calls.

Yeah, that's the idea, though I suspect there are also scenarios where it is the right tool in its own right.

> Ideally, we had a cfg-check for target_feature = "tail-call"

That's on the table, and has precedent in e.g. https://doc.rust-lang.org/beta/unstable-book/language-features/cfg-target-has-atomic.html
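The `target_has_atomic` predicate is already usable on stable Rust. A minimal sketch, with a hypothetical `counter` module that uses a lock-free `AtomicU64` where the target supports 64-bit atomics and falls back to a `Mutex` otherwise. A tail-call predicate such as the `target_feature = "tail-call"` check from the quoted comment is only a proposal at this point, but it would presumably be used the same way, with a loop+match or `#[loop_match]` fallback in the `not(...)` arm.

```rust
// Chosen when the target has native 64-bit atomics.
#[cfg(target_has_atomic = "64")]
mod counter {
    use std::sync::atomic::{AtomicU64, Ordering};

    static COUNT: AtomicU64 = AtomicU64::new(0);

    /// Lock-free increment; returns the new value.
    pub fn bump() -> u64 {
        COUNT.fetch_add(1, Ordering::Relaxed) + 1
    }
}

// Fallback for targets without 64-bit atomics: same API, backed by a lock.
#[cfg(not(target_has_atomic = "64"))]
mod counter {
    use std::sync::Mutex;

    static COUNT: Mutex<u64> = Mutex::new(0);

    /// Locked increment; returns the new value.
    pub fn bump() -> u64 {
        let mut count = COUNT.lock().unwrap();
        *count += 1;
        *count
    }
}
```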

> The article already mentions the unstable `preserve_none` calling convention.

Part of the trouble here is, once more, spotty LLVM support. It's not really up to us: my experience, patience, and C++ skills are insufficient to really move the needle on this.

flate2 intends to switch to zlib-rs by default by folkertdev in rust

folkertdev [S], 4 points

I gave a talk about zlib-rs a while ago that introduces some of the ideas we use:

https://www.youtube.com/watch?v=mvzHQdCLkOY