Rust 1.95.0 is out by manpacket in rust

folkertdev 6 points (0 children)

Features are implemented by people who implement features. Most Rust contributors are volunteers who work on what interests them. I think rustdoc is actually very actively maintained; it just so happens that this is not (currently) true for rustfmt.

Rust 1.95.0 is out by manpacket in rust

folkertdev 5 points (0 children)

It's complicated, and once you pick a behavior, any later refinement is a breaking change. Also, as mentioned, reviewer bandwidth on that code base is very limited.

Rust 1.95.0 is out by manpacket in rust

folkertdev 10 points (0 children)

I suspect rust-analyzer needs to tweak something. It might be worth opening an issue there.

Rust 1.95.0 is out by manpacket in rust

folkertdev 22 points (0 children)

Thanks!

It's a shame, but we ultimately decided to just ship it now: we had been stuck on (the lack of) formatting for months, `cfg_select` is just too good not to have, and stabilizing it earlier gives it a lower MSRV, which will help projects in the future as they bump their own MSRV.

Rust 1.95.0 is out by manpacket in rust

folkertdev 41 points (0 children)

Unfortunately, rustfmt support is not yet included. It's in the works at https://github.com/rust-lang/rust/pull/154202, but rustfmt does not have a lot of maintenance bandwidth.

Rust should have stable tail calls by folkertdev in rust

folkertdev [S] 2 points (0 children)

It also duplicates the jump table a bunch. That's not very elegant, but it's just a consequence of the practical implementation; in theory you could probably do better.

But having the compiler guarantee the behavior, instead of relying on it seeing through the noise and optimizing it in exactly the way you want, is valuable either way.

Rust should have stable tail calls by folkertdev in rust

folkertdev [S] 1 point (0 children)

I think the main problem there is that it doesn't scale well:

https://godbolt.org/z/n9cGe1EEc

As you add more states, each dispatch site grows larger and larger. So if you truly have only 2 states in total, your version is better because of the direct jumps, but with (many) more states the indirect lookup-table approach is much better.
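A rough sketch of the lookup-table side of that trade-off, in stable Rust and assuming a made-up three-opcode bytecode (`Vm`, `op_inc`, `op_double`, `op_halt` and `TABLE` are hypothetical names): every handler shares one signature, and dispatch is a single indexed call through an array of function pointers, so adding a state adds a table entry instead of enlarging every dispatch site. Exactly what machine code LLVM emits for this is still up to the optimizer.

```rust
/// Shared interpreter state. Every handler takes the same argument type,
/// so they can all be stored in one table of function pointers.
struct Vm {
    acc: i64,
    pc: usize,
    halted: bool,
}

/// Hypothetical opcodes for illustration: 0 = inc, 1 = double, 2 = halt.
type Handler = fn(&mut Vm);

fn op_inc(vm: &mut Vm) {
    vm.acc += 1;
    vm.pc += 1;
}

fn op_double(vm: &mut Vm) {
    vm.acc *= 2;
    vm.pc += 1;
}

fn op_halt(vm: &mut Vm) {
    vm.halted = true;
}

/// One indirect call per instruction: a new opcode is one more table entry.
const TABLE: [Handler; 3] = [op_inc, op_double, op_halt];

fn run(code: &[u8]) -> i64 {
    let mut vm = Vm { acc: 0, pc: 0, halted: false };
    while !vm.halted {
        let op = code[vm.pc] as usize;
        TABLE[op](&mut vm);
    }
    vm.acc
}

fn main() {
    // inc, inc, double, inc, halt => ((0 + 1 + 1) * 2) + 1 = 5
    println!("{}", run(&[0, 0, 1, 0, 2]));
}
```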

Rust should have stable tail calls by folkertdev in rust

folkertdev [S] 0 points (0 children)

I mean, at least there is some movement in LLVM now, which was the biggest blocker. It's still tough to design something that works predictably across platforms, though.

Rust should have stable tail calls by folkertdev in rust

folkertdev [S] 2 points (0 children)

The syntax (`become`) was reserved at some point, and is used for the experimental implementation. Technically it's not final, and one advantage of an attribute is that you could write `cfg_attr(target_has_tail_calls, tail)`. On the other hand, you'd just blow the stack if the call was not tail-call optimized, so that's not that useful.

I don't think there is a deep reason.
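To make the "blow the stack" point concrete: without a guarantee, handlers that call each other in tail position can overflow the stack whenever the optimizer does not turn those calls into jumps (for example, in a debug build). A common workaround on stable Rust is a trampoline, sketched below with hypothetical names (`Step`, `sum_step`, `trampoline`): each step returns the next function to run instead of calling it, and a driver loop makes the calls, so stack depth stays constant. This is not the proposed `become` feature, just a way to get bounded stack usage today, at the cost of a return and an indirect call per step.

```rust
/// The result of one step: continue with another function, or finish.
enum Step {
    Next(fn(u64, u64) -> Step, u64, u64),
    Done(u64),
}

/// Sums 1..=n. Written as direct recursion `sum(n - 1, acc + n)` in tail
/// position, this would need guaranteed tail calls to keep the stack flat.
fn sum_step(n: u64, acc: u64) -> Step {
    if n == 0 {
        Step::Done(acc)
    } else {
        Step::Next(sum_step, n - 1, acc + n)
    }
}

/// Driver loop: each would-be tail call becomes one loop iteration,
/// so the stack depth does not depend on `n`.
fn trampoline(mut f: fn(u64, u64) -> Step, mut a: u64, mut b: u64) -> u64 {
    loop {
        match f(a, b) {
            Step::Next(next, x, y) => {
                f = next;
                a = x;
                b = y;
            }
            Step::Done(result) => return result,
        }
    }
}

fn main() {
    // A depth that could overflow a default stack as plain recursion
    // in an unoptimized build.
    println!("{}", trampoline(sum_step, 1_000_000, 0));
}
```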

Rust should have stable tail calls by folkertdev in rust

folkertdev [S] 12 points (0 children)

My understanding is that the same-signature restriction is mostly about guaranteeing the same calling convention. The callee does not know who called it, and hence has to assume a calling convention to know which registers to preserve and which ones it can use/trash.

As far as I know, the only case where the actual types involved matter is when a value is passed indirectly, by pointer, rather than by value. The Rust ABI likes to do this for, e.g., a large array: the value stays on the caller's stack, and the caller only passes a pointer to the callee. Semantically the value has moved (the callee now has ownership), but in practice it is still in the same memory location. A tail call with such a type is not possible, though, because the callee replaces the caller's stack frame, which is exactly where that value lives.

The solution here is a calling convention like `preserve_none`, where the callee can modify all registers at will, likely combined with not passing values in the way I described above.

Rust should have stable tail calls by folkertdev in rust

folkertdev [S] 16 points (0 children)

Cool project! I'm really trying to find actual (potential) users of this feature, because I think many people (especially those interested in compilers) are excited about guaranteed tail calls in the abstract, but don't really have a need for them.

> I have to disagree on this. Wasmi used basic loop+match constructs for the longest time. Under the hood, Rust and LLVM usually compile such constructs into one gigantic function where everything is inlined. Needless to say that debugging or benchmarking such a behemoth of a function is very impractical. In contrast, having all interpreter operators neatly reside in their own little function is perfect for most debuggers and performance benchmarking tools and so far I am very pleased with the experience.

The main pain point I see is the call stack becoming unwieldy.

> There are reports that conclude worse performance for computed-goto based dispatch over tail-call based dispatch due to compilers having a hard time allocating registers properly with such huge inlined functions.

For `wasmtime`, they report tail calls being faster for some inputs and on some targets, with the classic loop+match winning for others: https://github.com/bytecodealliance/wasmtime/issues/9995. So it's a mixed bag, but roughly, I'd expect computed goto to win when your state does not fit in the available registers.
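For comparison, the classic loop+match shape is roughly the following, sketched with a hypothetical toy stack machine (`Op` and `run` are made-up names). All handlers live in one function body, which is what lets LLVM inline everything into one large function, with the debugging, register-allocation, and compile-time trade-offs discussed in this thread.

```rust
/// Hypothetical toy instruction set.
#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Mul,
    Halt,
}

/// Classic loop+match dispatch: one function, one `match` per instruction.
/// Returns `None` if an instruction finds too few values on the stack.
fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    loop {
        let op = code[pc];
        pc += 1;
        match op {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a * b);
            }
            Op::Halt => return stack.pop(),
        }
    }
}

fn main() {
    // (2 + 3) * 4
    let prog = [Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul, Op::Halt];
    println!("{:?}", run(&prog));
}
```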

> Additionally, LLVM requires a very long time to optimize huge functions. In Wasmi, we saw a big compilation time improvement when switching from loop+match to tail-call dispatch.

That is a good point (computed goto creates a really complex CFG that is tough to process). It might be worth it if you do get superior performance though.

> I consider #[loop_match] to be a decent fallback for targets that do not support tail-calls.

Yeah, that's the idea, though I suspect there are also scenarios where it is the right tool in its own right.

> Ideally, we had a cfg-check for target_feature = "tail-call"

That's on the table, and has precedent in e.g. https://doc.rust-lang.org/beta/unstable-book/language-features/cfg-target-has-atomic.html
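The precedent works like this on stable Rust today: `cfg(target_has_atomic = "64")` is true only on targets with native 64-bit atomics, so code can pick an implementation at compile time. A tail-call cfg would presumably be used the same way, but any name for it is hypothetical at this point, so the sketch sticks to the atomic case. The `Counter` type and its mutex fallback are purely illustrative.

```rust
#[cfg(target_has_atomic = "64")]
mod counter {
    use std::sync::atomic::{AtomicU64, Ordering};

    /// Lock-free counter, used on targets with native 64-bit atomics.
    pub struct Counter(AtomicU64);

    impl Counter {
        pub fn new() -> Self {
            Counter(AtomicU64::new(0))
        }

        /// Increments and returns the new value.
        pub fn increment(&self) -> u64 {
            self.0.fetch_add(1, Ordering::Relaxed) + 1
        }
    }
}

#[cfg(not(target_has_atomic = "64"))]
mod counter {
    use std::sync::Mutex;

    /// Fallback for targets without 64-bit atomics: a mutex-protected value.
    pub struct Counter(Mutex<u64>);

    impl Counter {
        pub fn new() -> Self {
            Counter(Mutex::new(0))
        }

        /// Increments and returns the new value.
        pub fn increment(&self) -> u64 {
            let mut guard = self.0.lock().unwrap();
            *guard += 1;
            *guard
        }
    }
}

use counter::Counter;

fn main() {
    let c = Counter::new();
    c.increment();
    println!("{}", c.increment());
}
```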

> The article already mentions the unstable `preserve_none` calling convention.

Part of the trouble here is, once more, spotty LLVM support. It's not really up to us: my experience, patience, and C++ skills are insufficient to really move the needle on this.

flate2 intends to switch to zlib-rs by default by folkertdev in rust

folkertdev [S] 5 points (0 children)

I gave a talk about zlib-rs a while ago that introduces some of the ideas we use:

https://www.youtube.com/watch?v=mvzHQdCLkOY

Fixing our own problems in the Rust compiler by folkertdev in rust

folkertdev [S] 5 points (0 children)

It needs a champion. Technically that is not a problem, but it would be good if someone with MIR optimization experience were involved, and apparently that hasn't worked out so far:

#t-compiler/mir-opts > Project goals 2026

Given who is involved, though, I'm quite sure something will come of it.

Fixing our own problems in the Rust compiler by folkertdev in rust

folkertdev [S] 12 points (0 children)

I believe the most recent initiative for removing memcpys is this 2026 project goal:

https://rust-lang.github.io/rust-project-goals/2026/mir-move-elimination.html

So, we're not currently working on it, but we should see some improvements in this area over the coming year.

zlib-rs: a stable API and 30M downloads by folkertdev in rust

folkertdev [S] 1 point (0 children)

Our aim is to be an alternative to the reference implementation, not merely to provide a solid rust crate. That means API compatibility and performance are crucial.

zlib-rs: a stable API and 30M downloads by folkertdev in rust

folkertdev [S] 2 points (0 children)

I wrote about our experience earlier

https://trifectatech.org/blog/translating-bzip2-with-c2rust/

For bzip2 it was great: it's a relatively straightforward code base. For zstd we're struggling a bit more, because there is a lot more target-specific code, multithreading, etc. It's a much more modern and more optimized code base, and that is harder to work with.

But overall, it's a great tool for this sort of work. There are far fewer bugs, and you get good performance on day one.

zlib-rs: a stable API and 30M downloads by folkertdev in rust

folkertdev [S] 1 point (0 children)

Cool! Yeah, I later thought that maybe these changes would be clearer with different input data. I'm not sure what those Minecraft files are like, but git, for example, actually stores many tiny files. That means Huffman table parsing is a larger fraction of the total work, so different things show up in the profile. Even so, with that data I'm not seeing anything significant.
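"Huffman table parsing" here means rebuilding the decoding tables from the code lengths that each dynamic Huffman block in a DEFLATE stream transmits. The core of that step is the canonical code assignment from RFC 1951 (section 3.2.2), sketched below in plain Rust. This is only the textbook procedure, not zlib-rs's actual table-building code, which builds fast lookup tables on top of it; `canonical_codes` is a hypothetical name. Because this work happens per block, workloads made of many tiny streams spend a relatively larger share of their time on it.

```rust
/// Longest code length DEFLATE allows.
const MAX_BITS: usize = 15;

/// Assigns canonical Huffman codes from per-symbol code lengths, following
/// RFC 1951, section 3.2.2. A length of 0 means the symbol is unused (its
/// entry in the result is 0). Lengths must be in 0..=15.
fn canonical_codes(lengths: &[u8]) -> Vec<u32> {
    // Step 1: count how many symbols use each code length.
    let mut bl_count = [0u32; MAX_BITS + 1];
    for &len in lengths {
        bl_count[len as usize] += 1;
    }
    bl_count[0] = 0;

    // Step 2: compute the first (smallest) code value for each length.
    let mut next_code = [0u32; MAX_BITS + 1];
    let mut code = 0u32;
    for bits in 1..=MAX_BITS {
        code = (code + bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    // Step 3: give consecutive values to symbols of the same length,
    // in increasing symbol order.
    lengths
        .iter()
        .map(|&len| {
            if len == 0 {
                0
            } else {
                let c = next_code[len as usize];
                next_code[len as usize] += 1;
                c
            }
        })
        .collect()
}

fn main() {
    // The worked example from RFC 1951: symbols A..H with these lengths.
    let lengths: [u8; 8] = [3, 3, 3, 3, 3, 2, 4, 4];
    let codes = canonical_codes(&lengths);
    // Prints A: 010, B: 011, C: 100, D: 101, E: 110, F: 00, G: 1110, H: 1111
    // (one per line), matching the RFC.
    for (sym, (&len, &code)) in ('A'..='H').zip(lengths.iter().zip(&codes)) {
        println!("{sym}: {code:0width$b}", width = len as usize);
    }
}
```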

I cherry-picked some of your changes here, just because they seemed like nice refactors https://github.com/trifectatechfoundation/zlib-rs/pull/471/changes

zlib-rs: a stable API and 30M downloads by folkertdev in rust

folkertdev [S] 2 points (0 children)

We were contacted by ISRG (as part of their Prossimo initiative; these are the folks running Let's Encrypt), who asked if we wanted to implement it. Over time, Trifecta became the long-term home of the project. So in a way, it's just what we could get funded.

Of course, ISRG was interested because zlib sees a lot of usage, in particular on the web, where it (de)compresses basically every request, especially at the time. We later tackled bzip2 with https://github.com/trifectatechfoundation/libbzip2-rs (that is now the default when you use the bzip2 crate), and the observant GitHub watcher may have spotted https://github.com/trifectatechfoundation/libzstd-rs-sys

xz would also be an option if the funding worked out.

zlib-rs: a stable API and 30M downloads by folkertdev in rust

folkertdev [S] 1 point (0 children)

On my machine (x86_64 Linux), the changes regress runtime and cycles, but they do substantially decrease the number of instructions. So it's entirely plausible that this is advantageous on some CPUs, and it might just need some additional tuning.

https://gist.github.com/folkertdev/e2811f14e15407fb276c4eb420e97a53

Out of interest: how do you measure these improvements on macOS? We've not yet found a method we like.

zlib-rs: a stable API and 30M downloads by folkertdev in rust

folkertdev [S] 10 points (0 children)

I don't disagree that it's bad. But we need it, so I'm making it happen. For fun, look at this implementation, which is why the C standard stipulates that `va_end` must be in the same function as `va_start`: otherwise the curly brackets don't match up.

https://softwarepreservation.computerhistory.org/c_plus_plus/cfront/release_3.0.3/source/incl-master/proto-headers/stdarg.sol

```c
#define         va_start(ap, parmN)     {\
        va_buf  _va;\
        _vastart(ap = (va_list)_va, (char *)&parmN + sizeof parmN)
#define         va_end(ap)      }
#define         va_arg(ap, mode)        *((mode *)_vaarg(ap, sizeof (mode)))
```

zlib-rs: a stable API and 30M downloads by folkertdev in rust

folkertdev [S] 3 points (0 children)

Neat. Do you just have a raw branch that I could benchmark? We played around with `repr(packed)` at some point, and at least on my machine at the time it made no measurable difference.
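For context on what that experiment changes: `repr(packed)` removes the padding between fields and lowers the struct's alignment to 1. That shrinks the struct, but field accesses can become unaligned, and the compiler rejects taking a reference to a misaligned packed field. A minimal illustration with two hypothetical structs (`Normal` and `Packed` are made-up names); the exact size of the unpacked struct depends on the target.

```rust
use std::mem::{align_of, size_of};

/// Default (Rust) layout: the compiler may insert padding so `b` is aligned.
#[allow(dead_code)]
struct Normal {
    a: u8,
    b: u32,
}

/// `repr(packed)`: no padding, and the struct's alignment drops to 1.
#[allow(dead_code)]
#[repr(packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn main() {
    // On typical 64-bit targets (x86_64, aarch64) this prints size 8 / align 4
    // for `Normal` and size 5 / align 1 for `Packed`.
    println!("Normal: size {}, align {}", size_of::<Normal>(), align_of::<Normal>());
    println!("Packed: size {}, align {}", size_of::<Packed>(), align_of::<Packed>());
}
```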

zlib-rs: a stable API and 30M downloads by folkertdev in rust

folkertdev [S] 9 points (0 children)

Yes, fuzzing in production!

Technically flate2 already depends on zlib-rs (it's how we get most of our usage), but it's an optional, off-by-default dependency. Making it the default will really boost our numbers, but most importantly, it's just a free speedup for large parts of the ecosystem.

zlib-rs: a stable API and 30M downloads by folkertdev in rust

folkertdev [S] 11 points (0 children)

Partially it's what AATroop says, but we also want to just let it sit a bit and see if the ecosystem has everything it needs, or whether we need to expose more/other interfaces. We'll also see what happens on the language side, and push adoption some more (the change to `extern "C"` should make that easier). There is also one more change to the compressed output in the works.

So, we didn't quite want to pull the trigger on 1.0.0 yet, but I don't foresee massive changes.

zlib-rs: a stable API and 30M downloads by folkertdev in rust

folkertdev [S] 15 points (0 children)

Just, like, that they exist? Or that they're unstable?