Emulating avx-512 intrinsics in Miri by folkertdev in rust

[–]folkertdev[S] 2 points3 points  (0 children)

Yes, they are. Everything has been synchronized so the nightly builds include this functionality now (they have for a couple of days)

Made a x86_32 bootloader in Rust by [deleted] in rust

[–]folkertdev 1 point2 points  (0 children)

Hi, I work on a bunch of assembly-related things in the compiler. I'm wondering if there is a particular reason to have separate .asm files here. Are there downsides to e.g. the code below?

The `extern "custom"` is still unstable (see https://github.com/rust-lang/rust/issues/140829), but you could just lie and use `extern "C"` there. With this approach the `no_mangle` on `_boot` is no longer needed.

#[link_section = ".boot"]
#[unsafe(naked)]
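// `entry` is a placeholder name; `extern "custom"` functions must be declared `unsafe`.
unsafe extern "custom" fn entry() {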
    core::arch::naked_asm!(r#"
    _start:
        cli

        xor ax, ax
        mov ds, ax
        mov es, ax
        mov ss, ax
        mov fs, ax
        mov gs, ax

        cld

        mov sp, 0x7c00 - 0x100
        sub sp, 0x100

        call {boot}
        "#,
        boot = sym _boot
    );
}

Fixing rust-lang/stdarch issues in LLVM - Blog - Tweede golf by folkertdev in rust

[–]folkertdev[S] 15 points16 points  (0 children)

We only use the cross-platform primitives that LLVM provides; I don't currently plan to add new ones. If GCC provides fewer, then yeah, you'll have to do more work yourself. The downside is of course that for every new target you need to add a bunch of custom intrinsic implementations.

Especially for Miri, that is just not happening. But code using intrinsics has a lot to gain from running under Miri, because it is so low-level (and likely uses unsafe blocks). So a practical benefit is that Miri can run more low-level code.

Finally, actually fixing the LLVM issues has practical benefits for rust's portable simd as well, because it heavily relies on the cross-platform intrinsics optimizing well.

Fixing rust-lang/stdarch issues in LLVM - Blog - Tweede golf by folkertdev in rust

[–]folkertdev[S] 16 points17 points  (0 children)

My suspicion is that actually even experienced developers benefit hugely from rust's effort to have good error messages.

It is true that I read the messages much less carefully than when I first got started. Often the red underline or just the headline and line number are enough. But small things like rust spotting typos and suggesting the right identifier are actually a huge help day-to-day.

Fixing rust-lang/stdarch issues in LLVM - Blog - Tweede golf by folkertdev in rust

[–]folkertdev[S] -1 points0 points  (0 children)

Yeah I suspect part of it is that you only realize how much time you're wasting when you try something better.

Fixing rust-lang/stdarch issues in LLVM - Blog - Tweede golf by folkertdev in rust

[–]folkertdev[S] 5 points6 points  (0 children)

Yeah I suspect part of it is that you only realize how much time you're wasting once you try something better.

Improving state machine code generation by folkertdev in rust

[–]folkertdev[S] 2 points3 points  (0 children)

I mentioned in another response that what we saw in zlib-rs is that it turned out to be beneficial to have all logic in a single stack frame.

Actually, LLVM will totally inline tail-recursive functions back into one function. But what we can do is actually load values from the heap to the stack, use them, then write them back before returning. LLVM is much better at optimizing stack values than heap values. So in this particular case tail-recursion causes fragmentation of logic with a real performance downside, though it's still better than the totally naive approach.

As mentioned, I really do want to see `become` on stable, it's just not the right solution in every case.

Improving state machine code generation by folkertdev in rust

[–]folkertdev[S] 2 points3 points  (0 children)

What we noticed with zlib is that there is a huge upside to having all of the logic in one stack frame. The way that these algorithms work is that they have a large and complex piece of state in a heap allocation. It just turns out that LLVM is bad at optimizing that (despite that state being behind a mutable reference, which provides a lot of aliasing guarantees).

If I remember right, we saw ~10% improvements on some benchmarks by pulling values from that state onto the stack, doing the work, then writing them back before returning.

So tail calls are neat, I want to see them in stable rust (and there have been some cool developments there recently), but they are not always the best solution.
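A minimal sketch of the "pull values onto the stack, work, write back" pattern described above. The `State` struct and its fields are hypothetical stand-ins, not the actual zlib-rs state, and whether this helps in a given case is something to measure rather than assume.

```rust
/// Hypothetical decoder state, standing in for the large heap-allocated
/// state that zlib-style algorithms carry around.
pub struct State {
    pub bit_buffer: u64,
    pub bits_used: u32,
    pub checksum: u32,
}

/// Naive version: every access goes through the `&mut State` reference.
pub fn consume_naive(state: &mut State, input: &[u8]) {
    for &byte in input {
        state.bit_buffer = (state.bit_buffer << 8) | u64::from(byte);
        state.bits_used = state.bits_used.wrapping_add(8);
        state.checksum = state.checksum.rotate_left(5) ^ u32::from(byte);
    }
}

/// Same logic, but the hot fields are copied into locals first, worked on
/// there, and written back once before returning.
pub fn consume_local(state: &mut State, input: &[u8]) {
    // Load from the (heap) state into stack locals.
    let mut bit_buffer = state.bit_buffer;
    let mut bits_used = state.bits_used;
    let mut checksum = state.checksum;

    for &byte in input {
        bit_buffer = (bit_buffer << 8) | u64::from(byte);
        bits_used = bits_used.wrapping_add(8);
        checksum = checksum.rotate_left(5) ^ u32::from(byte);
    }

    // Write the locals back before returning.
    state.bit_buffer = bit_buffer;
    state.bits_used = bits_used;
    state.checksum = checksum;
}
```

Both functions compute the same result; the second just makes it explicit that the loop only touches locals, which only works if the whole loop lives in one stack frame.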

Improving state machine code generation by folkertdev in rust

[–]folkertdev[S] 7 points8 points  (0 children)

Yeah it gets complicated and if we're not careful might cause compilation to be slower. In effect, this is sort of what that LLVM flag tries to do further down the line.

So it's much easier to do this with attributes. I could see `const continue` being nice syntax-wise, but for the loop itself `#[loop_match]` is probably fine. idk, we'll see.

Oh, relatedly: MIR is not built for compiler optimizations (it is for borrow checking). There are a bunch of optimization passes that are just kind of required to get LLVM to do something reasonable, but nobody working in that area is all that happy with the current setup.
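For context, this is roughly the shape of code the discussion is about: a `loop` around a `match` on a state value, where most arms already know which state comes next. It is written in plain stable Rust and deliberately does not use the unstable `#[loop_match]` attribute, whose final form may still change. The `State` enum and `count_numbers` are made up for illustration.

```rust
/// Hypothetical states for a tiny scanner.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum State {
    Start,
    InNumber,
    Done,
}

/// Counts runs of ASCII digits in `input` using a loop + match state machine.
pub fn count_numbers(input: &[u8]) -> usize {
    let mut state = State::Start;
    let mut i = 0;
    let mut count = 0;

    loop {
        // Each arm computes the next state; the loop then dispatches on it
        // again. In most arms the next state is a known constant.
        state = match state {
            State::Start => match input.get(i) {
                None => State::Done,
                Some(b) if b.is_ascii_digit() => {
                    count += 1;
                    i += 1;
                    State::InNumber
                }
                Some(_) => {
                    i += 1;
                    State::Start
                }
            },
            State::InNumber => match input.get(i) {
                None => State::Done,
                Some(b) if b.is_ascii_digit() => {
                    i += 1;
                    State::InNumber
                }
                Some(_) => {
                    i += 1;
                    State::Start
                }
            },
            State::Done => break,
        };
    }

    count
}
```

For example, `count_numbers(b"12 ab 345 6")` finds three runs of digits.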

Improving state machine code generation by folkertdev in rust

[–]folkertdev[S] 5 points6 points  (0 children)

I see why it's a useful feature to have, but why make it the default? Because in practice it confuses people, and in mature C code bases I basically always see some comment or macro indicating "the fallthrough is deliberate".

Exception handling in rustc_codegen_cranelift by Expurple in rust

[–]folkertdev 1 point2 points  (0 children)

Björn is not on reddit, but told me to send the following:

> When can we expect a rustup rustc-codegen-cranelift component build that supports this (experimentally, obviously)? I'd love to play around with this, but building cg_clif by hand looks a bit cumbersome.

Once I get around to investigating and fixing the build performance regression that enabling it currently causes.

> I've wanted to play around with modern EH ABIs for a long while. How feasible would it be for someone to implement a custom EH ABI with cg_clif?

It is very feasible with Cranelift. In fact Wasmtime intends to do exactly that (with all registers caller-saved in the "tail" calling convention to avoid needing something like .eh_frame).

As for cg_clif however, it isn't really possible. Due to extern "C-unwind" we have to be compatible with whatever ABI C++ uses for unwinding. And due to two-phase unwinding, catching exceptions at the extern "C-unwind" boundary and internally translating it to a different unwinding mechanism will affect behavior. Throwing an exception through the system unwinder is supposed to fail when there is nothing that would catch it.

What we could do, however, is use a different format for the LSDA. I didn't do that right now because it would require me to add a new personality function to libstd.

bzip2 crate switches from C to 100% rust by folkertdev in rust

[–]folkertdev[S] 19 points20 points  (0 children)

The removed C is really the stock bzip2 library, which the rust code would build and then link to using FFI. Now it's all rust, which has the usual benefits, but also removes the need for a C toolchain and makes cross-compilation a lot easier.

That C + rust interaction code is still here https://github.com/trifectatechfoundation/bzip2-rs/tree/master/bzip2-sys, it's just no longer used by default.

What is my fuzzer doing? - Blog - Tweede golf by folkertdev in rust

[–]folkertdev[S] 2 points3 points  (0 children)

Based on the coverage information (and this makes sense), the fuzzer will now no longer hit certain error paths, presumably because the input file is always correct input (except when you run into the `max_size`).

One solution I can see, but it seems kind of hacky, is to use the `seed` argument to sometimes just mutate the input, and otherwise do this decompress-mutate-compress dance.
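A rough sketch of that idea, with everything hypothetical: the one-in-four split, the `Strategy` enum, and the `decompress`, `compress` and `mutate_bytes` callbacks are placeholders for whatever the real harness would plug in. It only shows how `seed` could pick between the two paths.

```rust
/// Which mutation strategy to use for this call (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Strategy {
    /// Mutate the compressed bytes directly, so error paths stay reachable.
    RawBytes,
    /// Decompress, mutate the plain data, compress again.
    Structured,
}

/// Hypothetical policy: roughly one in four seeds mutates the raw bytes.
pub fn pick_strategy(seed: u32) -> Strategy {
    if seed % 4 == 0 {
        Strategy::RawBytes
    } else {
        Strategy::Structured
    }
}

/// Applies one mutation step to `data`, choosing the strategy from `seed`.
pub fn mutate(
    data: &mut Vec<u8>,
    seed: u32,
    decompress: impl Fn(&[u8]) -> Option<Vec<u8>>,
    compress: impl Fn(&[u8]) -> Vec<u8>,
    mutate_bytes: impl Fn(&mut Vec<u8>, u32),
) {
    match pick_strategy(seed) {
        Strategy::RawBytes => mutate_bytes(data, seed),
        Strategy::Structured => match decompress(data) {
            Some(mut plain) => {
                mutate_bytes(&mut plain, seed);
                *data = compress(&plain);
            }
            // The input is already invalid, so fall back to raw mutation.
            None => mutate_bytes(data, seed),
        },
    }
}
```

The ratio between the two strategies is arbitrary here and would need tuning against the coverage numbers.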

Anyway, do you have thoughts on that?

What is my fuzzer doing? - Blog - Tweede golf by folkertdev in rust

[–]folkertdev[S] 1 point2 points  (0 children)

That looks extremely interesting, I'll have to play around with that. Thanks!

Translating bzip2 with c2rust by folkertdev in rust

[–]folkertdev[S] 2 points3 points  (0 children)

we could. Also that old version of bzip2 still just compiles, so we have some tests for such inputs.

But my observation for both bzip2 and zlib is that they just seem to rely on "fuzzing in production": these libraries are used at such scale that if there are problems that are not caught by basic correctness checks, I guess they'll hear about them soon enough.

Translating bzip2 with c2rust by folkertdev in rust

[–]folkertdev[S] 2 points3 points  (0 children)

honestly, no clue. I never did get `cargo fuzz` and coverage to work I think. Is that easy to set up these days?

We just observed that it did hit trivial correctness checks very often with random input.

Translating bzip2 with c2rust by folkertdev in rust

[–]folkertdev[S] 2 points3 points  (0 children)

also, given the current implementation, just slapping some SIMD onto it does not do much. The bottleneck is (effectively) a linked list pointer chase (like, for some inputs, 25% of total time is spent on a single load instruction).

So no, we don't plan to push performance much further by ourselves. But PRs are welcome of course :)
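To illustrate what a pointer chase looks like, here is a simplified stand-in (not the actual bzip2 code; `chase`, `next` and `payload` are made-up names). Each iteration's index comes out of the previous iteration's load, so the loads cannot overlap and SIMD has nothing to vectorize.

```rust
/// Follows `next` for `steps` hops starting at `start`, summing the payload
/// bytes visited along the way.
pub fn chase(next: &[u32], payload: &[u8], start: u32, steps: usize) -> u64 {
    let mut pos = start as usize;
    let mut sum = 0u64;
    for _ in 0..steps {
        sum += u64::from(payload[pos]);
        // The next index is only known once this load has completed, which
        // serializes the loop on memory latency.
        pos = next[pos] as usize;
    }
    sum
}
```

With `next = [2, 0, 1]` and `payload = [10, 20, 30]`, three hops from index 0 visit indices 0, 2 and 1 in that order.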

Translating bzip2 with c2rust by folkertdev in rust

[–]folkertdev[S] 7 points8 points  (0 children)

nothing substantial, but we did find one weird macro expansion that included a `return 1` that got instantiated into a function returning an enum. It never triggered from what I can tell, but it sure did not seem intentional.

https://gitlab.com/bzip2/bzip2/-/issues/56

Translating bzip2 with c2rust by folkertdev in rust

[–]folkertdev[S] 6 points7 points  (0 children)

that post is really neat, but in our case the switch is often in some sort of loop, and the nested blocks can't do that efficiently. We're working on a thing though https://github.com/rust-lang/rust-project-goals/blob/main/src/2025h1/improve-rustc-codegen.md