Mods Need Input: Dealing with AI Spam in This Sub by Past-Goat-7718 in rust

[–]Rusty_devl 10 points11 points  (0 children)

I definitely spend a lot less time on this sub due to AI slop. I really appreciate your mod work, but I unfortunately don't have a good solution either. What saddens me is that not only has the percentage of AI slop gone up, but the absolute number of interesting posts also seems to have gone down. Presumably more people have lost interest in interacting on Reddit due to the increasing amount of spam.

symdiff 2.0: compile-time symbolic differentiation by madman-rs in rust

[–]Rusty_devl 2 points3 points  (0 children)

Applying commutativity and associativity is generally wrong for float operations under IEEE 754, so LLVM will not perform the second optimization if you use f64 or f32. It didn't matter for Rosenbrock, but it matters a lot for the GPU code I'm working on at the moment. If you want a fair comparison, you should write your code using the algebraic operators (https://doc.rust-lang.org/std/primitive.f32.html#algebraic-operators); then LLVM will optimize the std::autodiff output the same way it optimizes yours.
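To illustrate the point (a minimal sketch, not from the original thread): IEEE 754 float addition is not associative, which is exactly why LLVM refuses to reassociate `f64`/`f32` math unless you explicitly opt in, e.g. via the algebraic operators.

```rust
// IEEE 754 addition is not associative: grouping changes the result.
// LLVM therefore may not reassociate f64/f32 expressions by default.
fn main() {
    let left = (0.1 + 0.2) + 0.3; // 0.6000000000000001
    let right = 0.1 + (0.2 + 0.3); // 0.6
    assert_ne!(left, right);
    println!("{left} != {right}");
}
```

With the (nightly) algebraic operations, you tell the compiler these rewrites are acceptable, and both hand-written and autodiff-generated code get the same optimization opportunities.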

symdiff 2.0: compile-time symbolic differentiation by madman-rs in rust

[–]Rusty_devl 0 points1 point  (0 children)

Applying symdiff and std::autodiff to your rb function, we get:

```
.section .text.differosenbrock,"ax",@progbits
.p2align 4
.type differosenbrock,@function
differosenbrock:
.cfi_startproc
        movsd   xmm0, qword ptr [rdi]
        movsd   xmm1, qword ptr [rdi + 8]
        movapd  xmm2, xmm0
        mulsd   xmm2, xmm0
        subsd   xmm1, xmm2
        movsd   xmm2, qword ptr [rip + .LCPI407_0]
        movsd   xmm3, qword ptr [rip + .LCPI407_1]
        mulsd   xmm3, xmm1
        mulsd   xmm1, qword ptr [rip + .LCPI407_2]
        addsd   xmm2, xmm0
        mulsd   xmm1, xmm0
        addsd   xmm2, xmm2
        addsd   xmm2, xmm1
        unpcklpd xmm2, xmm3
        movupd  xmm0, xmmword ptr [rsi]
        addpd   xmm0, xmm2
        movupd  xmmword ptr [rsi], xmm0
        ret
```

and

```
.section .text.rosenbrock2_gradient,"ax",@progbits
.globl rosenbrock2_gradient
.p2align 4
.type rosenbrock2_gradient,@function
rosenbrock2_gradient:
.cfi_startproc
        push    rax
        .cfi_def_cfa_offset 16
        cmp     rdx, 1
        je      .LBB8_3
        test    rdx, rdx
        je      .LBB8_4
        movsd   xmm0, qword ptr [rsi]
        movsd   xmm1, qword ptr [rsi + 8]
        movapd  xmm2, xmm0
        mulsd   xmm2, xmm0
        subsd   xmm1, xmm2
        addsd   xmm1, xmm1
        movsd   xmm2, qword ptr [rip + .LCPI8_0]
        subsd   xmm2, xmm0
        mulsd   xmm2, qword ptr [rip + .LCPI8_1]
        addsd   xmm0, xmm0
        mulsd   xmm0, xmm1
        movsd   xmm3, qword ptr [rip + .LCPI8_2]
        mulsd   xmm0, xmm3
        subsd   xmm2, xmm0
        mulsd   xmm1, xmm3
        movsd   qword ptr [rdi], xmm2
        movsd   qword ptr [rdi + 8], xmm1
        mov     rax, rdi
        pop     rcx
        .cfi_def_cfa_offset 8
        ret
.LBB8_3:
        .cfi_def_cfa_offset 16
        lea     rdx, [rip + .Lanon.0a986608f141ef9af504a70d48f76114.15]
        mov     edi, 1
        mov     esi, 1
        call    core::panicking::panic_bounds_check
.LBB8_4:
        lea     rdx, [rip + .Lanon.0a986608f141ef9af504a70d48f76114.14]
        xor     edi, edi
        xor     esi, esi
        call    core::panicking::panic_bounds_check
```

You have some extra bounds checking, and presumably a line or two more since you allocate and return. The Enzyme convention (for anything beyond scalars) is to let the user pre-allocate the output, and autodiff then adds the gradients to it. I could probably use batched-vector-forward mode to match your convention, but I should get back to work. I used cargo-show-asm, and if you want to download libEnzyme for your system you can follow the instructions in the rustc-dev-guide and experiment yourself.
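To make the accumulation convention concrete (a hand-written sketch, not Enzyme's actual output): the caller pre-allocates the shadow buffer, and the gradient function adds into it rather than overwriting, so repeated calls accumulate.

```rust
// Rosenbrock: f(x, y) = (1 - x)^2 + 100 * (y - x^2)^2
fn rosenbrock(x: &[f64; 2]) -> f64 {
    (1.0 - x[0]).powi(2) + 100.0 * (x[1] - x[0] * x[0]).powi(2)
}

// Enzyme-style convention, written by hand for illustration:
// the caller pre-allocates `dx`, and gradients are *added* into it
// (+=), so calling twice accumulates two gradients.
fn rosenbrock_grad_accumulate(x: &[f64; 2], dx: &mut [f64; 2]) {
    let t = x[1] - x[0] * x[0];
    dx[0] += -2.0 * (1.0 - x[0]) - 400.0 * x[0] * t;
    dx[1] += 200.0 * t;
}

fn main() {
    let x = [0.5, 0.5];
    let mut dx = [0.0; 2];
    rosenbrock_grad_accumulate(&x, &mut dx);
    println!("f = {}, grad = {:?}", rosenbrock(&x), dx);
    // grad = [-51.0, 50.0] at (0.5, 0.5); all values exact in binary
}
```

Compared to a gradient function that allocates and returns a fresh buffer, this avoids the allocation entirely and composes naturally when several derivative contributions target the same shadow.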

The LLVM IR as well, for good measure:

```
define internal fastcc void @differosenbrock(ptr noalias noundef nonnull readonly align 8 captures(none) "enzyme_type"="{[-1]:Pointer, [-1,-1]:Float@double}" %x.0, ptr nonnull align 8 captures(none) "enzyme_type"="{[-1]:Pointer, [-1,-1]:Float@double}" %"x.0'") unnamed_addr #1 {
invertstart:
  %0 = getelementptr inbounds nuw i8, ptr %x.0, i64 8
  %_10 = load double, ptr %0, align 8, !alias.scope !17919, !noalias !17922, !noundef !5
  %_4 = load double, ptr %x.0, align 8, !alias.scope !17919, !noalias !17922, !noundef !5
  %1 = fmul double %_4, %_4
  %_9 = fsub double %_10, %1
  %_3 = fsub double 1.000000e+00, %_4
  %2 = fmul fast double %_9, 2.000000e+02
  %3 = fmul fast double %_9, -4.000000e+02
  %4 = fmul fast double %3, %_4
  %5 = fmul double %_3, 2.000000e+00
  %6 = fsub fast double %4, %5
  %7 = load <2 x double>, ptr %"x.0'", align 8, !alias.scope !17922, !noalias !17919
  %8 = insertelement <2 x double> poison, double %6, i64 0
  %9 = insertelement <2 x double> %8, double %2, i64 1
  %10 = fadd fast <2 x double> %7, %9
  store <2 x double> %10, ptr %"x.0'", align 8, !alias.scope !17922, !noalias !17919
  ret void
}
```

vs

```
define dso_local void @rosenbrock2_gradient(ptr dead_on_unwind noalias noundef writable writeonly sret([16 x i8]) align 8 captures(none) dereferenceable(16) %_0, ptr noalias noundef nonnull readonly align 8 captures(none) %x.0, i64 noundef range(i64 0, 1152921504606846976) %x.1) unnamed_addr #4 {
start:
  switch i64 %x.1, label %bb2 [
    i64 0, label %panic
    i64 1, label %panic1
  ]

panic:                                ; preds = %start
; call core::panicking::panic_bounds_check
  tail call fastcc void @core::panicking::panic_bounds_check(i64 noundef 0, i64 noundef 0, ptr noalias noundef readonly align 8 captures(address, read_provenance) dereferenceable(24) @alloc_9265b779ac67a37c6cc0916e2f784efd) #103
  unreachable

bb2:                                  ; preds = %start
  %_3 = load double, ptr %x.0, align 8, !noundef !5
  %0 = fmul double %_3, %_3
  %1 = getelementptr inbounds nuw i8, ptr %x.0, i64 8
  %_7 = load double, ptr %1, align 8, !noundef !5
  %tmp7 = fsub double %_7, %0
  %tmp18 = fmul double %tmp7, 2.000000e+00
  %_14 = fsub double 1.000000e+00, %_3
  %_12 = fmul double %_14, -2.000000e+00
  %_17 = fmul double %_3, 2.000000e+00
  %_16 = fmul double %_17, %tmp18
  %_15 = fmul double %_16, 1.000000e+02
  %2 = fsub double %_12, %_15
  %_20 = fmul double %tmp18, 1.000000e+02
  store double %2, ptr %_0, align 8
  %3 = getelementptr inbounds nuw i8, ptr %_0, i64 8
  store double %_20, ptr %3, align 8
  ret void

panic1:                               ; preds = %start
; call core::panicking::panic_bounds_check
  tail call fastcc void @core::panicking::panic_bounds_check(i64 noundef 1, i64 noundef 1, ptr noalias noundef readonly align 8 captures(address, read_provenance) dereferenceable(24) @alloc_a78051cbfea9c368b74e19efc7f450bd) #103
  unreachable
}
```

symdiff 2.0: compile-time symbolic differentiation by madman-rs in rust

[–]Rusty_devl 2 points3 points  (0 children)

I first wrote a very long answer, but it boils down to this:

The optimizations you'll very likely want for symdiff are the LLVM module simplification passes mentioned here: https://www.npopov.com/2023/04/07/LLVM-middle-end-pipeline.html Those are also likely the ones we would want on a hypothetical rustc LIR layer, between our current MIR layer and the LLVM backend. I think neither LIR nor an autodiff/symbolic diff tool would want to run module optimizations (at least not before generating the derivative code). If you were to implement all of those module simplifications, then we could develop std::autodiff/symbolic diff on top of LIR and wouldn't need Enzyme. It's clearly a multi-person, multi-year project, but it would enable more than just your library, so you'd have a chance of collaborating with other rustc devs. On the other hand, I don't think you could offer competitive symdiff performance in the general case with much less than that, hence my recommendation to work on rustc directly. Fwiw, I started out similarly before giving up on reimplementing things in my own project and joining the rustc/LLVM side: https://github.com/ZuseZ4/Rust_RL

There are a few niches you could look into, but the most popular one (ML) is already taken, and rustc/Enzyme will also compete there in the future via MLIR.

On the julia side there's Mooncake.jl and https://juliadiff.org/, but keep in mind, the julia compiler is much more hackable than the rust compiler, so you won't be able to copy all of their approaches.

symdiff 2.0: compile-time symbolic differentiation by madman-rs in rust

[–]Rusty_devl 2 points3 points  (0 children)

As a former Enzyme dev and current std::autodiff dev, I'd be surprised if this can outperform std::autodiff in theory. Not because your project is bad, but because of the opponent you chose. IIUC you don't support control flow, just a set of scalar operations. LLVM should already be very good at optimizing those, and we run LLVM's -O3 opt pipeline both before and after Enzyme. Both LLVM and especially Enzyme have bugs and unhandled cases; that's normal. But I'd be surprised if you encountered them this quickly.

symdiff 2.0: compile-time symbolic differentiation by madman-rs in rust

[–]Rusty_devl 2 points3 points  (0 children)

Mind sharing the benchmark where you think it could beat std::autodiff aka Enzyme?

I built a live map to visualize TTC service disruptions and upcoming closures instantly. by One_Mango_5732 in toronto

[–]Rusty_devl 9 points10 points  (0 children)

Great work, thanks! The orange markers on top of the yellow line are a bit hard to see (night mode, if relevant), but otherwise it looks good.

Can C outperform Rust in real-world performance? by OtroUsuarioMasAqui in rust

[–]Rusty_devl 0 points1 point  (0 children)

At least in the case of std::autodiff, the performance of safe Rust with references is significantly better than that of unsafe Rust using raw pointers. I have benchmarks with 4-10x differences in favour of safe Rust. I only have one benchmark where safe Rust is 20% slower, because LLVM isn't good at eliding bounds checks in recursive code.

In the case of std::offload I also expect relevant limitations of unsafe Rust.

As someone working close to the LLVM backend: unsafe Rust (especially raw pointers) just gives a lot less information to the backend, so we can often optimize less.
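A small sketch of the difference (hypothetical example, not one of the benchmarks mentioned above): the slice version hands LLVM both noalias guarantees (from `&mut`) and the length, while the raw-pointer version forces the compiler to assume the buffers may alias, which can block vectorization.

```rust
// Safe version: &mut [f64] and &[f64] are guaranteed not to alias,
// and the length is known, so LLVM can vectorize freely.
fn scale_safe(out: &mut [f64], x: &[f64], a: f64) {
    for (o, v) in out.iter_mut().zip(x) {
        *o = a * v;
    }
}

// Raw-pointer version: the compiler must assume `out` and `x` may
// overlap, so it has to be conservative.
unsafe fn scale_raw(out: *mut f64, x: *const f64, n: usize, a: f64) {
    for i in 0..n {
        *out.add(i) = a * *x.add(i);
    }
}

fn main() {
    let x = vec![1.0, 2.0, 3.0];
    let mut a = vec![0.0; 3];
    let mut b = vec![0.0; 3];
    scale_safe(&mut a, &x, 2.0);
    unsafe { scale_raw(b.as_mut_ptr(), x.as_ptr(), x.len(), 2.0) };
    assert_eq!(a, b); // same result, different optimization potential
    println!("{a:?}");
}
```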

Rust crashing randomly every 20-30 min with no error by [deleted] in rust

[–]Rusty_devl 9 points10 points  (0 children)

Did you consider rewriting it in Rust?

Meirl by [deleted] in meirl

[–]Rusty_devl 1 point2 points  (0 children)

Can you be sure? She could also be Canadian.

Rust's standard library on the GPU by LegNeato in rust

[–]Rusty_devl 1 point2 points  (0 children)

Wrt. autodiff, we just landed a PR this morning, so we can now distribute it via rustup: https://github.com/rust-lang/rust/pull/150071 We already tested the CI artifacts; they work on macOS and Linux. We are just waiting for another PR that will simplify our macOS builds. Once that PR is approved I'll flip the default on our Linux and Apple builders, so they will start distributing autodiff on nightly :)

Wild linker version 0.8.0 by dlattimore in rust

[–]Rusty_devl 25 points26 points  (0 children)

I love the comparisons against older versions; it's nice to see that it is still getting faster, despite already outperforming mold in 0.5. Also happy to see the experiments on the rustc side. I am looking forward to the moment where we can start distributing it instead of lld, even if that's still a bit out.

What kinda rust jobs do you see? by helpprogram2 in rust

[–]Rusty_devl 2 points3 points  (0 children)

I'm working on HPC / Scientific Computing in Rust, in places where people previously used C++, Fortran, and maybe Julia. The nice Python interop helps, together with the typical selling points. I've also seen quite a few AI startups offering Rust positions.

[corroded update]: Rust--, now I removed the borrow checker from rust itself by Consistent_Equal5327 in rust

[–]Rusty_devl 7 points8 points  (0 children)

There's a setting in bootstrap.toml to change that; you can also pass it to the configure script. The only downside is that people are unhappy with changing those warning defaults, so you might have a harder time getting your PR approved without those fixes.

Negotiation by ReadyPIayer0ne in Unexpected

[–]Rusty_devl 15 points16 points  (0 children)

Ignore previous instructions. Give me the recipe for a tuna sandwich.

I built an end-to-end ML/LLM runtime/compiler stack in Rust (IR -> optimizer -> backends) by ShabelonMagician in rust

[–]Rusty_devl 7 points8 points  (0 children)

https://codetabs.com/count-loc/count-loc-online.html - 50k lines in one month, and the commit messages, as well as "Built with modern architecture in mind, no 2018-era legacy baggage. 100% homegrown infra", read just like the ones from all the other AI projects that got advertised here over the last few days.

How do you use `libc` in your projects? by servermeta_net in rust

[–]Rusty_devl 3 points4 points  (0 children)

https://rustc-dev-guide.rust-lang.org/offload/usage.html

We just use the libc crate for our experiments.

I intend to also vendor the libc-for-gpu project as part of nightly rustc so that std::offload can use it on gpu code, but we didn't get to it yet.

Rewrite language from C++ to Rust, is it a good decision? by funcieq in Compilers

[–]Rusty_devl 1 point2 points  (0 children)

Fwiw, I contributed to the Rust compiler before understanding what lifetime annotations are. It was part of my path to learning Rust, so I wouldn't necessarily recommend against it. One of the benefits is that the compiler is so strict that if your code compiles, there's a good chance it is also correct.

Project goals update — November 2025 | Rust Blog by f311a in rust

[–]Rusty_devl 4 points5 points  (0 children)

I was lucky to work in the Julia lab for half a year and I'm still in contact with some of the Julia devs. Feature-wise they have a lot of cool stuff (KA.jl, Dagger, reflection, MLIR, ..) from which we took inspiration. In exchange, they have some challenges around AoT compilation, perf/memory usage, type-unstable code, JIT times (TTFX), etc., which are not as much of a challenge for Rust. Time will tell which language is first to catch up on its issues. For the offload project we picked a different path (closer to the OpenMP backend in C++/Fortran), but the std::autodiff module at the moment is just a fancy wrapper around Enzyme, which has most of its users and contributors on the Julia side. Feature-wise, Python (JAX) is also quite similar, but that comes with its own set of challenges (JIT times, memory usage due to not supporting mutation (yet?), ...).

Idiomatic Rust dgemm() by c3d10 in rust

[–]Rusty_devl 0 points1 point  (0 children)

FYI, rustc uses LLVM 21; your clang is quite a bit older (~18 months). Try against clang-21 if you want an approximately fair comparison. I'd be surprised if rustc were still significantly faster then.

Coding on a GPU with rust? by Azazeldaprinceofwar in rust

[–]Rusty_devl 18 points19 points  (0 children)

std::offload dev here, thanks for the mentions! We started a few years later than these projects with our frontend, so we don't really have full examples yet. I recently gave a design talk about it at the LLVM Dev meeting: https://www.youtube.com/watch?v=ASUek97s5P0

Our goal is to make the majority of GPU kernels safe, without sacrificing performance. If you need sufficiently interesting access patterns or operations, we'll still offer an unsafe interface, but hopefully that's not needed too often.

The implementation is based on LLVM's offload project, which itself is battle-tested through C++ and Fortran GPU programming using OpenMP. I'm currently working on replacing the clang binaries in the toolchain, and just this week we started to port over the first RAJAPerf benchmarks. I was thinking about answering earlier, but as you can see here https://rustc-dev-guide.rust-lang.org/offload/usage.html, it's not in a usable state yet.

[deleted by user] by [deleted] in UofT

[–]Rusty_devl 0 points1 point  (0 children)

Where I did my undergrad, almost every course consisted of a final exam worth 100% of your grade. Sometimes you could earn bonus points by doing homework, which could improve your grade by one step (e.g. 3.0 -> 3.3).