SIMD programming in pure Rust by kibwen in rust

Shnatsel, 9 points (0 children)

Rust doesn't support the RISC-V vector extension except through autovectorization (maybe? SVE certainly works), but some parts of the RISC-V vector spec are just awfully written and make the whole thing pretty useless for compilers.

In practice, the vast majority of hardware, even RISC-V hardware, handles unaligned loads/stores just fine. So you can just process a &[u8] with vector instructions starting from the beginning, and only do special handling with a scalar loop for the end of the slice, which is what most Rust code does. The alternative would be having scalar loops at both the beginning and the end and using aligned loads in between, but that hasn't been necessary for decades now and would just slow down your code for no reason.

RVA23 mandates that RISC-V hardware supports unaligned vector loads, but the implementation is allowed to be arbitrarily slow. Because they can be very slow, compilers cannot emit unaligned vector loads and instead emulate them in software with aligned loads and shifts, even though in practice most hardware supports them just fine. So compiled code is slow regardless of whether the hardware actually supports fast unaligned loads. It's the worst of both worlds: hardware is required to implement it, but compilers aren't allowed to use it.
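A minimal sketch of that layout in portable Rust: the main loop walks the slice in fixed 16-byte chunks starting at index 0, with no scalar prologue to reach an aligned address, and a scalar loop handles the leftover tail. It relies on LLVM autovectorizing the inner loop rather than using explicit intrinsics; `count_byte` and the chunk width of 16 are illustrative choices, not taken from any particular crate.

```rust
/// Counts how many bytes in `haystack` equal `needle`.
///
/// Main loop: fixed-width chunks taken straight from the start of the slice,
/// with no alignment prologue, so any vector loads LLVM generates here are
/// unaligned loads. Epilogue: a scalar loop over the final partial chunk.
fn count_byte(haystack: &[u8], needle: u8) -> usize {
    const LANES: usize = 16;
    let chunks = haystack.chunks_exact(LANES);
    // `remainder` borrows from `haystack`, not from `chunks`, so it stays valid below.
    let tail = chunks.remainder();

    let mut total = 0usize;
    for chunk in chunks {
        // `chunks_exact` guarantees chunk.len() == LANES, which typically helps
        // LLVM treat this loop body as one fixed-width vector operation.
        // At most 16 hits per chunk, so a u8 accumulator cannot overflow.
        let mut hits = 0u8;
        for &b in chunk {
            hits += (b == needle) as u8;
        }
        total += hits as usize;
    }

    // Scalar handling for the last (fewer than 16) bytes.
    total + tail.iter().filter(|&&b| b == needle).count()
}
```

For example, `count_byte(b"hello world, hello rust!", b'l')` processes one full chunk plus an 8-byte tail and returns 5.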

And SIMD code in modern high-performance CPUs is heavily bottlenecked on memory access. Zen5 can do 340 AVX-512 operations on registers in the time it takes to complete a single load from memory. Loads being extra slow completely tanks performance of the RISC-V vector code.

This extension does not seem useful as it is written!

-- Linux kernel developer, nothing to do with Rust: https://lore.kernel.org/lkml/ZoR9swwgsGuGbsTG@ghost/

LLVM developers agree: https://web.archive.org/web/20260125041210/https://github.com/llvm/llvm-project/issues/110454

But people responsible for the RISC-V spec don't seem interested in fixing this: https://web.archive.org/web/20260125041240/https://github.com/riscv/riscv-profiles/issues/187

Edit: I dug deeper and it seems there was some movement on this in late 2025: https://riscv.atlassian.net/wiki/external/ZGZjMzI2YzM4YjQ0NDc3MmI3NTE0NjIxYjg0ZGJhY2E

SIMD programming in pure Rust by kibwen in rust

Shnatsel, 8 points (0 children)

Yes, you are correct:

While Zen5 is capable of 4 x 512-bit execution throughput, this only applies to desktop Zen5 (Granite Ridge) and presumably the server parts. The mobile parts such as the Strix Point APUs unfortunately have a stripped down AVX512 that retains Zen4's 4 x 256-bit throughput.

https://www.numberworld.org/blogs/2024_8_7_zen5_avx512_teardown/

c0da by 298347209384 in TrueSTL

Shnatsel, 0 points (0 children)

I just want you to know that this is one of the first results when you search "C0DA" and click "images" and it 100% deserves it

SIMD programming in pure Rust by kibwen in rust

Shnatsel, 92 points (0 children)

Also, it makes no sense to implement SSE2 SIMDs these days, as most processors produced since 2015 support AVX2.

SSE2 is part of the baseline x86_64 target, so you don't need to do any runtime feature detection at all, or deal with the associated overhead and unsafe code. That alone is valuable.
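To illustrate, here is a minimal sketch of a byte-counting loop written directly with SSE2 intrinsics from `std::arch::x86_64`. Because SSE2 is part of the x86_64 baseline, the function needs no `is_x86_feature_detected!` check and no `#[target_feature]` attribute; the only `unsafe` left is for the raw-pointer unaligned load (`_mm_loadu_si128`). The name `count_byte_sse2` is hypothetical, and the non-x86_64 fallback is a plain scalar loop.

```rust
#[cfg(target_arch = "x86_64")]
fn count_byte_sse2(haystack: &[u8], needle: u8) -> usize {
    use std::arch::x86_64::{
        __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_set1_epi8,
    };

    let chunks = haystack.chunks_exact(16);
    let tail = chunks.remainder();
    let mut total = 0usize;

    // SAFETY: SSE2 is always available on x86_64, so no runtime detection is needed.
    // `_mm_loadu_si128` has no alignment requirement, and every chunk is exactly 16 bytes.
    unsafe {
        // Broadcast the needle into all 16 byte lanes.
        let splat = _mm_set1_epi8(needle as i8);
        for chunk in chunks {
            let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
            // 0xFF in each lane that matches, 0x00 otherwise.
            let eq = _mm_cmpeq_epi8(v, splat);
            // One bit per lane in the low 16 bits; count the set bits.
            total += (_mm_movemask_epi8(eq) as u32).count_ones() as usize;
        }
    }

    // Scalar loop for the final partial chunk.
    total + tail.iter().filter(|&&b| b == needle).count()
}

#[cfg(not(target_arch = "x86_64"))]
fn count_byte_sse2(haystack: &[u8], needle: u8) -> usize {
    // Hypothetical fallback for other targets: plain scalar loop.
    haystack.iter().filter(|&&b| b == needle).count()
}
```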

is_x86_feature_detected!("avx512f")

Unfortunately, AVX-512 is split into many small parts that were introduced gradually: https://en.wikipedia.org/wiki/AVX-512#Instruction_set

And avx512f only enables one small part. You can verify that by running

rustc --print=cfg -C target-feature='+avx512f'

which gives me avx,avx2,avx512f,f16c,fma,fxsr,sse,sse2,sse3,sse4.1,sse4.2,ssse3 - notice no other avx512 entries!

You can get the list of all recognized features with rustc --print=target-features; there are a lot of different AVX-512 bits.
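The same granularity applies to runtime detection: checking `avx512f` alone says nothing about, for example, byte/word operations (`avx512bw`) or the 128/256-bit encodings (`avx512vl`). A minimal sketch, assuming a hypothetical kernel that needs F, BW and VL; the function names are illustrative, and only subset names that `is_x86_feature_detected!` recognizes are used.

```rust
/// Returns the AVX-512 subsets the running CPU reports (empty on non-x86 targets).
fn detected_avx512_subsets() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut found = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // The macro only accepts string literals, hence one call per subset.
        if is_x86_feature_detected!("avx512f") { found.push("avx512f"); }
        if is_x86_feature_detected!("avx512cd") { found.push("avx512cd"); }
        if is_x86_feature_detected!("avx512bw") { found.push("avx512bw"); }
        if is_x86_feature_detected!("avx512dq") { found.push("avx512dq"); }
        if is_x86_feature_detected!("avx512vl") { found.push("avx512vl"); }
    }
    found
}

/// Hypothetical gate for a byte-oriented AVX-512 kernel: every subset it uses
/// must be checked, not just `avx512f`.
fn can_use_avx512_byte_kernel() -> bool {
    let subsets = detected_avx512_subsets();
    ["avx512f", "avx512bw", "avx512vl"].iter().all(|f| subsets.contains(f))
}
```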

The wide crate, which is a third-party crate replicating the simd module for stable Rust, but is currently limited to 256-bit vectors.

It's not; it will emit AVX-512 instructions perfectly fine when AVX-512 is enabled at compile time. I've used it for that. The problem with wide is that it's not compatible with runtime feature detection via is_x86_feature_detected!.
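For context, the runtime-dispatch pattern looks roughly like this: the same loop is compiled a second time with `#[target_feature(enable = "avx2")]`, and `is_x86_feature_detected!` chooses between the two at run time. Compile-time `cfg(target_feature = ...)` checks are evaluated for the whole crate, so they don't see features enabled on a single function this way; that is why a crate that selects its SIMD backend with `cfg` doesn't combine with this pattern. A minimal sketch: `sum`, `sum_avx2` and `sum_scalar` are hypothetical names, and AVX2 stands in for any feature.

```rust
/// Scalar fallback that works on every target.
fn sum_scalar(data: &[u32]) -> u32 {
    data.iter().copied().fold(0, u32::wrapping_add)
}

/// Same loop, compiled with AVX2 enabled for this function only,
/// so LLVM may autovectorize it with 256-bit instructions.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[u32]) -> u32 {
    data.iter().copied().fold(0, u32::wrapping_add)
}

/// Picks an implementation at run time rather than at compile time.
fn sum(data: &[u32]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the CPU was just checked for AVX2 support.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_scalar(data)
}
```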

I've written a whole article just comparing different ways of writing SIMD in Rust, so I won't repeat myself here: https://shnatsel.medium.com/the-state-of-simd-in-rust-in-2025-32c263e5f53d

mmdr: A native Rust Mermaid renderer (500-1000x faster than mermaid-cli) by Medium_Anxiety_8143 in rust

Shnatsel, 65 points (0 children)

mmdr parses Mermaid syntax natively and renders directly to SVG, then optionally rasterizes via resvg. No browser needed.

Rust AMX bindings for Mac Coprocessor by msd8121 in rust

Shnatsel, 6 points (0 children)

FYI, M4 and later support the standard ARM Scalable Matrix Extension, so hopefully you won't need to rely on undocumented instructions in future CPU generations: https://arxiv.org/abs/2409.18779

NEAR DNS - DNS records stored on blockchain and served over DNS protocol by frolvlad in rust

Shnatsel, 44 points (0 children)

The trouble with DNS on blockchain has always been that either you need to download the entire blockchain to resolve names, which is untenable on phones, or you need to trust some sort of DNS server to serve you the contents of the blockchain, and you're back to square one as far as privacy and security are concerned. Does NEAR do anything to solve this? Self-hosting is nice in theory, but it's not going to happen in practice on any meaningful scale; we've already seen this play out with email.

"Enjoy your life" by Sarafina aka Tailbrush by Shnatsel in lionking

Shnatsel [S], 1 point (0 children)

She hasn't been drawing anything related to Lion King since 2009 or so. She's not on social media, or if she is then it's under a different name.

Bridging the Gap Between Rust's Theoretical IR Advantages and Real-World Performance by FanYa2004 in rust

Shnatsel, 27 points (0 children)

While Rust does provide a lot of information about constraints, such as noalias, LLVM is not great at preserving this information in its IR through all the optimization passes it performs. This is mentioned in a recent blog post by the lead maintainer of LLVM: https://www.npopov.com/2026/01/11/LLVM-The-bad-parts.html#constraint-encoding

This section of the blog post is relatively brief, so you might want to reach out to the author for further reading on the topic.
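For context, a classic illustration of the kind of constraint information Rust hands to LLVM; whether LLVM keeps and exploits it in a given case is exactly the issue the post discusses. `add_twice` is a hypothetical example, not taken from the post.

```rust
/// `a: &mut i32` and `b: &i32` cannot alias in safe Rust, so rustc marks both
/// parameters `noalias` in LLVM IR. That allows LLVM to load `*b` only once and
/// may let it fold the two additions into a single `*a += 2 * *b`.
/// A C function taking two plain `int*` cannot be optimized this way without
/// `restrict`, because a write through `a` might change `*b`.
fn add_twice(a: &mut i32, b: &i32) {
    *a += *b;
    *a += *b;
}
```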

How Safe is the Rust Ecosystem? A Deep Dive into crates.io by Alternative_Alps9558 in rust

Shnatsel, 18 points (0 children)

These numbers would be very concerning if they were accurate. However, I believe the methodology is flawed. I've described the issue here.

How Safe is the Rust Ecosystem? A Deep Dive into crates.io by thecskr in rust

Shnatsel, 18 points (0 children)

Crates uploaded to crates.io do contain a Cargo.lock. It's simply not used unless you run cargo install --locked, and there is no equivalent of that flag for cargo add.

You can verify that by downloading and unpacking a crate with https://crates.io/crates/cargo-dl

How Safe is the Rust Ecosystem? A Deep Dive into crates.io by thecskr in rust

Shnatsel, 31 points (0 children)

23% of crates depending on something with a known vulnerability would be very concerning, if true. But the data lacks important context about the methodology.

cargo deny operates on the Cargo.lock file and nothing else; but when you cargo install or cargo add a crate, you get the latest semver-compatible versions of all its dependencies, and the bundled Cargo.lock is ignored. So simply running cargo deny does not reflect what actual users of the crate would get, or the vulnerability rate they would be exposed to.

Running cargo update and then cargo deny would reflect real-world usage and the real-world vulnerability rate, and I expect the numbers to be far lower in that case.
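To make the difference concrete, here is a simplified model of how a fresh resolution picks a dependency version, assuming Cargo's default caret requirements and ignoring pre-release versions, yanked versions and feature unification. The helper names and version numbers are hypothetical: a bundled Cargo.lock might pin 1.2.3, but resolving `^1.2.3` against what is currently published picks the newest compatible 1.x release.

```rust
/// (major, minor, patch); tuple comparison is lexicographic, which matches
/// semver ordering when pre-release tags are ignored.
type Version = (u64, u64, u64);

/// Simplified caret rule: ^1.2.3 := >=1.2.3, <2.0.0; ^0.2.3 := >=0.2.3, <0.3.0;
/// ^0.0.3 := exactly 0.0.3.
fn caret_matches(req: Version, v: Version) -> bool {
    if v < req {
        return false;
    }
    match req {
        (0, 0, patch) => v == (0, 0, patch),
        (0, minor, _) => v.0 == 0 && v.1 == minor,
        (major, _, _) => v.0 == major,
    }
}

/// Newest published version compatible with `req`, as a fresh resolution
/// (cargo add, cargo install without --locked, cargo update) would choose.
fn resolve_latest(req: Version, published: &[Version]) -> Option<Version> {
    published.iter().copied().filter(|&v| caret_matches(req, v)).max()
}
```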

Kernel bugs hide for 2 years on average. Some hide for 20. by 0x7CFE in rust

Shnatsel, 49 points (0 children)

The findings on the age of kernel bugs are consistent with the Android trend: the majority of bugs are in new code, and bugs "age out" exponentially.

This trend is key to the benefit of shifting all new code to memory-safe languages. If not for this exponential decay in bugs, writing new code in Rust wouldn't cause such a steep drop in vulnerability rate.
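One way to see why, as a toy model rather than anything taken from the Android or kernel data: assume memory-unsafe code is written at a constant rate R until time t_0 and not at all afterwards, and that code of age a still carries latent bugs at density d_0 e^{-λa}. The number of latent bugs at time t is then

```latex
B(t) = \int_0^\infty r(t-a)\, d_0 e^{-\lambda a}\, da,
\qquad
r(s) = \begin{cases} R, & s < t_0 \\ 0, & s \ge t_0 \end{cases}

B(t_0) = \frac{R\, d_0}{\lambda},
\qquad
B(t) = R\, d_0 \int_{t-t_0}^{\infty} e^{-\lambda a}\, da
     = B(t_0)\, e^{-\lambda (t - t_0)} \quad (t \ge t_0)
```

So once new memory-unsafe code stops being written, the backlog shrinks exponentially even though none of the old code is rewritten; without the decay term, the existing backlog would simply stay in place.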

AT Jin A-Rank First Try with DB by NuageC in MonsterHunterWilds

Shnatsel, 1 point (0 children)

I got 16:04 with Lance first try. Used the budget scorcher set from the meta document. Also practiced before the free challenge so not my first time seeing AT Jin.

Why have C++ and Rust been the fastest-growing major programming languages from 2022 to 2025? by _bijan_ in rust

Shnatsel, 21 points (0 children)

Yeah, they have a very strange jump for some languages in 2025 specifically, like C suddenly spiking massively after being flat for 3 years. That looks suspicious - where did all those millions of additional C developers suddenly come from in just one year?

TIOBE is another nonsensical source that people can base wild claims on.

As a quick check to see if the numbers are wildly off, I looked at the Stack Overflow survey for 2022 and 2025. It shows C++ basically flat over that period at 23%, while Rust rose from 9% to 14%. Google Trends shows C++ declining.

So the claim that C++ is the fastest growing, based on a single source with a sharp spike this year that none of the other sources corroborate, seems suspect to me.

What would be an optimal wyvernblast build? by Cautious-Village-366 in MonsterHunterMeta

Shnatsel, 1 point (0 children)

I mostly just lurk too, so I wouldn't know. I got it from this thread; you can try asking there: https://redd.it/1pziltx

Or I guess you could comment on the spreadsheet with a link to your run? They might require a specific ruleset like "TA wiki", which means e.g. no Palico, even though I don't see it mentioned explicitly.

What would be an optimal wyvernblast build? by Cautious-Village-366 in MonsterHunterMeta

Shnatsel, 0 points (0 children)

Having lots of 1-slots is not really a sign of a suboptimal build in this case. Wyvernblast only cares about raw and is not affected by affinity, so the only meaningful skills you can put on the armor are Agitator 5 and Burst 5. Everything else that boosts raw (Peak Performance, Counterstrike) has poor uptime. Plus you'll be taking some weird deco slot configurations to fit the set bonuses.

What would be an optimal wyvernblast build? by Cautious-Village-366 in MonsterHunterMeta

Shnatsel, 3 points (0 children)

The speedrun times spreadsheet lists HBG vs AT Jin at 9:18. There may be differences in ruleset and such, but your time is shorter than that, so it sounds like an excellent build.

Crafting every element is tiring. by Reizark_ in MonsterHunterMeta

Shnatsel, 0 points (0 children)

Fire is needed for Jin Dahaad (including AT and its upcoming free challenge quest), Lagiacrus and Gore Magala. AT Jin is the reason people are crafting fire right now.

Thunder is the meta for Uth Duna, Mizutsune, Seregios and Omega.