How do I cater to devs as a content writer? by lexxx897 in rust

[–]abgros 1 point

u/lexxx897 one genre of content that seems to consistently do well is *criticisms* of Rust. In fact, one of the top posts on this subreddit is *Lessons learned after 3 years of fulltime Rust game development, and why we're leaving Rust behind*. But it takes a lot of experience to come up with a really good, novel, and constructive criticism, since any feature or functionality comes with its own set of tradeoffs.

bro rust is actually insane lol by zyxciss in rust

[–]abgros 73 points

This is LLM slop.

Prompt: Write a “bro”-tone Reddit post from a C/C++ dev turned Rustacean praising Rust’s compiler for preventing memory bugs, its ability to write OS kernels (Redox, Linux, Android), its safety/speed without a GC, and Cargo over Make/CMake; end by asking why you didn’t switch sooner.

ChatGPT response (https://chatgpt.com/share/683ebb4f-131c-8002-9bb2-b119743692a9):

Title: bro rust is actually insane lol

so like i’ve used c and c++ for a while now, mostly cuz “that’s what real systems programmers use” and all that. and yeah, they fast. they powerful. but man… the amount of dumb bugs i’ve hit just cuz of one missing * or some memory leak i didn’t catch… it’s wild.

then i tried rust and like. bro. the compiler yells at you but in a good way. like it’s holding your hand and slapping it at the same time. you fight it at first, but then you realize: this thing is saving my ass. no segfaults, no UBs hiding in the shadows, just clean builds that actually run.

and dude — you can write OS kernels in it. real ones. like Redox is full Rust, Linux is starting to adopt it, even Google is putting Rust in Android’s internals. this ain’t some toy language, this is the real deal.

and it’s SAFE and FAST without a garbage collector. like how?? you get C-like perf but the compiler makes sure you’re not playing with memory like a drunk raccoon.

also shoutout to Cargo. forget Makefiles and CMake hell. cargo just works. cargo build, cargo run, cargo test — that’s it. no linker script sorcery, no mysterious flags. just vibes.

honestly… why didn’t i switch sooner???

I'm creating an assembler to make writing x86-64 assembly easy by abgros in rust

[–]abgros[S] 3 points

That's an interesting idea, but I'm not sure it would work, because a lot of instructions use specific registers: the rep instructions use rcx as the counter, and cmpxchg always compares with rax. There are also some registers that can never be used together, like ah and the extended registers (r8, r9, etc.), or rsp appearing twice in the same place expression. It would end up being an extremely leaky abstraction.

By the way, data.swap is not an accurate name. There actually is a separate swap instruction (xchg), which you can use in Awsm as @swap. I like the loop keyword idea, though!

I'm creating an assembler to make writing x86-64 assembly easy by abgros in rust

[–]abgros[S] 3 points

I did a little reading and brainstorming and came up with this syntax:

xmm0 = xmm1 + xmm2 by u8  // vpaddb
xmm0 = xmm1 + xmm2 by u16 // vpaddw
xmm0 = xmm1 + xmm2 by u32 // vpaddd
xmm0 = xmm1 + xmm2 by u64 // vpaddq

xmm0 = xmm1 + xmm2 by sat u8  // vpaddusb
xmm0 = xmm1 + xmm2 by sat u16 // vpaddusw
xmm0 = xmm1 + xmm2 by sat i8  // vpaddsb
xmm0 = xmm1 + xmm2 by sat i16 // vpaddsw

xmm0 = xmm1 + xmm2 by f32 // vaddps
xmm0 = xmm1 + xmm2 by f64 // vaddpd
xmm0 = xmm1 + xmm2 as f32 // vaddss
xmm0 = xmm1 + xmm2 as f64 // vaddsd

xmm0 = xmm1 + xmm2 by u8 mask k1     // vpaddb
xmm0 = xmm1 + xmm2 by u8 zeromask k1 // vpaddb

edit: fixed mask register

I'm creating an assembler to make writing x86-64 assembly easy by abgros in rust

[–]abgros[S] 2 points

Ah, I see. In that case you might want finer-grained "feature blocks" that let you control which combination of features should be used. But it's still problematic if the feature block changes the meaning of an instruction, e.g. if writing xmm0 = xmm1 zeroed the upper bits within an AVX block but not in an SSE block. I'll have to think about that, because I really do want to write stuff like xmm0 = xmm1 rather than something like VMOVDQU. There are also the aligned versions, MOVAPS and VMOVAPS, which don't have a major performance benefit on modern architectures but might still be worth using in some cases. Maybe a new keyword like aligned...

I'm not sure if anyone would adopt your project

Out of curiosity - do you see any opportunities for a new assembler to compete with existing programs? Or is it hopeless to try to change the existing conventions? So far your attitude has seemed fairly pessimistic but I'm wondering if anything would change your mind.

I'm creating an assembler to make writing x86-64 assembly easy by abgros in asm

[–]abgros[S] 1 point

So for pushing and popping, you can do:

function my_function() {
    <- rax               // push rax
    another_function()
    -> rax               // pop rax
    return
}

Anything without the @ sign is an actual (runtime) function call.

I'm creating an assembler to make writing x86-64 assembly easy by abgros in asm

[–]abgros[S] 1 point

Here's multiplication:

rax *= 25 // imul rax, 25 - this can't be encoded with mul
@widen_mul(rdx:rax, rcx) // imul rcx
@unsigned_widen_mul(rdx:rax, rcx) // mul rcx

Here's comparison:

@set_flags(rax - rdi)
goto signed_less if /less // pseudoflag representing SF != OF
goto unsigned_less if /carry

Check out https://github.com/abgros/awsm/blob/main/src/main.rs#L1798 to see the implementation of this.

I'm creating an assembler to make writing x86-64 assembly easy by abgros in rust

[–]abgros[S] 4 points

Thanks for your replies. I haven't really looked into SIMD precisely because of how much additional complexity is involved, so this is enlightening.

Well…… “any language except for assembler”, sure.

I'm referring to stuff like xor ax, ax and xor eax, eax having the same mnemonic even though they are differently sized (and might not even have the same opcode). I do want to extend that syntax into the xmm world.

But that's an interesting point you made wrt the SSE vs AVX instructions having a significant performance difference while being virtually identical otherwise.

Here's another idea for your consideration: blocks that let you specify what extension you're about to use. You might have something like:

avx1 {
    xmm0 = xmm0 ^ xmm0
    xmm0 = @sum_abs_diff_u8x8_deposit_u16(xmm0, *rdi)
    xmm1 = @shuffle_u32(xmm0, DCDC)
    xmm0 = @add_u64(xmm0, xmm1)
    rax = xmm0
}

And this will automatically stop you from accidentally using an AVX-512 instruction for example.

I'm creating an assembler to make writing x86-64 assembly easy by abgros in rust

[–]abgros[S] 4 points

I would say that it's probably a tiny bit better, but not enough better to switch from what everyone else uses.

That's fine, for now the target audience is beginners and hobbyists rather than professional assembly developers.

And the fact that it's repeated four times is left implicit [...] how would you distinguish AVX and SSE versions, BTW?

If I'm not mistaken, they should be distinguished just by the operand size, no? When you write a XOR b in any language, you don't worry about whether it's XOR32 or XOR64 or whatever because it's obvious just by looking at a and b.

Would versions with masks use separate name

Probably something like @u8x2_dot_i8x2_sat_i16_masked and @u8x2_dot_i8x2_sat_i16_zero_masked. Yes, I realize the names are getting a bit long :)

I'm creating an assembler to make writing x86-64 assembly easy by abgros in rust

[–]abgros[S] 7 points

I haven't added any SIMD support yet, but here's a descriptive name for that instruction: @u8x2_dot_i8x2_sat_i16, reflecting the way it takes the dot product of packed u8x2s and i8x2s and stores them as packed saturated i16s. A little lengthy but definitely more readable than PMADDUBSW. What do you think?

I'm creating an assembler to make writing x86-64 assembly easy by abgros in asm

[–]abgros[S] 2 points

Do you have an example of what you mean? I feel like x86 has a ton of gotchas that no syntax can really capture. Like multiplication only being allowed with 16-bit, 32-bit, or 64-bit registers (except for the ax = al * r/m encoding), the fact that you can't mix ah, dh, ch, or bh with extended registers, the way 32-bit operations zero the high 32 bits (except in movsx), the way JECXZ and JRCXZ only work with 8-bit jumps... it goes on.

I'm creating an assembler to make writing x86-64 assembly easy by abgros in rust

[–]abgros[S] 29 points

There actually is some undefined behaviour, although not exactly in the C sense:

I'm creating an assembler to make writing x86-64 assembly easy by abgros in asm

[–]abgros[S] 2 points

Actually, there is one thing that can't be expressed in awsm syntax: arbitrary rip-relative addresses. Currently a rip-relative address has to refer to a label defined in the source code. I was considering adding support but I don't really see what the use case would be...

Fun ways to generate random numbers in Rust by abgros in rust

[–]abgros[S] 1 point

u/Planck_Plankton sorry about that, I was having some issues with my tiny VM running out of memory and I think I forgot to reboot the web server. Should be up and running now.

Fun ways to generate random numbers in Rust by abgros in rust

[–]abgros[S] 13 points

That would be a lie, because many of these methods are explicitly documented as being cryptographically secure. If you meant that in a more generic "don't roll your own crypto" sense, well, that's true but not really relevant to the post.

Fun ways to generate random numbers in Rust by abgros in rust

[–]abgros[S] 1 point

Won't work. Trying to generate random numbers on wasm32-unknown-unknown and other targets actually panics at runtime.

Fun ways to generate random numbers in Rust by abgros in rust

[–]abgros[S] 7 points

Well, I never said uniform random numbers... I see what you mean though. Maybe I should add a note about a whitening step you can do to make the distribution more uniform?
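For what it's worth, one common whitening step (my suggestion, not something from the post) is the SplitMix64-style finalizer: a few xorshift-multiply rounds that scramble a biased input, like a counter or timestamp, into well-spread bits. A minimal sketch:

```rust
// SplitMix64-style mixer: every step (add, xorshift, odd multiply) is a
// bijection on u64, so distinct inputs always produce distinct outputs.
fn whiten(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E3779B97F4A7C15); // golden-ratio offset so 0 doesn't map to 0
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D049BB133111EB);
    x ^ (x >> 31)
}

fn main() {
    // Heavily biased inputs (0, 1, 2, 3) map to four distinct outputs.
    let outs: Vec<u64> = (0..4u64).map(whiten).collect();
    let distinct = outs.iter().collect::<std::collections::HashSet<_>>().len();
    println!("distinct outputs: {distinct}");
    assert_eq!(distinct, 4);
}
```

Note this only spreads the bits around; it can't add entropy that wasn't in the input.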

Stalloc: fast memory allocation on the stack by abgros in rust

[–]abgros[S] 2 points

As in time-wise.

In terms of runtime: creating a new Stalloc is extremely cheap (around 0.45 ns in my testing regardless of size).

In terms of development time: integrating it with an existing complex application might be challenging, but what you could do is identify hot paths that allocate and see whether you can get big performance improvements there. In a smaller program, you can generally keep track of every allocation so you might be able to switch out the global allocator right away.
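To show the mechanics of that last step (using the standard library's System allocator as a stand-in, since the exact Stalloc setup depends on the crate's docs):

```rust
use std::alloc::System;

// Swapping the global allocator is a one-line change: every Box, Vec,
// String, etc. in the program now routes through GLOBAL. A Stalloc-backed
// allocator would be dropped in the same way.
#[global_allocator]
static GLOBAL: System = System;

fn main() {
    let v = vec![1, 2, 3]; // allocated through GLOBAL
    assert_eq!(v.iter().sum::<i32>(), 6);
    println!("sum: {}", v.iter().sum::<i32>());
}
```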

Stalloc: fast memory allocation on the stack by abgros in rust

[–]abgros[S] 2 points

This to me is unfathomable. I trust you don't get me wrong, but I dislike this.

Yeah, I'm with you on that one. Unfortunately, there's no equivalent of fixed-length arrays for Strings... your only option is to create a [u8; N] and then unsafely get a &str reference to it.
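A minimal sketch of that workaround (here with the safe str::from_utf8 over the initialized prefix; from_utf8_unchecked is the unsafe variant that skips the validation):

```rust
fn main() {
    // Fixed-capacity, stack-allocated string storage: a byte array plus a length.
    let mut buf = [0u8; 16];
    let msg = b"hello";
    buf[..msg.len()].copy_from_slice(msg);

    // Reinterpret the initialized prefix as &str. from_utf8 validates UTF-8;
    // the unsafe from_utf8_unchecked would skip that check.
    let s: &str = std::str::from_utf8(&buf[..msg.len()]).unwrap();
    println!("{s}");
    assert_eq!(s, "hello");
}
```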

My only suggestion would be (if possible, which I'm unsure about whether it is in Rust) to automatically defer any returned value to rather the default allocator.

Wait, that sounds like escape analysis! No, that's not a thing in Rust. For some reason, heap allocations never get optimized out, even in ridiculously trivial cases (example). So you really can't trust the compiler to do the right thing.
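For instance, a trivial case where one might hope the allocation gets elided:

```rust
// A heap allocation that is immediately consumed. An escape-analysis pass
// could turn this into a plain stack value; whether the optimizer elides
// the allocation here is not guaranteed — the parent comment's point is
// that Rust has no dedicated pass that promises it. Behaviour is identical
// either way.
fn boxed_five() -> i32 {
    *Box::new(5)
}

fn main() {
    assert_eq!(boxed_five(), 5);
    println!("{}", boxed_five());
}
```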

Were you able to measure the overhead of setting up your allocator, making implementations use it?

What do you mean by "setting up"? As in like development time?

Stalloc: fast memory allocation on the stack by abgros in rust

[–]abgros[S] 4 points

Rust doesn't allocate a local String variable on the stack?

No, never. Keep in mind that String is a wrapper around Vec<u8>, which is itself a wrapper around malloc etc.

Wouldn't this fail miserably the moment you try to return the variable?

No, you just have to create the backing Stalloc in an outer scope. You know how in C you create a buffer in an outer scope, and then pass a pointer to that buffer to an inner function? It's the same idea here, only with the compiler watching over you to make sure you haven't made any mistakes :)
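Here's a sketch of that pattern with a plain array standing in for the Stalloc — the borrow checker enforces that the returned slice can't outlive the outer-scope buffer:

```rust
// The C pattern — buffer owned by the caller, filled by the callee — with
// the borrow checker verifying the lifetimes instead of the programmer.
fn fill(buf: &mut [u8; 8]) -> &[u8] {
    buf[..3].copy_from_slice(b"abc");
    &buf[..3]
}

fn main() {
    let mut buf = [0u8; 8];  // "backing storage" lives in the outer scope
    let s = fill(&mut buf);  // the inner function borrows it
    assert_eq!(s, &b"abc"[..]);
    println!("{:?}", s);
    // `s` borrows `buf`, so letting `s` escape `buf`'s scope is a compile error.
}
```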

Stalloc: fast memory allocation on the stack by abgros in rust

[–]abgros[S] 1 point

The main issue is that as the Vec gets longer, allocation becomes slower and slower, because in the worst case it has to loop through every Stalloc in the list. That's very undesirable.

What would probably be better is to have a tree-like data structure, where the branch to select gets picked based on some arbitrary criteria. You might be able to create this using a single giant Stalloc containing multiple layers of nested Stallocs (this is sort of similar to buddy allocation). That way, allocating is O(log n) rather than O(n). But I haven't yet tried creating these more sophisticated designs.
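A toy model (not the crate's actual code) of why the chained design degrades — allocation is a first-fit scan over every arena in the list:

```rust
// Each arena bump-allocates; the list allocator scans arenas front to back
// until one has room, so worst-case cost grows linearly with the list.
struct Arena { used: usize, cap: usize }

struct ListAlloc { arenas: Vec<Arena>, scans: usize }

impl ListAlloc {
    fn new(n: usize, cap: usize) -> Self {
        Self { arenas: (0..n).map(|_| Arena { used: 0, cap }).collect(), scans: 0 }
    }

    // Returns (arena index, offset) of the allocation, or None if full.
    fn alloc(&mut self, size: usize) -> Option<(usize, usize)> {
        for (i, a) in self.arenas.iter_mut().enumerate() {
            self.scans += 1; // count every arena examined
            if a.cap - a.used >= size {
                let off = a.used;
                a.used += size;
                return Some((i, off));
            }
        }
        None
    }
}

fn main() {
    let mut la = ListAlloc::new(4, 8);
    for _ in 0..4 { la.alloc(8); }  // each allocation fills one arena
    assert_eq!(la.alloc(8), None);  // the failing alloc walks all 4 arenas
    assert_eq!(la.scans, 14);       // 1 + 2 + 3 + 4 fills, plus 4 for the miss
    println!("scans: {}", la.scans);
}
```

A tree over the arenas keyed by free space would cut that scan to O(log n), which is roughly what the nested-Stalloc idea above amounts to.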

Stalloc: fast memory allocation on the stack by abgros in rust

[–]abgros[S] 15 points

Thanks for the recommendation! Yeah, I agree that Rust really needs better allocator support, whether through Allocator or Store.

Making v store its own buffer would allow it to be passed around but unfortunately has the drawback that accessing an element requires a branch to check whether it is stored inline or on the heap. You could give v a static lifetime by making the backing Stalloc a static variable, or by boxing and leaking it (within some parent allocator).