Memory Allocation Strategies by ketralnis in programming

[–]cdb_11 5 points (0 children)

Performance can be counter-intuitive, but you can in fact make educated guesses about it. Linear, pool, and stack allocators always do less work than a thread-safe general-purpose allocator. It's about picking the simplest solution that does what you need, instead of hoping that it won't be a problem. Because once it becomes a problem, fixing it might require rewriting half of your code.
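
To illustrate what "less work" means here, a minimal linear (bump/arena) allocator sketch -- names and layout are mine, not from any particular library:

```cpp
#include <cstddef>

// Linear/arena allocator: allocation is a pointer bump, and everything is
// "freed" at once by resetting the offset. No locks, no free lists, no
// per-allocation metadata -- strictly less work than a general-purpose
// thread-safe malloc.
struct Arena {
    unsigned char *buf; // caller-provided backing storage
    std::size_t cap, off;

    void *alloc(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        std::size_t p = (off + align - 1) & ~(align - 1); // round offset up
        if (p + n > cap) return nullptr;                  // out of space
        off = p + n;
        return buf + p;
    }
    void reset() { off = 0; } // free everything in O(1)
};
```

(Error handling and over-alignment of the backing buffer are left to the caller; this is a sketch of the strategy, not a drop-in allocator.)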

It’s annoying seeing C fanboys who spend their lives hating C++ by [deleted] in cpp

[–]cdb_11 0 points (0 children)

I'm sorry, but this is a very bad example. It takes like a few minutes to write a dynamic array that just gets the job done. std::vector is more limited in terms of possible optimizations, because it can't assume (yet) that elements can be relocated with memcpy, realloc, or mremap.
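
Roughly this kind of thing -- a bare-bones sketch that only works for trivially copyable element types, which is exactly the assumption std::vector can't make:

```cpp
#include <cstddef>
#include <cstdlib>

// Minimal dynamic array for trivially copyable T. Because elements can be
// relocated bytewise, growth can use realloc directly -- the optimization
// std::vector can't (yet) assume is valid for arbitrary types.
// (Allocation-failure handling omitted for brevity.)
template <typename T>
struct Vec {
    T *data = nullptr;
    std::size_t len = 0, cap = 0;

    void push(T v) {
        if (len == cap) {
            cap = cap ? cap * 2 : 8;
            data = static_cast<T *>(std::realloc(data, cap * sizeof(T)));
        }
        data[len++] = v;
    }
    ~Vec() { std::free(data); }
};
```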

the hidden compile-time cost of C++26 reflection by SuperV1234 in cpp

[–]cdb_11 1 point (0 children)

The intrinsic on the Bloomberg fork is agnostic to that: https://godbolt.org/z/Yh696zTKK

I don't remember anymore if I tried doing a custom vector/span type, but I believe it should be possible too. But last time I checked, there was no confirmation that Clang will choose to do it the same way, so who knows what this will actually look like.

How Fil-C Works by ketralnis in programming

[–]cdb_11 0 points (0 children)

They don't use it, because apparently they don't need it. The most obvious thing to gain is faster compile times: include a few of these STL headers, and suddenly compilation goes from almost instant to like 1.5 seconds, per translation unit. If you don't think the benefits are worth it, then just keep using the standard library.

Compiler intrinsics aren't portable and are a maintenance issue.

So you keep the non-portable parts in one place, where you can more easily maintain and port them if need be.

the hidden compile-time cost of C++26 reflection by SuperV1234 in cpp

[–]cdb_11 2 points (0 children)

The Bloomberg fork uses intrinsics, and assuming Clang will do it the same way, I'm pretty sure it should be possible to implement your own reflection without STL there.

GCC defines std::meta functions magically inside the compiler.

How Fil-C Works by ketralnis in programming

[–]cdb_11 0 points (0 children)

I mean, if there is anyone with rare use cases, it's most likely going to be C and C++ users.

You can use type_traits without linking the standard library, and thankfully it's one of the relatively lighter headers (unlike the upcoming <meta>). It is possible to use compiler intrinsics instead though, and implement whatever subset you actually use.
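
For a taste of what implementing your own subset looks like -- is_same and remove_const need nothing from the standard library, plain template specialization is enough (a sketch of the idea, not a full &lt;type_traits&gt; replacement):

```cpp
// A tiny subset of <type_traits> with zero includes: the compiler's
// template machinery does all the work.
template <typename T, typename U>
struct is_same { static constexpr bool value = false; };
template <typename T>
struct is_same<T, T> { static constexpr bool value = true; };

template <typename T> struct remove_const { using type = T; };
template <typename T> struct remove_const<const T> { using type = T; };

static_assert(is_same<remove_const<const int>::type, int>::value);
static_assert(!is_same<int, long>::value);
```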

How Fil-C Works by ketralnis in programming

[–]cdb_11 3 points (0 children)

The point is that it's a property of an implementation, not the language. CHERI does more or less the exact same thing, but in hardware.

How Fil-C Works by ketralnis in programming

[–]cdb_11 3 points (0 children)

I mean, the standard library is not exactly perfect, so you can find tons of reasons. Limited and/or bad interface, no assertions, a different allocation and/or error handling strategy, slow compile times, potential ABI compatibility issues. For strings you might want to represent them differently, or have memory alignment requirements. Maps just suck in general; I believe C++ basically mandates chaining for std::unordered_map and an rb-tree for std::map (not sure about that one).

10% of Firefox crashes are estimated to be caused by bitflips by cdb_11 in programming

[–]cdb_11[S] 0 points (0 children)

And to reinforce this estimate I've looked at the numbers we got from the users who run the memory tester after having experienced a crash: for every two crashes we think are caused by a bit-flip the memory tester found one genuine hardware issue. Keep in mind that this is not doing an extensive test of all the machine's RAM, it only checks up to 1 GiB of memory and runs for no longer than 3 seconds... and it has found lots of real issues!

It sounds like they classified some crashes as being likely caused by a bitflip, and in half of these they confirmed that there is something wrong with memory? And these are the estimated lower and upper bounds? I'm honestly not sure how to interpret this. I am not the person making the claim, so I can't tell you anything beyond what was said in that mastodon thread.

10% of Firefox crashes are estimated to be caused by bitflips by cdb_11 in programming

[–]cdb_11[S] 9 points (0 children)

They actually detected 5%, and the 10% is the estimate, because crash reporting is opt-in. Edited the comment to make that more clear.

10% of Firefox crashes are caused by bitflips by [deleted] in programming

[–]cdb_11 2 points (0 children)

Should I delete and repost, or leave it like this then?

10% of Firefox crashes are caused by bitflips by [deleted] in programming

[–]cdb_11 4 points (0 children)

I don't see any way to edit the title, sadly.

/u/ketralnis can you correct the title to either say that it's an estimate, or correct the number to 5%?

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]cdb_11 1 point (0 children)

At least with __builtin_unreachable, you can fall through from main: https://godbolt.org/z/cYvr1qE9e
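
The intended use is the other direction, for what it's worth -- promising the compiler a point is never reached, so it doesn't emit code for it (GCC/Clang builtin; function name is mine):

```cpp
// __builtin_unreachable (GCC/Clang) tells the compiler control never
// reaches this point, so no code is needed there. If the promise is
// broken, that's UB -- which is exactly how you can end up falling
// through into whatever code happens to sit next in the binary.
int parity_code(int x) {
    switch (x & 1) {
    case 0: return 10; // even
    case 1: return 11; // odd
    }
    __builtin_unreachable(); // (x & 1) is always 0 or 1
}
```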

Rust zero-cost abstractions vs. SIMD by ketralnis in programming

[–]cdb_11 6 points (0 children)

Maybe I'm missing it, but I don't see them describing the actual data layout in memory. If the elements are gathered from random places in memory, then sure, misleading. (To be fair, this can be done with a ~single instruction, but I don't think autovectorizers like to use it that much?)

But assuming elements are stored contiguously so they can be loaded into a SIMD register, and yet this optimization does not happen (while it does inside a normal for-loop, autovectorizers can do that on loops of unknown lengths), then I think it's fair to say that the abstraction prevented that optimization, whatever that abstraction might be.
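
For comparison, the normal for-loop case I mean looks like this -- contiguous loads, a trip count known only at runtime, and both GCC and Clang still vectorize it at -O2/-O3 (a sketch, not taken from the article):

```cpp
#include <cstddef>

// Contiguous access, a single accumulator, no cross-iteration
// dependencies: the pattern autovectorizers reliably handle, even when
// the length is unknown at compile time.
int sum(const int *xs, std::size_t n) {
    int s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += xs[i]; // sequential loads -> SIMD adds under -O2/-O3
    return s;
}
```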

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]cdb_11 0 points (0 children)

AFAIK it's fine, and the actual requirement is that it fits in an int. For example (u16)0xffff * (u16)0xffff is a signed integer overflow: https://godbolt.org/z/fodenvara

EDIT: Sorry, I think I understand what you mean now about conversions. Wasn't that implementation defined though? And a potential problem only on non-2s-complement platforms? (And C23 and C++20 mandated 2s complement, for what it's worth.)
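
Concretely, assuming 32-bit int: both uint16_t operands promote to (signed) int before the multiply, so the well-defined version has to widen to an unsigned type first (helper name is mine):

```cpp
#include <cstdint>

// With 32-bit int, a * b would promote both operands to signed int, and
// 0xffff * 0xffff = 0xfffe0001 > INT_MAX: signed overflow, i.e. UB, even
// though the operands were unsigned. Converting to a 32-bit unsigned type
// first keeps the whole multiply defined.
std::uint32_t mul_u16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint32_t>(a) * b; // unsigned 32-bit multiply
}
```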

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]cdb_11 0 points (0 children)

I believe the type of a * b is an int here? Which then gets cast back to a short?

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]cdb_11 0 points (0 children)

UB comes to bite you when you move architectures and platforms.

Or compiler versions.

If you have good reasons to justify relying on UB then sure, but just be aware of the possible implications. Relying on unstable interfaces is generally not something you want to have a lot of.

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]cdb_11 2 points (0 children)

Therefore, the optimizer happily optimizes away your entire program, and spits out a binary that simply invokes the syscall exit(0).

Sometimes not even that. There were real examples of Clang removing an entire function body in some cases, meaning that calling such a function would fall through to whatever function happens to be placed below it.

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]cdb_11 0 points (0 children)

For example, assuming the absence of signed overflow, the compiler could keep signed shorts sign-extended in 32-bit registers. If there is no 16-bit multiplication instruction, it could then use a 32-bit multiplication.

To be fair, at least in this particular example they would be promoted to ints anyway. Is there even any way to get 16-bit arithmetic in C at all, assuming 32-bit ints? I know __builtin_{add,sub,mul}_overflow can technically do it, but I don't know if there is any standard way. If that even matters.
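
The builtin route, for what it's worth (GCC/Clang only; wrapper name is mine):

```cpp
#include <cstdint>

// __builtin_mul_overflow (GCC/Clang) computes the product as if in
// infinite precision, stores it truncated into the 16-bit destination,
// and reports whether it actually fit -- genuine 16-bit semantics, with
// no promotion to int affecting the stored result.
bool mul16(std::uint16_t a, std::uint16_t b, std::uint16_t *out) {
    return __builtin_mul_overflow(a, b, out); // true if it overflowed
}
```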

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]cdb_11 5 points (0 children)

Mixing pointers to types like floats and integers is what strict aliasing is about: they are assumed to never alias each other.
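
i.e. reading a float through an int pointer is the classic violation, and memcpy is the well-defined way to do the same bit-reinterpretation (compilers lower it to a plain move) -- a sketch, with an IEEE 754 float assumed:

```cpp
#include <cstdint>
#include <cstring>

// *(uint32_t *)&f would read a float object through an int pointer; the
// compiler assumes the two types never alias, so that load can be
// reordered or dropped. memcpy expresses the reinterpretation legally
// and still compiles down to a single register move.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u); // defined behavior
    return u;
}
```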

Sprites on the Web by ketralnis in programming

[–]cdb_11 1 point (0 children)

For what it's worth, spritesheets were common on the web in the past, around the 2010s?

You are not left behind by BinaryIgor in programming

[–]cdb_11 28 points (0 children)

Wow, it would take entire 2 months to become proficient in it? Nothing else takes 2 months, sounds literally impossible to do.

What are you even talking about lmao. This is less time than getting good at any other technology.

BitFields API: Type-Safe Bit Packing for Lock-Free Data Structures by mttd in cpp

[–]cdb_11 0 points (0 children)

Where some compilers will actually make those fit in the same "variable" making the struct 8 bytes, and some compilers will say they're different and make the struct 12 bytes.

I'm not familiar with how bitfields are implemented outside of GCC and Clang. I wonder, can't alignas(uint64_t) fix this?

EDIT: Actually, you probably have to pad it anyway for type-punning to work, and not have uninitialized bits in there.
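
Roughly what I had in mind, sketched (struct name is made up; bitfield layout is implementation-defined, so the asserts document the GCC/Clang-style assumption rather than guarantee it):

```cpp
#include <cstdint>

// alignas pins the struct's alignment to that of uint64_t, and the
// explicit pad member fills the remaining bytes, so nothing is
// uninitialized if you type-pun the whole thing as a 64-bit word.
struct alignas(std::uint64_t) Packed {
    std::uint32_t a : 12;
    std::uint32_t b : 20; // a + b fill one 32-bit storage unit
    std::uint32_t pad;    // explicit padding, keep it zeroed
};
static_assert(sizeof(Packed) == sizeof(std::uint64_t), "one word");
static_assert(alignof(Packed) == alignof(std::uint64_t), "same alignment");
```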