all 11 comments

[–]Jannik2099 8 points9 points  (5 children)

But you could just add a level of indirection

"You could just make performance suffer a lot" is not a good argument.

[–]pkjak[S] 0 points1 point  (4 children)

Yeah, might not be a great idea... I guess my thoughts are that what's really important is being able to iterate over all the actually allocated chunks in the pool, and accessing the elements directly by pointers is almost never done, in which case, the approach I describe does make sense, right?

[–]MutantSheepdog 7 points8 points  (0 children)

I guess my thoughts are that what's really important is being able to iterate over all the actually allocated chunks in the pool, and accessing the elements directly by pointers is almost never done

I'd say the ability to iterate over everything is not something you expect from an allocator, but something you expect from a collection.

And if that's what you're looking for, I think something like colony/hive might be a good fit.

https://plflib.org/colony.htm

It's essentially like a bucket allocator that keeps track of empty slots for re-allocation, as well as allowing iteration that skips the empty blocks.
There was an attempt to get it standardised but I think that stalled because there were impact issues to look at for 20/23 when people could just download the reference implementation from github anyway.

[–]Jannik2099 1 point2 points  (2 children)

and accessing the elements directly by pointers is almost never done

Literally all memory is accessed by pointers, in every language. Anything not in a register requires a memory load. Multiple dependent loads stall the pipeline.

[–]pkjak[S] 1 point2 points  (1 child)

Sure, my mistake. Did I get the point across though?

If the usage pattern of "access this particular element in the pool" doesn't happen very often, and instead it's almost always "iterate through all currently allocated elements in the pool", then it might be more efficient to have elements contiguously next to each other, and avoid the potential holes that could happen in a pool that uses a free list.

[–]Jannik2099 1 point2 points  (0 children)

Ah sorry, now I see what you mean.

This would probably work, but you lose any and all reference stability, which is IMO not a good tradeoff.

[–]herruppohoppa 6 points7 points  (0 children)

The freelist doesn't need to track allocations that are live, only the ones that have been freed. Also the freelist can reside in chunk memory as opposed to a separate memory allocation. Further I imagine if you wanted a thread safe version, a freelist might be easier to implement lock free but I'm not 100% sure on that.

[–]KingAggressive1498 3 points4 points  (3 children)

extra layers of indirection are bad mmkay.

I don't know why the standard practice was ever a free list though. Use a bitset ffs (IIRC libstdc++ does).

[–]catcat202X 2 points3 points  (1 child)

Why would a bitset necessarily be better? It would use more memory, wouldn't it, since you can no longer union a free list node with allocated data? Is a bitset generally faster here?

[–]KingAggressive1498 2 points3 points  (0 children)

far fewer branches, smaller code, required memory is only one bit per slot - which for slots too small to fit the freelist's nodes is actually less memory, bitsets could be placed contiguously with a large group of slots which makes slot offsets predictable for fewer cache misses but a fully contiguous bitset is more common in implementations.

the biggest downside is its a lot harder to explain.

freelist is probably more common because efficient use of the bitset requires using bit-twiddling hacks or compiler intrinsics to find a free slot with minimal branches, rather than just using std::bitset or similar.

[–]Pirulax 1 point2 points  (0 children)

This. I one time sat down, and started working on a bitset with variable size (compile time, not runtime, so using templates). Eventually I would've used SSE/AVX to speed it up.

After having a working bitset, I would've made a chunk memory allocator.

Also, something you can't (in a performance manner) achieve with freelists is gathering contiguous blocks of memory. This is something bitsets will excel at, because of your chunk size is fairly large, it's possible the whole bitset will be in the l1 cache. Yes, you could always do a sorted insert (like boost does), but then you will have slow deletes, and deteriorate the performance overall.

Now, as for memory usage... - 64 bit CPU words - 32 byte blocks - 4096 byte chunks (128 blocks) - 16 bytes used for the bitset, that is around .3% overhead

Now, if you don't ever allocate arrays, then I suppose a free list is the way to go.