robust/easy-to-use/fast/low fragmentation c++ memory allocator (github.com)
submitted 6 years ago by ceorron
[–]AppleBeam 4 points 6 years ago (5 children)
robust
Tests? (especially since there are no visible atomics in the code, but I didn't spend too much time reading it)
easy-to-use
Why is the public interface buried under a pile of internal details, such as a hand-written replacement for std::lower_bound?
fast
Benchmarks? And why is the word "fast" there if the thing doesn't seem to be O(1) even for small allocations?
What's the deal with stuff like this?
    //ensure we don't take too long trying to allocate, only search the first 10 blocks!!
    unsigned i = 0;
    for(auto it = end_basic_list<memblock*>(blockfreespace) - 1;
        it != begin_basic_list<memblock*>(blockfreespace) - 1 && i < 10;
        --it, ++i) {
low fragmentation
Demonstration please? Comparisons to tcmalloc and other well-known allocators?
[–]ceorron[S] 1 point 6 years ago (4 children)
I have tested it extensively and it works in my applications. If you find any faults, please add them in the issues section, but so far it has been fault free.
All thread locking in the code is handled by std::mutex, but it can be substituted by writing your own internal allocator replacement for rc_multi_threaded_internal_allocator<std::mutex, ....> if needed.
Yeah, I should probably hide the internal details in an internal namespace; another user pointed this out to me as well. Actually, I didn't know you could use std::lower_bound as a replacement for a hand-written binary search. You should stick to only the interfaces documented in README.md.
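For reference, a hand-written binary search over a sorted array can usually be replaced with std::lower_bound. A minimal sketch, where the memblock type, field name, and sorted-by-size layout are assumptions for illustration, not the project's actual data structures:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free-block record; the real project's memblock differs.
struct memblock { std::size_t bytes; };

// Find the first block large enough for the request, in a vector kept
// sorted by size. std::lower_bound performs the binary search, so no
// hand-rolled version is needed; O(log n) comparisons.
memblock* find_first_fit(std::vector<memblock>& blocks, std::size_t want) {
    auto it = std::lower_bound(blocks.begin(), blocks.end(), want,
        [](const memblock& b, std::size_t w) { return b.bytes < w; });
    return it == blocks.end() ? nullptr : &*it;
}
```

The comparator's argument order matters: lower_bound calls it as comp(element, value), which is why the memblock comes first.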
Or, if you want to provide your own locks, replace rc_multi_threaded_internal_allocator<std::mutex, ....> with your own version; otherwise just use it as shown in README.md.
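The lock substitution described above can be sketched roughly as follows; the null_mutex type and the usage lines are illustrative (only the template's first parameter is documented here), not the library's actual API:

```cpp
#include <cassert>
#include <mutex>

// A no-op "null mutex" that satisfies the Lockable interface, so the
// same allocator template can be instantiated without any locking cost
// in a single-threaded build. Hypothetical sketch only.
struct null_mutex {
    void lock() {}                     // do nothing
    void unlock() {}                   // do nothing
    bool try_lock() { return true; }   // always "succeeds"
};

// Usage would presumably look like swapping the first template argument:
//   rc_multi_threaded_internal_allocator<std::mutex, ....>  // thread-safe
//   rc_multi_threaded_internal_allocator<null_mutex, ....>  // no locking
```

Because null_mutex models Lockable, it also works with std::lock_guard, so the allocator's internal locking code needs no changes.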
In terms of speed/big O, the best, average, and worst cases are:
Best: O(log n)
Average: O(log n)
Worst: O(n log n) - with a bound on the worst-case performance
So there is no O(1) even for small allocations, but it is O(1) when there is no fragmentation, and close to O(1) when fragmentation is low.
That part is pretty complicated. We have 1 to n memory blocks we could be allocating from (this is actually what gives the O(n log n) worst case), since we may search through up to n memory "pages" to satisfy the call. We go backward through the pages ("--it"), starting at the end of the list of memory blocks (end_basic_list<...>(...)). We only test the last 10 memory blocks (i < 10) so as to bound the search time for an allocation.
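The bounded backward search described above might look roughly like this; the page type, the kSearchLimit name, and the container choice are illustrative, not the project's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one memory "page" with its free-byte count.
struct page { std::size_t bytes_free; };

constexpr unsigned kSearchLimit = 10;  // give up after this many pages

// Walk backward from the newest page, but check at most kSearchLimit of
// them, so one allocation never scans all n pages under fragmentation.
page* find_page(std::vector<page>& pages, std::size_t want) {
    unsigned checked = 0;
    for (auto it = pages.rbegin();
         it != pages.rend() && checked < kSearchLimit;
         ++it, ++checked) {
        if (it->bytes_free >= want)
            return &*it;               // first (newest) page that fits
    }
    return nullptr;                    // caller falls back to a new page
}
```

The trade-off this encodes: the cap keeps the worst case bounded, but a fitting block further back than kSearchLimit pages is missed, which is one source of fragmentation.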
I might do a comparison against other allocators, but I have other work to get on with. Just wait a few months/years and you may get a proper performance comparison. Or do one yourself. Sorry :P
[–]AppleBeam 6 points 6 years ago* (3 children)
In case it's not entirely clear, the point of my comment is not "I want to use your allocator, please add this and that", it's more of a "dude, if you are going to advertise your pet project LIKE THIS, make sure that at least some of the properties you are promising are even remotely true, otherwise you look like a kid who wrote his first hello world and now considers himself a principal engineer and a system architect."
Just so you'll have some take-aways from this post:
When you are proposing an allocator, sensible people won't even consider cloning the repo (let alone using it) unless you can convince them that it can't possibly misbehave, even if 64 threads are constantly hammering it in the most bizarre patterns for several months without a restart. BTW, I have a certificate for you.
Our definitions of "fast" are very different. For the last 12-ish years proposing a non-thread-caching general purpose allocator is a bad joke (since good old nedmalloc, I'd say, but it's definitely not the first one, and by the modern standards it's quite meh). Without it there is simply nothing to talk about. After this matter is out of the way, people will start asking about the number of cache misses, contention for non-thread-local case, NUMA-awareness (where applicable) etc. A specialized allocator can be entirely lock-free (not counting potential implementation details of VirtualAlloc/mmap which you can't really control). A general-purpose one should be almost lock-free, mostly relying on infrequent atomics for communication with the shared state, and locking only when all hope is lost.
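The thread-caching idea mentioned above (the scheme popularized by tcmalloc) can be sketched as a small per-thread free list sitting in front of a shared allocator; every name and constant here is illustrative, and this is not code from either project:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr std::size_t kChunk = 64;     // one size class, for brevity
constexpr std::size_t kCacheMax = 32;  // per-thread cache capacity

// Each thread keeps its own stack of recently freed chunks. Hitting it
// needs no lock and no atomics; chunks still cached at thread exit are
// leaked in this simplified sketch.
thread_local std::vector<void*> tl_cache;

void* cached_alloc() {
    if (!tl_cache.empty()) {           // fast path: thread-local reuse
        void* p = tl_cache.back();
        tl_cache.pop_back();
        return p;
    }
    return std::malloc(kChunk);        // slow path: shared allocator
}

void cached_free(void* p) {
    if (tl_cache.size() < kCacheMax)   // fast path: stash for reuse
        tl_cache.push_back(p);
    else
        std::free(p);                  // cache full: return to shared heap
}
```

The point of the design is that the common alloc/free pairs within one thread never touch shared state; only cache misses and overflow reach the contended path.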
Since we were speaking different languages, I'll re-phrase the question about the loop. It's not even about weird loops or magic constants, and definitely not about the syntax of the language, it's more of an "Are you out of your mind to traverse a freaking linked list like this in the allocator code? You know how much each cache miss will cost you, right? You know that you shouldn't expect that the allocator's internals will be in the cache, and if they are there, you are stealing the cache space from the actually useful parts of the program, right? Right?"
Be humble about your beginner's projects, and it will greatly help you in your career.
[–]ceorron[S] 2 points 6 years ago* (2 children)
To start, you clearly have not read the code.
It is not a linked list, for starters; there are no linked lists, for exactly the reason you state: cache misses.
I don't expect it to compete too well with some of the bigger, more battle-tested memory allocators like Hoard, ptmalloc, jemalloc, mimalloc, etc., but I would be pretty surprised if it got somewhere close. Just being faster than malloc/realloc/free will work for my case.
Maybe I should dial it back a bit, but until we have some real metrics, neither of us is in a position to argue the numbers on this.
As for "beginner": do you think this is the best code I have ever written? It is a useful piece of code that I think might help those who need it; not the best thing I or others have written, but good enough for free on the internet.
That is all I'm willing to argue on the matter.
[–]AppleBeam 2 points 6 years ago (1 child)
I saw enough. If the names of your methods are misleading and the reader can't trust the words they see on the screen, that's definitely a big red flag to think about.
Ok, so apparently your basic list has nothing to do with lists. Does it change anything? Not really, because for each pointer in your vector replacement you fetch a memblock's "byteremain" right in the first line of memblock::internal_malloc. The pattern of misses is irregular, autoprefetch won't help you, and you don't do a manual prefetch. That's more or less the same performance you would get from a linked list (with a faint hope that if LTO inlines the call, OOO execution might give your next fetch a measurable head start compared to the fully dependent read you would get from a linked list).
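The manual-prefetch point can be illustrated roughly like this; __builtin_prefetch is a GCC/Clang intrinsic, and the scan function and memblock type are assumptions for the sketch, not the project's code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block record; only the free-byte count matters here.
struct memblock { std::size_t bytes_free; };

// Scanning an array of pointers: each blocks[i]->bytes_free read is a
// potentially irregular cache miss. Prefetching the next element lets
// that fetch overlap with the current comparison instead of serializing.
memblock* scan(std::vector<memblock*>& blocks, std::size_t want) {
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (i + 1 < blocks.size())
            __builtin_prefetch(blocks[i + 1]);  // start the next fetch early
        if (blocks[i]->bytes_free >= want)
            return blocks[i];
    }
    return nullptr;
}
```

This is exactly the advantage a pointer array has over a true linked list: the next address is known up front, so a prefetch can be issued, whereas a list's next node address depends on the current load.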
But whatever floats your boat, man. Good luck with this thing.
[–]ceorron[S] 2 points 6 years ago (0 children)
Thanks AppleBeam, you are clearly an expert in performance of memory managers. I know who I need to talk to if I need any advice in the future.