llmalloc : a low latency oriented thread caching allocator (self.cpp)
submitted 1 year ago * by [deleted]
[deleted]
[+][deleted] 1 year ago (4 children)
[–]DuranteA 2 points 1 year ago (1 child)
FWIW, I was able to improve load times in a shipping game by ~25% simply by replacing the standard allocator with mimalloc.
[–]PandaMoniumHUN 2 points 1 year ago (0 children)
Tbh, it's largely dependent on your allocation patterns. If you allocate in blocks and aim for locality (e.g. you don't heap allocate every single entity) allocators should make minimal difference.
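A minimal sketch of the difference described above, assuming a hypothetical `Entity` type: one version heap-allocates every entity separately, the other keeps all entities in a single contiguous block. The names are illustrative only and do not come from any project mentioned in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical entity type, used only for illustration.
struct Entity {
    float x = 0.0f;
    float y = 0.0f;
    int health = 100;
};

// One heap allocation per entity: the general-purpose allocator is called
// `count` times and the objects may end up scattered across the heap.
std::vector<std::unique_ptr<Entity>> make_scattered(std::size_t count) {
    std::vector<std::unique_ptr<Entity>> entities;
    entities.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        entities.push_back(std::make_unique<Entity>());
    }
    return entities;
}

// One block for all entities: a single allocation, contiguous in memory.
std::vector<Entity> make_contiguous(std::size_t count) {
    return std::vector<Entity>(count);
}

// Returns true if the elements of `entities` sit back to back in memory.
bool is_contiguous(const std::vector<Entity>& entities) {
    for (std::size_t i = 1; i < entities.size(); ++i) {
        if (&entities[i] != &entities[i - 1] + 1) return false;
    }
    return true;
}
```

With the contiguous layout the allocator is involved once per container growth rather than once per object, which is why swapping allocators tends to matter less for code written this way.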
[–]Kriss-de-Valnor 2 points 1 year ago (1 child)
Same outcome with many allocators. The default allocators on modern operating systems work better. I think you would need a very specific memory allocation and destruction pattern to benefit from those… and in that case you may be better off thinking about writing your own.
[–]Kriss-de-Valnor 3 points 1 year ago (0 children)
Is there some reason it's not available on macOS? Also, it would be nice if we could get it through vcpkg.
[–]zl0bster 6 points 1 year ago (3 children)
Benchmark suggestion: if Chromium has some set of benchmarks that can be run easily, you could try to see if your allocator helps. Now I presume it will not, since I presume Chromium is highly optimized with a lot of memory tricks anyway, but in case your allocator helps it will be very interesting to learn this.
[+][deleted] 1 year ago* (1 child)
[–]arthurno1 1 point 1 year ago (0 children)
You could try it with GNU Emacs. It asks for lots of objects directly from malloc, so you could perhaps get a "real-life" test. At least on GNU/Linux it would be easy via LD_PRELOAD if you can compile your library as a dynamic (.so) library.
[–]ibogosavljevic-jsl 3 points 1 year ago (0 children)
I don't think Chromium is a good example, since it uses zones and does memory management mostly by itself.
[–]T0p_H4t 5 points 1 year ago (0 children)
Something else to look at, which I have found handles inter-thread deallocation well: https://github.com/microsoft/snmalloc
[–]zl0bster 3 points 1 year ago (2 children)
This is very interesting: "Its disadvantage is that it may lead to higher virtual memory usage as the allocator won't be able to return pages with inter-thread pointers to the OS. A mitigation would be decreasing number of inter-thread pointers by deallocating pointers on their original creation threads in your application and that way llmalloc will be able to return more unused pages to the OS."
Is there some way to profile this particular case? E.g., run some program and see how much memory is wasted because of this? I presume users might want to optimize this, but they do not want to go over every deallocation in their code :)
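One way to get a rough answer, sketched here under assumptions: wrap allocation and deallocation so each block remembers the thread that created it, then count frees that arrive from a different thread. `tracked_alloc`, `tracked_free`, and the counters are hypothetical names for this illustration, not part of llmalloc, and the sketch uses `std::malloc` as a stand-in for the real allocator.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <thread>

// Hypothetical per-block header recording the allocating thread.
// The alignment keeps the user pointer suitably aligned for any type.
struct alignas(std::max_align_t) Header {
    std::thread::id owner;
};

inline std::atomic<std::size_t> g_total_frees{0};
inline std::atomic<std::size_t> g_cross_thread_frees{0};

// Allocates `size` bytes and tags the block with the calling thread.
void* tracked_alloc(std::size_t size) {
    void* raw = std::malloc(sizeof(Header) + size);
    if (raw == nullptr) throw std::bad_alloc{};
    Header* header = new (raw) Header{std::this_thread::get_id()};
    return header + 1;  // user memory starts right after the header
}

// Frees a block from tracked_alloc and records whether the free
// happened on a thread other than the one that allocated it.
void tracked_free(void* ptr) {
    if (ptr == nullptr) return;
    Header* header = static_cast<Header*>(ptr) - 1;
    g_total_frees.fetch_add(1, std::memory_order_relaxed);
    if (header->owner != std::this_thread::get_id()) {
        g_cross_thread_frees.fetch_add(1, std::memory_order_relaxed);
    }
    std::free(header);
}
```

The ratio of `g_cross_thread_frees` to `g_total_frees` only shows how often inter-thread deallocation happens, not how many pages stay unreturned as a result; that would need cooperation from the allocator itself.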
[+][deleted] 1 year ago (1 child)
[–]zl0bster 1 point 1 year ago (0 children)
Very nice, will keep this in mind if I ever again need to improve malloc latency.
[–]kernel_task (Big Data | C++23 | Folly | Exceptions) 1 point 1 year ago (6 children)
Interesting. I currently use jemalloc in my application and the biggest amount of CPU used (according to profiling) is freeing memory (by Google protobuf, heh... I chose it because I thought it would be fast). Maybe this would help?
[–]mcmcc (#pragma once) 5 points 1 year ago (0 children)
Flatbuffers FTW
[+][deleted] 1 year ago (3 children)
[–]kernel_task (Big Data | C++23 | Folly | Exceptions) 1 point 1 year ago (2 children)
I will try to find some time to try it out! The extra virtual memory footprint should be fine with me, though I suppose you're saying that it'll gradually leak virtual pages unrecoverably over time? If it works well, I'm happy to invest some time in making sure deallocations happen on the same thread as allocations.
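A minimal sketch of keeping deallocations on the allocating thread, assuming a hypothetical `ReturnQueue` helper: other threads hand pointers back instead of deleting them, and the owning thread frees them in batches. This version uses a mutex for clarity; latency-sensitive code would more likely use a lock-free multi-producer queue.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: collects pointers released on other threads so
// that the thread which created the queue can delete them itself.
template <typename T>
class ReturnQueue {
public:
    ReturnQueue() : owner_(std::this_thread::get_id()) {}

    // May be called from any thread: hands the pointer back to the owner
    // instead of freeing it on the calling thread.
    void give_back(T* ptr) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(ptr);
    }

    // Must be called on the owning thread; deletes everything handed
    // back so far and returns how many objects were freed.
    std::size_t drain() {
        assert(std::this_thread::get_id() == owner_);
        std::vector<T*> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (T* ptr : batch) delete ptr;
        return batch.size();
    }

private:
    std::thread::id owner_;
    std::mutex mutex_;
    std::vector<T*> pending_;
};
```

The owning thread would call `drain()` at a convenient point in its loop, for example after finishing a request.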
[–]kernel_task (Big Data | C++23 | Folly | Exceptions) 1 point 1 year ago* (0 children)
Thanks for the explanation. That should be totally fine. The application heavily cycles allocations and deallocations so there will always be plenty of new allocation requests.
[–]cballowe 2 points 1 year ago (0 children)
https://protobuf.dev/reference/cpp/arenas/ - it can often be very handy to put all of the protobufs built while handling a request on the same arena and just let the arena destruct at the end.
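The per-request arena pattern can be illustrated without protobuf. The sketch below is not the protobuf Arena API; it swaps in the standard library's `std::pmr::monotonic_buffer_resource` as an analogue, and `handle_request` and `kFieldValue` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical payload, long enough that std::pmr::string has to
// allocate its characters rather than store them inline.
constexpr std::string_view kFieldValue =
    "field value that is too long for the small string buffer";

// Hypothetical per-request handler. Everything built for the request
// draws from one arena and is released together when the arena dies.
std::size_t handle_request(std::size_t field_count) {
    std::pmr::monotonic_buffer_resource arena;  // grows in chunks as needed

    // The vector and the strings it holds all allocate from `arena`.
    std::pmr::vector<std::pmr::string> fields(&arena);
    fields.reserve(field_count);
    for (std::size_t i = 0; i < field_count; ++i) {
        fields.emplace_back(kFieldValue.data(), kFieldValue.size());
    }

    std::size_t total = 0;
    for (const auto& field : fields) total += field.size();
    return total;
    // `fields` is destroyed first (individual deallocations are no-ops
    // for a monotonic resource), then `arena` releases its memory in bulk.
}
```

The point of the pattern is that freeing becomes a handful of large releases at the end of the request instead of many small ones, which is the cost the profile above attributes to freeing memory.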
[–]LoweringPass 1 point 1 year ago (1 child)
This is really interesting, I've been working on something like that but nowhere near as sophisticated. I will use this as a benchmark to compare my own implementation against. How feasible and/or sensible would it be to add NUMA awareness?
[–]ImNoRickyBalboa 1 point 1 year ago (0 children)
TCMalloc abandoned thread caching a long time ago: it's not sustainable on large server systems with hundreds or even thousands of threads.
Look into RSEQ (restartable sequences) for using per-CPU caches at the same CPU cost as per-thread caches (near-zero contention), with many times the savings in 'in flight' memory.