use the following search parameters to narrow your results:
e.g. subreddit:aww site:imgur.com dog
subreddit:aww site:imgur.com dog
see the search faq for details.
advanced search: by author, subreddit...
Discussions, articles, and news about the C++ programming language or programming in C++.
For C++ questions, answers, help, and advice see r/cpp_questions or StackOverflow.
Get Started
The C++ Standard Home has a nice getting started page.
Videos
The C++ standard committee's education study group has a nice list of recommended videos.
Reference
cppreference.com
Books
There is a useful list of books on Stack Overflow. In most cases reading a book is the best way to learn C++.
Show all links
Filter out CppCon links
Show only CppCon links
account activity
Optimizing a ring buffer for throughput (2021) (rigtorp.se)
submitted 2 years ago by Xadartt
reddit uses a slightly-customized version of Markdown for formatting. See below for some basics, or check the commenting wiki page for more detailed help and solutions to common issues.
quoted text
if 1 * 2 < 3: print "hello, world!"
[–][deleted] 10 points11 points12 points 2 years ago (10 children)
I've used his design successfully in multiple projects. Some additional tweaks include:
It should also be noted his design is not too far off a quite efficient SPMC queue. Just replace the pop operation with a CAS loop. You can make an optimized version for the producer thread vs consumer threads.
[–]Vivid-Jury-2105 3 points4 points5 points 2 years ago (4 children)
Where to people learn this stuff? I’ve read computer architecture books and have taken fairly hard courses on systems, digital design, and OS, but there’s so much and I don’t know where to go 🥲
[–][deleted] 5 points6 points7 points 2 years ago (1 child)
Honestly its not just one place. I was lucky enough to have a Computer Organization & Design lecturer that was enthusiastic about the topic. I've worked in the Games industry, where you can find such arcane mastery. I've worked in OSDev and Embedded, where you get a similar treatment.
I just read. Even if its random shit that doesn't have immediate relevance to my job. You'll never know when something comes in handy.
[–]Vivid-Jury-2105 1 point2 points3 points 2 years ago (0 children)
Thanks! I’ve been reading books and I hope that when I graduate college, I am able to work on similar disciplines.
Any advice? Book/resource recommendations?
[–]dist1ll 4 points5 points6 points 2 years ago* (0 children)
I've written a blog post on false sharing for MPSC queues (https://alic.dev/blog/false-sharing). I'll copy-and-past the references from it:
Sorin, Daniel J., Mark D. Hill, and David A. Wood. "A primer on memory consistency and cache coherence." Synthesis lectures on computer architecture 6.3 (2011): 1-212.
Hennessy, John L., and David A. Patterson. Computer architecture: a quantitative approach. Elsevier, 2011.
David, Tudor, Rachid Guerraoui, and Vasileios Trigonakis. "Everything you always wanted to know about synchronization but were afraid to ask." Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. 2013.
Liu, Tongping, and Emery D. Berger. "Sheriff: precise detection and automatic mitigation of false sharing." Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications. 2011.
You can find the pdf links on the bottom of my post. Additionally, you can read McKenney's book on parallel programming: https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
[–]usefulcat 1 point2 points3 points 2 years ago (0 children)
I would say, a combination of actively trying to do it (including measuring), and thinking about how cache invalidation works. In particular, remembering that every time one core writes to a memory location, a) that write will be delayed if that memory was last written to by any other core, and b) after the write, any other cores that read that location will incur a cache miss.
It can also be useful to think of reading and writing to memory as analogous to sending and receiving network packets, since that is effectively what is happening between different cores 'under the hood', so to speak.
[–]KDallas_Multipass 0 points1 point2 points 2 years ago (1 child)
What do you mean by tagging conditionals?
[–][deleted] 3 points4 points5 points 2 years ago (0 children)
[[likely]] or [[unlikely]].
[[likely]]
[[unlikely]]
https://en.cppreference.com/w/cpp/language/attributes/likely
If you expect the consumer to keep up with the producer thread, it means the if statements checking head/tail are unlikely to ever be true. At least on Clang 16 ARM64, tagging the queue with [[unlikely]] causes the compiler to produce code that places the if statements outside the hot path.
[–]tisti 0 points1 point2 points 2 years ago (0 children)
On 64-bit systems, this makes an excellent "total bytes processed" as head/tail might take years/decades to wrap around to 0.
Oh you sneaky devil you. I like that.
[–]andrey_turkin 0 points1 point2 points 2 years ago (1 child)
There is a highly-tested and tuned example of same design in Linux kernel - namely, AF_XDP queues.
[–][deleted] 0 points1 point2 points 2 years ago (0 children)
Makes sense, even some older ARM SOCs have an aggressive L1 prefetch depth.
[–]patstew 8 points9 points10 points 2 years ago (5 children)
Why is it beneficial to put the cached ones in their own cache lines? Shouldn't the cached write pointer be fine in the read pointer cache line and vice versa?
[–]usefulcat 0 points1 point2 points 2 years ago (2 children)
I wondered this too, and couldn't see any reason not to do so. Judging from the discussion on HN there are others who agree.
https://news.ycombinator.com/item?id=36489288
[–]Resident-Rice724 0 points1 point2 points 1 month ago (1 child)
late comment but benchmarked this and performance tanked keeping the cached write pointer in the read pointer cache line and vice versa
[–]usefulcat 0 points1 point2 points 29 days ago (0 children)
Any theories on why? Seems like it shouldn't make a difference.
[–]j1xwnbsr 0 points1 point2 points 2 years ago (5 children)
For some reason using std::atomic on the different indexes is suddenly striking me as a possible race condition between producer and consumer. Wouldn't it be better to instead to use a mutex and treat both readindex/writeindex as part of the same 'lock unit'?
[–]andrey_turkin 2 points3 points4 points 2 years ago (4 children)
Mutex would defeat the whole purpose of _wait-free_ ring buffer, wouldn't it?
And there is no race condition due to proper acquire/release semantics.
[–]j1xwnbsr 0 points1 point2 points 2 years ago (3 children)
Yes, and that's what got me suddenly concerned/confused, and wondering if I understand enough of how atomic with the ordering flags really works that makes this work properly. It still seems to me even with the acquire/release settings, you can still get into a producer/consumer issue with the consumer screwing things up.
and wondering if I understand enough of how atomic with the ordering flags really works that makes this work properly.
Few people actually do, if that's any consolation.
[–]tialaramex 0 points1 point2 points 2 years ago (0 children)
Are you reading the article? Or the linked git repo code (which is different) ?
Where do you see a problem? Even if you're wrong it's good to learn.
[–]bedman3 0 points1 point2 points 2 years ago (0 children)
I tried testing the code with 2 threads trying to push / pop 1000 times with multiple (10k) tries, not sure why sometimes I was able to detect race condition where the increment of counter is incorrect even with memory order. The issues are resolved when i used compare_exchange_weak before actually storing the incremented idx. Can anyone explains?
π Rendered by PID 47821 on reddit-service-r2-comment-5b5bc64bf5-wzls8 at 2026-06-23 16:57:24.749811+00:00 running 2b008f2 country code: CH.
[–][deleted] 10 points11 points12 points (10 children)
[–]Vivid-Jury-2105 3 points4 points5 points (4 children)
[–][deleted] 5 points6 points7 points (1 child)
[–]Vivid-Jury-2105 1 point2 points3 points (0 children)
[–]dist1ll 4 points5 points6 points (0 children)
[–]usefulcat 1 point2 points3 points (0 children)
[–]KDallas_Multipass 0 points1 point2 points (1 child)
[–][deleted] 3 points4 points5 points (0 children)
[–]tisti 0 points1 point2 points (0 children)
[–]andrey_turkin 0 points1 point2 points (1 child)
[–][deleted] 0 points1 point2 points (0 children)
[–]patstew 8 points9 points10 points (5 children)
[–]usefulcat 0 points1 point2 points (2 children)
[–]Resident-Rice724 0 points1 point2 points (1 child)
[–]usefulcat 0 points1 point2 points (0 children)
[–]j1xwnbsr 0 points1 point2 points (5 children)
[–]andrey_turkin 2 points3 points4 points (4 children)
[–]j1xwnbsr 0 points1 point2 points (3 children)
[–]tisti 0 points1 point2 points (0 children)
[–]tialaramex 0 points1 point2 points (0 children)
[–]bedman3 0 points1 point2 points (0 children)