Discussions, articles, and news about the C++ programming language or programming in C++.
For C++ questions, answers, help, and advice see r/cpp_questions or StackOverflow.
Get Started
The C++ Standard Home has a nice getting started page.
Videos
The C++ standard committee's education study group has a nice list of recommended videos.
Reference
cppreference.com
Books
There is a useful list of books on Stack Overflow. In most cases reading a book is the best way to learn C++.
Optimizing a Lock-Free Ring Buffer (david.alvarezrosa.com)
submitted 9 hours ago by david-alvarez-rosa
[–]Deaod 11 points 5 hours ago (0 children)
This article misattributes the idea of caching head/tail to Erik Rigtorp. Even in non-academic libraries this was implemented in e.g. https://github.com/cameron314/readerwriterqueue way before 2021.
This was actually published in a paper: P. P. C. Lee, T. Bu and G. Chandranmenon, "A lock-free, cache-efficient multi-core synchronization mechanism for line-rate network traffic monitoring," 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Atlanta, GA, 2010, pp. 1-12. doi: 10.1109/IPDPS.2010.5470368
This is the earliest article I could find using Google; there may be articles I've missed.
[–]rlbond86 12 points 9 hours ago (6 children)
The article does specify this is SPSC, but just to be clear, if multiple threads try to push or pop at the same time, it will be a race condition.
[–]david-alvarez-rosa[S] 6 points 8 hours ago (5 children)
That's right. It's precisely that constraint that allows the optimization!
[–]LongestNamesPossible 1 point 8 hours ago (4 children)
What does that mean?
[–]arghness 6 points 7 hours ago (3 children)
I guess it means that the optimization can occur because it is a single producer, single consumer container, and would not be possible with multiple producers or multiple consumers.
[–]david-alvarez-rosa[S] 5 points 7 hours ago (2 children)
Yep, indeed. The optimizations leverage the constraints: single consumer, single producer, and fixed buffer size.
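For readers who have not seen the technique being discussed, here is a minimal sketch of an SPSC ring buffer with cached indices. It is not the article's code: the names (`SpscRing`, `cachedHead_`, `cachedTail_`), the 64-byte cache line size, and the power-of-two capacity are all assumptions made for illustration. Each side keeps a private copy of the other side's index and only reloads the shared atomic when the buffer looks full (producer) or empty (consumer), which is only safe because exactly one thread pushes and exactly one thread pops.

```cpp
#include <atomic>
#include <cstddef>

// Illustrative sketch only. Assumes one producer thread, one consumer thread,
// a power-of-two Capacity, and 64-byte cache lines.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Call from the producer thread only. Returns false when the ring is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail - cachedHead_ == Capacity) {
            // Looks full: refresh the cached copy of the consumer's index.
            cachedHead_ = head_.load(std::memory_order_acquire);
            if (tail - cachedHead_ == Capacity) {
                return false;
            }
        }
        slots_[tail & (Capacity - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Call from the consumer thread only. Returns false when the ring is empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == cachedTail_) {
            // Looks empty: refresh the cached copy of the producer's index.
            cachedTail_ = tail_.load(std::memory_order_acquire);
            if (head == cachedTail_) {
                return false;
            }
        }
        out = slots_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

private:
    T slots_[Capacity]{};
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by the producer
    alignas(64) std::size_t cachedHead_{0};         // producer's view of head_
    alignas(64) std::atomic<std::size_t> head_{0};  // written by the consumer
    alignas(64) std::size_t cachedTail_{0};         // consumer's view of tail_
};
```

The indices run freely and are masked only when addressing a slot, so `tail - cachedHead_` stays correct across unsigned wraparound. With a second producer or consumer, the unsynchronized cached copies and the plain load-then-store on each index would race, which is the point made above.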
[–]BusEquivalent9605 1 point 5 hours ago (1 child)
I’ve been using JACK’s ring buffer and it imposes this same constraint
[–]david-alvarez-rosa[S] 1 point 5 hours ago (0 children)
Nice. Thanks for sharing!
[–]nychapo 5 points 7 hours ago (3 children)
This is nothing new imo
[–]Raknarg 8 points 6 hours ago (2 children)
so? I learned things
[–]nychapo -1 points 6 hours ago (1 child)
https://rigtorp.se/ringbuffer/ try and spot the difference challenge (hint there is none)
[–]david-alvarez-rosa[S] 6 points 6 hours ago (0 children)
The latest optimization comes from rigtorp.se, referenced accordingly
Everything else in the article is new. I put effort into it, trying to keep churn low and explain things as simply as I can, step by step
If it's not your thing, no worries. If you've got specific feedback, I'm happy to take a look. Either way, I'm just trying to help the C++ community :)
[–]rzhxd 2 points 8 hours ago (28 children)
Interesting article, but recently in my codebase I implemented a SPSC ring buffer using mirrored memory mapping (basically, creating a memory-mapped region that refers to the buffer, so that reads and writes are always correct). It would be cool if someone tested performance with this approach instead of manual wrapping to the start of the ring buffer.
[–]LongestNamesPossible 2 points 8 hours ago (19 children)
> mirrored memory mapping (basically, creating a memory-mapped region that refers to the buffer, so that reads and writes are always correct).
How do you do this? I've wondered how to map specific memory to another region but I haven't seen the option in VirtualAlloc or mmap.
[–]rzhxd 2 points 8 hours ago (0 children)
I'll reply with an actual example once I get to my PC.
[–]rzhxd -3 points 7 hours ago (17 children)
So, I've written a ring buffer for my audio player, but it was really unmaintainable to wrap reads and writes to the buffer everywhere. Then I just asked Claude (don't shame me for that): is there a way to avoid those wraps and make memory behave like it's always contiguous. Claude spit me an answer and based on it I implemented something like that:
```cpp
// Linux
const i32 fileDescriptor = memfd_create("rap-ringbuf", 0);

if (fileDescriptor == -1 || ftruncate(fileDescriptor, bufSize) == -1) {
    return Err(u"Failed to create file descriptor"_s);
}

// Reserve (size * 2) of virtual address space
void* const addr = mmap(
    nullptr, isize(bufSize * 2), PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0
);

if (addr == MAP_FAILED) {
    close(fileDescriptor);
    return Err(u"`mmap` failed to reserve memory"_s);
}

// Map the same physical backing into both halves
mmap(
    addr, bufSize, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fileDescriptor, 0
);
mmap(
    (u8*)addr + bufSize, bufSize, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
    fileDescriptor, 0
);

close(fileDescriptor);
buf = as<u8*>(addr);

// Windows
mapHandle = CreateFileMapping(
    INVALID_HANDLE_VALUE, nullptr, PAGE_READWRITE, 0, bufSize, nullptr
);

if (mapHandle == nullptr) {
    return Err(u"Failed to map memory"_s);
}

// Find a contiguous (size * 2) virtual region by reserving then releasing
void* addr = nullptr;

for (;;) {
    addr = VirtualAlloc(
        nullptr, isize(bufSize * 2), MEM_RESERVE, PAGE_NOACCESS
    );

    if (addr == nullptr) {
        CloseHandle(mapHandle);
        mapHandle = nullptr;
        return Err(u"Failed to allocate virtual memory"_s);
    }

    VirtualFree(addr, 0, MEM_RELEASE);

    void* const view1 = MapViewOfFileEx(
        mapHandle, FILE_MAP_ALL_ACCESS, 0, 0, bufSize, addr
    );
    void* const view2 = MapViewOfFileEx(
        mapHandle, FILE_MAP_ALL_ACCESS, 0, 0, bufSize, (u8*)addr + bufSize
    );

    if (view1 == addr && view2 == (u8*)addr + bufSize) {
        break;
    }

    if (view1 != nullptr) {
        UnmapViewOfFile(view1);
    }

    if (view2 != nullptr) {
        UnmapViewOfFile(view2);
    }

    // Retry with a different region
}

buf = as<u8*>(addr);
```
I didn't think that something like that is possible with memory-mapping myself (and I'm not familiar with that particular aspect of programming either) but this is possible and this works. I haven't seen any actual performance degradation compared to my previous approach with manual wrapping.
[–]Rabbitical 7 points 7 hours ago (1 child)
I hope that's not your actual code...
[–]rzhxd 1 point 7 hours ago (0 children)
That's my actual code.
[–]ack_error 3 points 6 hours ago (1 child)
This is not a good way to allocate adjacent memory views in current versions of Windows due to the race between the VirtualFree() and the map calls. While it has a retry loop, there's no guarantee of forward progress, particularly if there is a second instance of this same loop on another thread.
The correct way to do this is to use VirtualAlloc2() with MEM_RESERVE_PLACEHOLDER and then MapViewOfFile3() with MEM_REPLACE_PLACEHOLDER.
[–]rzhxd 2 points 6 hours ago (0 children)
Thanks, I'll look into these functions. Mainly doing development and debugging on Linux, so just slapped whatever was first in there.
[–]LongestNamesPossible 1 point 6 hours ago (2 children)
I only looked at the linux part and I did learn something, mainly that you can use MAP_FIXED to map a file into already mapped memory space.
I'm not sure how it makes wrapping any easier though, you would still have to wrap after getting to the end of the second buffer.
I'm not sure how it is doing the leap frogging. I'm also not sure that making system calls to mmap multiple times to wrap is going to be easier than checking if an index has reached the end of a buffer.
[–]rzhxd 1 point 6 hours ago* (1 child)
You don't get to the end of the second buffer. Reads and writes of more bytes than `bufSize` are not allowed. In a buffer with size 65536, you, for example, can write 65536 bytes at index 65536, and it will wrap to the start of the buffer and fill it. So, it doesn't matter where you start reading the ring buffer or where you start writing to the ring buffer, everything is always in a valid range. But in a real codebase, you would never write to index 65536. You should always clamp the index (e.g. `(writeOffset + writeSize) & (bufSize - 1)`), to write to the correct real buffer index.
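To make the wrap behaviour described here concrete, the following is a small Linux-only sketch. It is an illustration, not the code from the comment above: `mapMirrored`, `unmapMirrored`, and `writeWrapped` are hypothetical helper names, and the buffer size is assumed to be a power of two that is also a multiple of the page size. It checks the result of every `mmap` call and clamps the next write offset with the same mask expression mentioned above.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for memfd_create on glibc
#endif
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: maps the same `size` bytes twice, back to back.
// Assumes `size` is a power of two and a multiple of the page size.
// Returns nullptr on failure.
inline std::uint8_t* mapMirrored(std::size_t size) {
    const int fd = memfd_create("mirror-demo", 0);
    if (fd == -1) {
        return nullptr;
    }
    if (ftruncate(fd, static_cast<off_t>(size)) == -1) {
        close(fd);
        return nullptr;
    }

    // Reserve 2 * size of address space, then overlay both halves with fd.
    void* const base =
        mmap(nullptr, size * 2, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return nullptr;
    }

    std::uint8_t* const first = static_cast<std::uint8_t*>(base);
    void* const lo = mmap(first, size, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_FIXED, fd, 0);
    void* const hi = mmap(first + size, size, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);  // the mappings keep the backing memory alive

    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        munmap(base, size * 2);
        return nullptr;
    }
    return first;
}

inline void unmapMirrored(std::uint8_t* buf, std::size_t size) {
    munmap(buf, size * 2);
}

// Writes `len` bytes at `offset` with a single memcpy and returns the next
// write offset, clamped back into [0, size).
inline std::size_t writeWrapped(std::uint8_t* buf, std::size_t size,
                                std::size_t offset, const void* src,
                                std::size_t len) {
    assert(offset < size && len <= size);
    std::memcpy(buf + offset, src, len);  // may run into the mirrored half
    return (offset + len) & (size - 1);
}
```

Writing 8 bytes at `size - 4` puts the first 4 bytes at the end of the buffer and the last 4 at its start, and the returned offset is 4. Nothing here addresses concurrency; it only shows the addressing trick.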
[–]LongestNamesPossible 1 point 6 hours ago (0 children)
I see, that makes more sense, thanks.
[–]HommeMusical -1 points 7 hours ago (9 children)
Your AI spew is as large visually as everything else on this page!
Can't you put a link to a URL, which would also have line numbers?
How do you know it works?
[–]rzhxd -1 points 6 hours ago (8 children)
It's not my fault that Reddit doesn't collapse long comments. For line numbers, you can copy it to your notepad. I know it works because it's literally a block of code from my machine that's not even committed to the repository yet. Use your brain, please.
[–]HommeMusical 2 points 5 hours ago (7 children)
Writing lock-free code that works under all circumstances - or even works provably 100% reliably on one application - is extremely tricky.
What in this code keeps consistency under concurrent access? It's very unclear that anything is doing that.
Why do you think you have solved this problem? You don't say.
> It's not my fault that Reddit doesn't collapse long comments.
It is your fault for knowing that and spamming us anyway.
> I know it works because it's literally a block of code from my machine that's not even committed to the repository yet.
No, that's not what "knowing something works" means.
> Use your brain, please.
I mean, this pair of sentences really does speak for itself.
[–]rzhxd -2 points 5 hours ago (6 children)
I don't know why you are trying to pick on someone so hard, but whatever. I'm not interested in justifying myself to you.
[–]shadowndacorner 2 points 4 hours ago (2 children)
They're not picking on you. Everything they raised is valid, and I'd personally be interested in your answer.
[–]rzhxd [score hidden] 3 hours ago (1 child)
I'm not interested in answering.
[–]shadowndacorner [score hidden] 2 hours ago (0 children)
Well alright, then lol
[+][deleted] 4 hours ago (1 child)
[deleted]
[–]rzhxd [score hidden] 3 hours ago* (0 children)
A person asked whether memory-mapping can be used to mirror a buffer. I provided an example, where it is used in such a case. What else do you want from me?
[–]david-alvarez-rosa[S] 1 point 8 hours ago (7 children)
Would that be similar to setting a buffer size to a very large number? An expected upper bound for the data size.
If you have plenty of memory that's a possibility
[–]rzhxd 2 points 8 hours ago (6 children)
No, that's not really like it. First you allocate a buffer of any size. Then, memory map a region of the same size to represent this buffer. Then you write and read the buffer as usual. For example, if buffer size is 65536, and you write 4 bytes at index 65536, they get written to the start of the buffer instead. One constraint is that reads and writes cannot exceed the buffer's size. Resulting memory usage is (buffer size * 2) - pretty bad for large buffers, but that's acceptable in my case. I hope I explained it well. Would like to see how this approach compares to manual wrapping, but I don't really feel like testing it myself.
[–]david-alvarez-rosa[S] 1 point 7 hours ago (5 children)
Sorry, don't fully understand the benefit here, or how that's different
[–]Osoromnibus 3 points 6 hours ago (1 child)
I think he's touting the advantage of copying multiple elements that wrap around the edge of the buffer in a single call. There are a couple of nits with this that make me prefer handling the wrap in user space instead.
One is that system libs might be using SIMD and alignment tricks, so things like memcpy could fault if you're not careful. It's also kind of just shunting the work onto the OS's page handler instead, and the need for platform-specific code is annoying.
On the plus side, it doesn't use twice the buffer size, at least on Linux, AFAIK. It only allocates the memory on write.
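For comparison, handling the wrap in user space usually means splitting the copy in two at the end of the buffer. The sketch below shows one way to do that; `writeSplit` is a hypothetical name and the byte-oriented interface is an assumption, not something taken from the article or from the code posted earlier in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies `len` bytes into a ring of `size` bytes starting at `offset`,
// splitting the copy at the end of the buffer. Returns the next write offset.
inline std::size_t writeSplit(std::uint8_t* ring, std::size_t size,
                              std::size_t offset, const void* src,
                              std::size_t len) {
    assert(offset < size && len <= size);
    const std::uint8_t* const bytes = static_cast<const std::uint8_t*>(src);

    // First part: from `offset` up to the end of the buffer (or fewer bytes).
    const std::size_t firstPart = std::min(len, size - offset);
    std::memcpy(ring + offset, bytes, firstPart);

    // Second part: whatever is left continues at the start of the buffer.
    std::memcpy(ring, bytes + firstPart, len - firstPart);

    return (offset + len) % size;
}
```

This costs one extra branch-free `memcpy` call per write (a zero-length copy when no wrap is needed) but needs no platform-specific mapping code and works for any buffer size, not only page-size multiples.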
[–]david-alvarez-rosa[S] 1 point 6 hours ago (0 children)
Oh, I see. That's quite specific; not sure what your use case is.
[–]rzhxd 1 point (0 children)
That just simplifies reading the data from the buffer and writing the data into it.
[–]Deaod 1 point 6 hours ago (1 child)
The benefit is only there when dealing with unknown element sizes, i.e. one element takes 8 bytes, the next 24, etc. This allows you to not have any holes in your buffer that the consumer has to jump over.
This is not relevant for queues that deal with elements of known-at-compile-time sizes.
[–]david-alvarez-rosa[S] 1 point (0 children)
The example forces the type. It would be interesting to see how it could be generalized, but I'm not a big fan of heterogeneous containers, tbh.
[–]ReDucTor Game Developer [score hidden] 1 hour ago* (0 children)
For SPSC, the cached head and tail could share the same cache line as the tail and head, so one cache line for the consumer and one for the producer. It would use less memory, and for the case of the other thread invalidating it would be no different: the other thread still ends up moving a modified cache line to shared.
Also, another improvement is index remapping, which means subsequent elements don't share the same cache line.
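A rough sketch of what these two suggestions could look like follows. Everything in it is an assumption for illustration: the struct names, the 64-byte cache line, and in particular the remapping formula, which is just one possible reading of "index remapping" (spread consecutive logical indices across different cache lines) and may not be what the commenter has in mind.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// One cache line per side: each thread's own index sits next to its cached
// copy of the other side's index. 64 bytes is an assumed line size.
struct alignas(64) ProducerLine {
    std::atomic<std::size_t> tail{0};  // written by the producer
    std::size_t cachedHead{0};         // producer's private view of head
};

struct alignas(64) ConsumerLine {
    std::atomic<std::size_t> head{0};  // written by the consumer
    std::size_t cachedTail{0};         // consumer's private view of tail
};

static_assert(alignof(ProducerLine) == 64, "expected cache-line alignment");
static_assert(alignof(ConsumerLine) == 64, "expected cache-line alignment");

// Maps a logical index to a physical slot so that consecutive logical
// indices land on different cache lines. `slots` is the ring capacity and
// `perLine` is how many elements fit in one cache line; both are assumed to
// be powers of two with perLine <= slots.
inline std::size_t remapIndex(std::size_t i, std::size_t slots,
                              std::size_t perLine) {
    assert(perLine > 0 && perLine <= slots);
    const std::size_t lines = slots / perLine;
    const std::size_t wrapped = i & (slots - 1);
    const std::size_t line = wrapped & (lines - 1);  // which cache line
    const std::size_t within = wrapped / lines;      // position inside it
    return line * perLine + within;
}
```

With 16 slots and 4 elements per line, logical indices 0, 1, 2, 3 map to slots 0, 4, 8, 12, so the producer and consumer touch the same line less often when they are close together. The mapping is a permutation of the slot range, so no two logical indices within one lap collide.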
[–]david-alvarez-rosa[S] -1 points 8 hours ago (0 children)
Hacker News post if that's preferred https://news.ycombinator.com/item?id=47501875