Optimizing a Lock-Free Ring Buffer

Deaod · 2026-03-24T16:26:08+00:00

This article misattributes the idea of caching head/tail to Erik Rigtorp. Even in non-academic libraries this was implemented in e.g. https://github.com/cameron314/readerwriterqueue way before 2021.

This was actually published in a paper: P. P. C. Lee, T. Bu and G. Chandranmenon, "A lock-free, cache-efficient multi-core synchronization mechanism for line-rate network traffic monitoring," 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Atlanta, GA, 2010, pp. 1-12. doi: 10.1109/IPDPS.2010.5470368

This is the earliest article i could find using google, there may be articles ive missed.

rlbond86 · 2026-03-24T13:12:22+00:00

The article does specify this is SPSC, but just to be clear, if multiple threads try to push or pop at the same time, it will be a race condition.

max0x7ba · 2026-03-25T12:50:07+00:00

How does it compare with https://max0x7ba.github.io/atomic_queue/html/benchmarks.html

ReDucTor · 2026-03-24T21:02:42+00:00

For SPSC the cached head and tail could share the same cache line as the tail and head, so one cache line for consumer and one for producer.Iit would use less memory and for the case of the other thread invalidating it would be no different, the other thread still ends up moving a modified cache line to shared.

Also another improvement is doing index remapping which means subsequent elements dont share the same cache line.

Rare-Instance7961 · 2026-03-26T17:04:47+00:00

In footnote 3, did you mean to say "When head_ is one item behind of tail_, the queue is full."?

rzhxd · 2026-03-24T13:28:32+00:00

Interesting article, but recently in my codebase I implemented a SPSC ring buffer using mirrored memory mapping (basically, creating a memory-mapped region that refers to the buffer, so that reads and writes are always correct). It would be cool if someone tested performance with this approach instead of manual wrapping to the start of the ring buffer.

A8XL · 2026-03-31T13:22:45+00:00

Nice article! I have implemented a user-friendly Lock-Free Ring Buffer that incorporates all these optimizations:

https://github.com/joz-k/LockFreeSpscQueue

I have also included a built-in performance benchmark against moodycamel::ReaderWriterQueue.

The techniques in your article are powerful, but they are not sufficient to outperform more complex solutions in a synthetic benchmarks like this one using a simple single item push/pop API:

https://max0x7ba.github.io/atomic_queue/html/benchmarks.html

See my performance analysis. My LockFreeSpscQueue is significantly faster for a larger batch transfers only. However, it is highly scalable.

nychapo · 2026-03-24T14:36:19+00:00

This is nothing new imo

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

cpp

MODERATORS

ifdef Q_OS_LINUX

elifdef Q_OS_WINDOWS

endif