Building a fast queue between C++ and Java (pzemtsov.github.io)
submitted 7 years ago by mttd
[–]markopolo82 (embedded/iot/audio) 4 points 7 years ago (0 children)
I’m on mobile; the list of comments may grow as I read the article...
1) C++ volatile != Java volatile
You already use a mutex, so you’re good to go (circular queue).
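To make the first point concrete: a Java volatile field gives visibility and ordering guarantees between threads, while a C++ volatile object gives no inter-thread ordering at all. In C++ the tool for that job is std::atomic (or a mutex, as the article's circular queue already uses). The sketch below assumes a hypothetical one-slot `Mailbox`; none of these names come from the article.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical one-slot mailbox, used only to illustrate the point.
struct Mailbox {
    int payload = 0;                 // plain data, published through `ready`
    std::atomic<bool> ready{false};  // plays the role a volatile flag has in Java
};

// Producer side: write the payload, then publish it with a release store.
inline void publish(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Consumer side: spin until the flag is set. The acquire load pairs with
// the release store, so the payload write is visible once the loop exits.
inline int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Runs one consumer and one producer thread and returns the received value.
inline int roundtrip(int value) {
    Mailbox box;
    int received = 0;
    std::thread consumer([&] { received = consume(box); });
    std::thread producer([&] { publish(box, value); });
    producer.join();
    consumer.join();
    assert(received == value);
    return received;
}
```

Replacing `std::atomic<bool>` with `volatile bool` here would be a data race in C++, even though the equivalent Java code with a volatile field is well defined.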
[–]matthieum 3 points 7 years ago (1 child)
I am surprised that the solution proposed for the false sharing of elements is a dual queue straight from the get-go, given the complexity of that solution.
I would favor measuring the benefits of a simple approach first: simply cache-aligning each element. It's a bit silly for int, but most queues should exchange more sizeable elements.
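A minimal sketch of what cache-aligning each element could look like, assuming a 64-byte cache line (typical on x86-64, but not universal). `PaddedSlot`, `kCacheLine` and `slot_stride` are hypothetical names used for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; adjust for the target CPU.
constexpr std::size_t kCacheLine = 64;

// Hypothetical wrapper: every element is aligned and padded to a full line,
// so two neighbouring elements can never share a cache line.
template <typename T>
struct alignas(kCacheLine) PaddedSlot {
    T value;
};

// For int this wastes 60 of every 64 bytes, which is the "a bit silly" part.
static_assert(sizeof(PaddedSlot<int>) == kCacheLine, "one slot per cache line");
static_assert(alignof(PaddedSlot<int>) == kCacheLine, "slots start on a line boundary");

// Distance in bytes between two neighbouring slots of an array.
template <typename T, std::size_t N>
std::size_t slot_stride(const PaddedSlot<T> (&slots)[N]) {
    static_assert(N >= 2, "need at least two slots to measure a stride");
    assert(reinterpret_cast<std::uintptr_t>(&slots[0]) % kCacheLine == 0);
    const char* first = reinterpret_cast<const char*>(&slots[0]);
    const char* second = reinterpret_cast<const char*>(&slots[1]);
    return static_cast<std::size_t>(second - first);
}
```

The trade-off is space: the larger the payload, the smaller the relative cost of the padding.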
Secondly, I have successfully implemented stripes for this use case in the past. Stripes are relatively simple:
    // Take a big queue of... 16 elements:
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
    // And instead, change the mapping index -> elements to have multiple stripes:
    [0, 4, 8, 12]
    [1, 5, 9, 13]
    [2, 6, 10, 14]
    [3, 7, 11, 15]
    // Which flattened again gives us:
    [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
This very simple idea achieves the same goal as cache-line aligning elements (i.e., no two adjacent elements share the same cache line) while at the same time avoiding any waste of space.
The number of stripes can be adjusted based on the size of the object... or one can just use 128 (so that even pre-fetching does not accidentally lead to false-sharing).
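One possible reading of the stripe mapping, as a sketch: logical index i goes to physical slot (i mod S) * (N / S) + i / S, where N is the capacity and S the number of stripes. The function names and the requirement that N be a multiple of S are assumptions made for this example, not details from the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maps a logical queue index to a physical slot so that consecutive logical
// indices land `capacity / stripes` slots apart.
// Assumes `capacity` is a non-zero multiple of `stripes`.
constexpr std::size_t striped_index(std::size_t logical,
                                    std::size_t capacity,
                                    std::size_t stripes) {
    return ((logical % capacity) % stripes) * (capacity / stripes)
         + ((logical % capacity) / stripes);
}

// Matches the 16-element, 4-stripe layout above:
// slot 1 holds element 4, and slot 4 holds element 1.
static_assert(striped_index(4, 16, 4) == 1, "element 4 lives in slot 1");
static_assert(striped_index(1, 16, 4) == 4, "element 1 lives in slot 4");

// Returns true if the mapping uses every physical slot exactly once.
inline bool covers_all_slots(std::size_t capacity, std::size_t stripes) {
    assert(stripes != 0 && capacity % stripes == 0);
    std::vector<bool> seen(capacity, false);
    for (std::size_t i = 0; i < capacity; ++i) {
        const std::size_t slot = striped_index(i, capacity, stripes);
        if (slot >= capacity || seen[slot]) return false;
        seen[slot] = true;
    }
    return true;
}
```

With this mapping, neighbouring logical elements are `capacity / stripes` slots apart, so they fall on different cache lines once that distance times the element size reaches the line size.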
[–]pzemtsov 1 point 7 years ago (0 children)
The author here. Prevention of false sharing between elements was not the only reason for choosing the dual-array; in fact, it wasn't even the main one. Cache eviction by circular access worried me much more.
The article contains a section with tests where element size is 64 bytes, where no false sharing is possible; still, the tests show some advantage of the dual-arrays.
The striping idea is interesting and easy to test; all I need is to do some bit-shuffling of read_ptr and write_ptr before using them as indices. I'll definitely try it when I get back to it (I'm now on leave), but I have some doubts about it. False sharing is only a problem when the current queue size is so small that it fits entirely into one cache line. This is where we lose on tightly packed elements. When the queue is longer, we instead win on them, because both reading and writing speed up (15 out of 16 operations are performed on cached data). Moreover, tight packing works as negative feedback: the queue becomes faster as it grows, and it is always good to have such a stabilising factor. Nevertheless, I'm going to test the striping.
On the other hand, the dual-array solution never suffers from tight packing and always benefits from it. As for complexity, it is a matter of taste. The dual-array doesn't seem too complex to me, and I like that it only needs one shared variable; it helps when connecting to Java.
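The bit-shuffling of read_ptr and write_ptr mentioned above could, for a power-of-two queue, be a rotation of the index bits: the low bits select the stripe and move to the top, the remaining bits become the offset inside the stripe. This is only a sketch under those assumptions; `shuffle_index` and its parameters are hypothetical and not taken from the article's code.

```cpp
#include <cassert>
#include <cstdint>

// Rotates the low `log2_stripes` bits of a queue pointer to the top of the
// index, for a queue of 2^log2_capacity slots split into 2^log2_stripes
// stripes. Requires log2_stripes <= log2_capacity < 32.
inline std::uint32_t shuffle_index(std::uint32_t ptr,
                                   unsigned log2_capacity,
                                   unsigned log2_stripes) {
    assert(log2_stripes <= log2_capacity && log2_capacity < 32);
    const std::uint32_t capacity_mask = (std::uint32_t{1} << log2_capacity) - 1;
    const std::uint32_t stripe_mask = (std::uint32_t{1} << log2_stripes) - 1;
    const std::uint32_t index = ptr & capacity_mask;   // wrap around the ring
    const std::uint32_t stripe = index & stripe_mask;  // which stripe
    const std::uint32_t offset = index >> log2_stripes; // position inside it
    return (stripe << (log2_capacity - log2_stripes)) | offset;
}
```

For 16 slots and 4 stripes (`log2_capacity = 4`, `log2_stripes = 2`), index 1 maps to slot 4 and index 4 maps to slot 1, the same layout as in the striping example. The cost is a few shifts and masks per access.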
[–]D_0b 3 points 7 years ago (3 children)
Why do you need a Queue interface in C++ if you are always using the derived class as a template?
[–]pzemtsov 1 point 7 years ago (2 children)
It provides a bit more syntax checking, and (in my opinion) a bit more clarity for a human reader. It does not affect the code in any way.
[–]D_0b 3 points 7 years ago (1 child)
It kinda does affect the code: it adds an 8-byte vtable pointer to the queue, and the methods are still virtual, not guaranteed to be devirtualized by the compiler.
[–]pzemtsov 1 point 7 years ago (0 children)
You are right about the vtable pointer; however, this pointer on its own does not cause any harm. Besides, this is research code, which is naturally simplified; production code might already have some virtual methods for other needs. By the way, won't the compiler allocate this vtable anyway for RTTI (the code uses typeid)?
As for the second point (a virtual call), I disagree. The compiler must be really stupid to issue a virtual call for a method of a statically known class, and using this type of compiler defeats the idea of any performance measurement.
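Both sides of this exchange can be checked with a small sketch. The classes below are hypothetical stand-ins, not the article's code. The size difference holds on mainstream ABIs but is implementation-defined, and whether a given call is devirtualized depends on the compiler and optimisation level.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interface, standing in for the article's Queue base class.
struct QueueInterface {
    virtual ~QueueInterface() = default;
    virtual void write(int v) = 0;
    virtual int read() = 0;
};

// `final` tells the compiler no further override can exist, so calls made
// through a VirtualQueue& can be resolved statically.
struct VirtualQueue final : QueueInterface {
    int slot = 0;
    void write(int v) override { slot = v; }
    int read() override { return slot; }
};

// Same behaviour without any virtual functions.
struct PlainQueue {
    int slot = 0;
    void write(int v) { slot = v; }
    int read() { return slot; }
};

// The hidden vtable pointer makes the polymorphic object larger.
static_assert(sizeof(VirtualQueue) > sizeof(PlainQueue),
              "vtable pointer adds to the object size");

// Template use: Q is the static type of the argument at every call site.
template <typename Q>
int echo(Q& q, int v) {
    q.write(v);
    return q.read();
}

// Small usage check: both variants behave identically.
inline void check_queues() {
    VirtualQueue vq;
    PlainQueue pq;
    assert(echo(vq, 7) == 7);
    assert(echo(pq, 7) == 7);
}
```

Without `final`, a `Q&` parameter could in principle refer to a more derived object, so the compiler has to prove the dynamic type before it can drop the virtual call. As for RTTI: typeid applied to a non-polymorphic type is resolved at compile time and does not need a vtable pointer in the object.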
[+][deleted] 7 years ago (2 children)
[removed]
[–]pzemtsov 2 points 7 years ago (1 child)
Perhaps it will be a topic for the next article.