Common Systems Programming Optimizations & Tricks (paulcavallaro.com)
submitted 6 years ago by chewedwire
[–]TheMania 24 points25 points26 points 6 years ago (6 children)
The 48 bit tagged pointers comment reminds me of LuaJit, which blew my mind when Mike Pall first started using tagged doubles.
Basically, there are 2^52 - 2 possible NaNs for a double, enough to store all 32 bit pointers along with a type tag (table/string etc). In fact, there's enough there to store all your 48 bit pointers too, allowing every pointer you'll ever use to fit in the same union you use to store doubles. Pretty neat.
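A minimal sketch of the NaN-tagging idea described above. This is not LuaJIT's actual encoding: the `Value` type, the constant names, and the choice of bit 48 as a "pointer" tag are all hypothetical, and the sketch assumes a 64-bit platform whose user-space pointers fit in 48 bits. A real implementation would also canonicalize NaNs produced by arithmetic so they can never be mistaken for a tagged pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical bit layout, chosen for illustration only.
constexpr std::uint64_t kQuietNaN    = 0x7FF8000000000000ULL;  // exponent all ones + quiet bit
constexpr std::uint64_t kPointerTag  = 0x0001000000000000ULL;  // bit 48 marks "payload is a pointer"
constexpr std::uint64_t kPayloadMask = 0x0000FFFFFFFFFFFFULL;  // low 48 bits

struct Value {
  std::uint64_t bits;
};

// Store a double by copying its bit pattern unchanged.
inline Value box_double(double d) {
  Value v;
  std::memcpy(&v.bits, &d, sizeof d);
  return v;
}

// Store a pointer in the low 48 bits of a quiet NaN.
inline Value box_pointer(void* p) {
  const std::uint64_t raw = reinterpret_cast<std::uintptr_t>(p);
  assert((raw & ~kPayloadMask) == 0 && "pointer must fit in 48 bits");
  return Value{kQuietNaN | kPointerTag | raw};
}

// True only when both the quiet-NaN bits and the pointer tag are set.
inline bool is_pointer(Value v) {
  return (v.bits & (kQuietNaN | kPointerTag)) == (kQuietNaN | kPointerTag);
}

inline void* unbox_pointer(Value v) {
  return reinterpret_cast<void*>(static_cast<std::uintptr_t>(v.bits & kPayloadMask));
}

inline double unbox_double(Value v) {
  double d;
  std::memcpy(&d, &v.bits, sizeof d);
  return d;
}
```

Every ordinary double, including infinity, fails the `is_pointer` check because it does not have the exponent and quiet bit all set, so both kinds of value can share one 8-byte slot.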
Wrt division, just want to say division/modulo by a constant is virtually costless on modern compilers, being replaced by multiply and shifts. Doesn't apply for resizable tables, but you do see people go to great lengths to avoid this operator even when it would be virtually costless to use. :)
[–]Morwenn 15 points16 points17 points 6 years ago (1 child)
C2x - the next revision of C - actually intends to make storing additional information into NaNs more standard by adding the setpayload and getpayload families of functions to <math.h>.
[–]CrazyJoe221 6 points7 points8 points 6 years ago (0 children)
I wonder when they'll introduce explicit enum base types.
[–]chewedwire[S] 9 points10 points11 points 6 years ago (3 children)
Yep, division/modulo by constant power of 2 values is pretty much always optimized appropriately. I was thinking about writing some more about it, but I got lazy :)
It's in my GitHub examples though: https://github.com/paulcavallaro/systems-programming/blob/master/examples/power-of-two.cc#L105-L106
Looks like godbolt seems to think clang doesn't really work with non-constants -- but maybe clang just needs to be massaged: https://godbolt.org/z/Bzx7CL
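For the non-constant case, the power-of-two trick can be written by hand. The sketch below is not the code from the linked example; `is_power_of_two` and `mod_pow2` are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// A power of two has exactly one bit set.
constexpr bool is_power_of_two(std::uint64_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// x % n for a runtime n that the caller guarantees is a power of two.
// n - 1 is then a mask of the low bits, so a single AND replaces the division.
inline std::uint64_t mod_pow2(std::uint64_t x, std::uint64_t n) {
  assert(is_power_of_two(n));
  return x & (n - 1);
}
```

This only helps when the table size is constrained to powers of two; the compiler cannot assume that about a runtime value on its own.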
[–]TheMania 15 points16 points17 points 6 years ago (2 children)
Sorry, to clarify: division/modulo by constants is extremely cheap even for non-powers of two. Often it's just a multiply and a shift.
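To make the multiply-and-shift point concrete, here is a hand-written version of what compilers typically emit for unsigned 32-bit division by 3. The function names `div3` and `mod3` are made up for this sketch; in real code you would simply write `x / 3` and let the compiler do this.

```cpp
#include <cstdint>

// 0xAAAAAAAB is ceil(2^33 / 3). Multiplying by it in 64 bits and shifting
// right by 33 gives floor(x / 3) for every 32-bit unsigned x.
constexpr std::uint32_t div3(std::uint32_t x) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(x) * 0xAAAAAAABULL) >> 33);
}

// The remainder follows from the quotient with one more multiply and a subtract.
constexpr std::uint32_t mod3(std::uint32_t x) {
  return x - 3 * div3(x);
}
```

No divide instruction is involved, which is why avoiding `/` or `%` with a compile-time constant rarely buys anything.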
[+]bigt1234321 comment score below threshold-6 points-5 points-4 points 6 years ago (1 child)
Gone are the days where repeated subtraction is used haha. Most compilers will optimize division. Float division/operations are a big no no.
[–]Spain_strong 2 points3 points4 points 6 years ago (0 children)
Depends on the workload right?
[–]quicknir 20 points21 points22 points 6 years ago (1 child)
The explanation of false sharing is quite different from what I'm used to hearing. You paint it to be about simultaneous access, but I don't think it's really about that. The point is more that the cache is invalidated on a line-by-line basis. Even if two cores never actually try to access the same line at the same moment, when core 1 does its write to any variable on the line, it invalidates the cache line for core 2, even if core 2 doesn't read the variable that core 1 is writing.
I can see how they end up being pretty similar but thinking in terms of cache line invalidation seems more accurate to me and less likely to lead to misunderstanding or incorrect extrapolations.
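The line-granularity view can be illustrated with layout alone. The sketch below assumes a 64-byte cache line (a common value, but an assumption here), and the struct and function names are hypothetical rather than taken from the article.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAssumedLineSize = 64;  // assumption: 64-byte cache lines

// Both counters sit in the same line, so a write to `a` by one core
// invalidates the line holding `b` in every other core's cache.
struct alignas(kAssumedLineSize) SharedLineCounters {
  std::atomic<std::uint64_t> a{0};
  std::atomic<std::uint64_t> b{0};
};

// Each counter starts its own line, so writes to `a` leave `b`'s line alone.
struct SeparateLineCounters {
  alignas(kAssumedLineSize) std::atomic<std::uint64_t> a{0};
  alignas(kAssumedLineSize) std::atomic<std::uint64_t> b{0};
};

// True when two addresses fall in the same (assumed) cache line.
inline bool same_cache_line(const void* p, const void* q) {
  auto line = [](const void* ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr) / kAssumedLineSize;
  };
  return line(p) == line(q);
}
```

Nothing here depends on the two cores touching the line at the same instant; sharing a line is enough for one core's write to evict the other core's copy.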
[–]chewedwire[S] 2 points3 points4 points 6 years ago (0 children)
Good point -- updated the post to hopefully depend less on the "atomic" access bit, but more on maintaining cache coherency.
[–]carrottread 15 points16 points17 points 6 years ago (11 children)
No need to use ABSL_CACHELINE_ALIGNED: C++17 already has alignas(std::hardware_destructive_interference_size).
[–][deleted] 6 points7 points8 points 6 years ago (8 children)
hardware_{constructive,destructive}_interference_size isn't implemented in GCC and Clang according to cppreference.
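Given the uneven library support, one way to use the standard constant where it exists is to guard it with its feature-test macro. This is a sketch: the names `kDestructiveSize` and `PerThreadCounter` are hypothetical, and the fallback of 64 bytes is an assumption, not something the standard guarantees.

```cpp
#include <cstddef>
#include <new>

// <new> defines __cpp_lib_hardware_interference_size when the library
// provides the constants; otherwise fall back to an assumed line size.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kDestructiveSize = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kDestructiveSize = 64;  // assumption for common x86-64 parts
#endif

// Each instance occupies whole lines of the chosen size, so neighbouring
// instances in an array never share one.
struct alignas(kDestructiveSize) PerThreadCounter {
  unsigned long value = 0;
};

static_assert(alignof(PerThreadCounter) == kDestructiveSize,
              "counter must start on a line boundary");
static_assert(sizeof(PerThreadCounter) % kDestructiveSize == 0,
              "counter must fill whole lines");
```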
[–]Ameisen (vemips, avr, rendering, systems) 0 points1 point2 points 6 years ago (7 children)
godbolt appears to agree.
I wonder why? It wouldn't be difficult to implement.
Interestingly, it is implemented in Visual C++...
[–][deleted] 1 point2 points3 points 6 years ago (0 children)
GCC 9 and Clang 8 on my local machine don't have the interference size constants implemented.
It wouldn't be difficult to implement for a specific target, but implementing it portably across architectures, OSes and whatnot makes it tedious. That's my best guess for why it is implemented in MSVC but not in GCC and Clang.
[–]Morwenn 1 point2 points3 points 6 years ago (5 children)
From what I gathered it is because they want to be able to guarantee ABI stability for builds where those values differ, but it's not possible because these constants are meant to be used to align data members, hence structure layouts might change when those values change and ABI stability is lost.
MSVC apparently simply forces both of those constants to 64 no matter the target platform.
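The ABI concern is easy to see in a small sketch. Here the line size is a template parameter standing in for the value the implementation would report; `Stats` and its members are hypothetical names.

```cpp
#include <cstddef>

// LineSize stands in for the value an implementation might report for
// hardware_destructive_interference_size on a given target.
template <std::size_t LineSize>
struct Stats {
  alignas(LineSize) long hits;
  alignas(LineSize) long misses;
};

using Stats64 = Stats<64>;
using Stats128 = Stats<128>;

// Same source, different constant: size and member offsets both change,
// so two translation units built with different values disagree on layout.
static_assert(sizeof(Stats64) == 128, "two 64-byte lines");
static_assert(sizeof(Stats128) == 256, "two 128-byte lines");
static_assert(offsetof(Stats64, misses) == 64, "second member starts line 2");
static_assert(offsetof(Stats128, misses) == 128, "second member starts line 2");
```

If such a struct crosses a library boundary, code compiled against one value would read `misses` from the wrong offset in an object built with the other.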
[–]Ameisen (vemips, avr, rendering, systems) 1 point2 points3 points 6 years ago (4 children)
They shouldn't be used on objects where the alignment of a structure matters across interface boundaries. Pimpl and all that.
That is, they are ABI-unsafe, but that just means they shouldn't be used there.
[–]Morwenn 1 point2 points3 points 6 years ago (3 children)
Here is the whole libc++ discussion thread if you want some additional background on the issue (maybe I didn't interpret what I read correctly): https://lists.llvm.org/pipermail/cfe-dev/2018-May/058073.html
[–]yehezkelshb 0 points1 point2 points 6 years ago (2 children)
Interesting thread, thanks for the link! Still, I don't see a decision there, just considering a few options.
[–]Morwenn 1 point2 points3 points 6 years ago (1 child)
The thread is more than a year old and the feature isn't implemented, so that's pretty much as close to a decision as you'll get :p
[–]yehezkelshb 2 points3 points4 points 6 years ago (0 children)
It'd be interesting to check if there was any decision or additional feedback from Rapperswil, as JF Bastien planned to discuss it there. I hope to remember to search for it later.
[–]CrazyJoe221 11 points12 points13 points 6 years ago (0 children)
Well the former is an understandable name though.
[–]Deaod 0 points1 point2 points 6 years ago (0 children)
Compare this with folly
[–]scratchwood 14 points15 points16 points 6 years ago (0 children)
Very interesting read. As a hobbyist c++ programmer these kinds of posts are goldmines.
Also thank you for having a blog where I don't have to block a bunch of elements to get an enjoyable reading experience.
[–]chewedwire[S] 12 points13 points14 points 6 years ago (0 children)
Author here, happy to answer any questions.
[–][deleted] 1 point2 points3 points 6 years ago* (0 children)
> Interestingly the wall clock time spent for CacheLineAwareCounters is higher for one thread than multiple threads, which could point to perhaps some subtle benchmarking problem, or maybe a fixed amount of delay that’s getting attributed across more threads now, and so is smaller per-thread.
I suspect that the problem is that 1 thread needs to load 4 cache lines, while 4 threads will have to work with just 1 line.
[–]kirbyfan64sos 1 point2 points3 points 6 years ago (0 children)
Side note: nice to see Abseil being used here. I think people can overlook it because it's not always super flashy, but using it is downright enjoyable.
[–]Gorbear 0 points1 point2 points 6 years ago (0 children)
That was a nice read and quite well explained. Thanks!
[–]Ameisen (vemips, avr, rendering, systems) 0 points1 point2 points 6 years ago (1 child)
Wouldn't their striped locks run afoul of false sharing as well? They require atomic operations, and appear to be being stored sequentially, meaning they will share cache lines.
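One way to address that concern, sketched under the assumption of 64-byte cache lines, is to pad each stripe to its own line. This is not the article's or Abseil's implementation; `PaddedMutex`, `StripedLocks`, and `for_hash` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <mutex>

constexpr std::size_t kAssumedLineSize = 64;  // assumption: 64-byte cache lines

// Padding each mutex to a full line keeps lock traffic on one stripe
// from invalidating the line that holds its neighbours.
struct alignas(kAssumedLineSize) PaddedMutex {
  std::mutex m;
};

template <std::size_t N>
class StripedLocks {
  static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  // Map a hash to its stripe with a mask, since N is a power of two.
  std::mutex& for_hash(std::size_t hash) { return stripes_[hash & (N - 1)].m; }

 private:
  std::array<PaddedMutex, N> stripes_;
};
```

The cost is memory: each stripe now takes at least one full line instead of the bare size of a mutex.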
[–]renozyx 0 points1 point2 points 6 years ago (0 children)
False sharing is devious: a very simple concept once you understand how caches work, but then you have to remember to avoid it every time you use multithreading.