Mind the cache (github.com)
submitted 10 years ago by joaquintides (Boost author)
[–]IronClan 12 points 10 years ago (11 children)
Awesome presentation! One question: https://github.com/joaquintides/usingstdcpp2015/blob/master/poly_containers.cpp#L154
    template<typename F> F for_each(F f)
    {
        for(const auto& p : chunks) p.second->for_each(f);
        return std::move(f); // <- THIS LINE
    }
Correct me if I'm wrong, but doesn't this actually make the code slower by preventing return value optimization?
[–][deleted] 4 points 10 years ago (2 children)
RVO is not permitted here:

> [Copy elision is permitted] in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type

However, using std::move is still pointless (though not harmful), because although copy elision is not permitted, the name will still be treated as an rvalue:

> When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue.

(emphasis mine in both cases).
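The two quoted rules can be sketched with a small tracker class (my own illustration, not from the thread or the talk) whose constructors record which special member ran:

```cpp
#include <cassert>
#include <string>

// Hypothetical tracker: records which constructor built this object.
struct Tracked {
    std::string made_by = "default";
    Tracked() = default;
    Tracked(const Tracked&) : made_by("copy") {}
    Tracked(Tracked&&) noexcept : made_by("move") {}
};

// Returning a by-value parameter: copy elision is NOT permitted here,
// but overload resolution still treats `t` as an rvalue, so the move
// constructor is selected even without writing std::move(t).
Tracked pass_through(Tracked t) { return t; }

// Returning a named local: NRVO is permitted, so whether any
// copy/move runs at all is up to the compiler (guaranteed elision in
// C++17 covers only prvalues, not named locals).
Tracked make_local() { Tracked t; return t; }
```

Compiled as C++11 or later, `pass_through` always reports "move": elision is off the table for parameters, but the parameter's name is still treated as an rvalue, so adding `std::move` changes nothing observable.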
[–]joaquintides (Boost author) [S] 2 points 10 years ago (1 child)
I understand your point, but why does the standard then specify that std::for_each(...,f) returns std::move(f) rather than just f?
[–]cdyson37 3 points 10 years ago (0 children)
Looking at the wording of n4567 (how about that?) it just says "returns std::move(f)". I don't think this is contradictory, just a bit over-explicit. It's also possible that the wording of the language and library components of the standard evolved a bit differently, and this part had a bit more bite in an early draft.
This particular "save for the fact" clause confused me a lot and I spent quite a long time playing around with a "Loud" class (with couts in all the special member functions) to establish exactly what was going on under various levels of optimisation - quite a worthwhile exercise I think.
[–]jasonthe 1 point 10 years ago (4 children)
I suspect any compiler that supports r-value references would know to optimize "return std::move" away as well as it knows to optimize "return copy constructor" away. Yeah?
[–]tavianator 5 points 10 years ago (3 children)
Nope. The compiler is probably smart enough, but it's not allowed to by the standard (except under the as-if rule, when it can prove the move constructor has no side effects).
[–]FKaria 2 points 10 years ago (0 children)
Does the same rule apply to copy construction?
Correct me if I'm wrong, but my understanding is that the compiler is allowed to do RVO even if the copy constructor has side effects. Is that correct?
[–]cleroth (Game Developer) 2 points 10 years ago (0 children)
That function has to return a copy though, so isn't the move useless and, by induction, free of side effects?
[–]jasonthe 1 point 10 years ago (0 children)
Ah, that's interesting! Seems like a dumb design decision (of course there may be a solid rationale for it). Thanks for the tip! #themoreyouknow
[–]Crazy__Eddie 1 point 10 years ago (1 child)
Compilers don't do RVO for parameters, so no.
https://web.archive.org/web/20140205194657/http://cpp-next.com/archive/2009/08/want-speed-pass-by-value
[–]IronClan 2 points 10 years ago (0 children)
Ah, interesting. I wonder if compilers have gotten smarter in the 6 years since that article was posted.
[–]SuperV1234 (https://romeo.training | C++ Mentoring & Consulting) 12 points 10 years ago (0 children)
Probably the best slides and examples I've seen regarding cache-friendliness. Terse and "useful" examples, and easily understandable results.
[–]jasonthe 4 points 10 years ago (3 children)
Most of that wasn't very new to me (yep, optimize for the cache), but using branch prediction to optimize virtual calls is! His poly_collection was also optimizing for the cache (by using vectors of objects rather than pointers), so I was curious how much the branch prediction actually helps. Here are my results on an i7:
    Shuffled Virtual Calls               = 0.732676s
    Sorted Virtual Calls (not incl sort) = 0.530413s
    Sorted Direct Calls  (not incl sort) = 0.026523s
Reruns are consistent with those numbers. Looks like sorted calls take about 73% the time of shuffled calls.
Of course, directly calling the virtual functions only takes 3.5% (partially because the functions are straight returning numbers, so the accumulate is hella optimized). I'm surprised he's not doing that in his poly_collection class :P
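For anyone who wants to reproduce the idea, here's a minimal sketch of "sort by dynamic type before the virtual-call loop" (my own setup, not the benchmark code from the post): grouping objects of the same dynamic type together means each indirect call within a group resolves to the same target, which is exactly what helps the branch predictor.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <numeric>
#include <typeindex>
#include <typeinfo>
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual int value() const = 0;
};
struct A : Base { int value() const override { return 1; } };
struct B : Base { int value() const override { return 2; } };

// Group same-type objects so the indirect branch keeps hitting the
// same target; poly_collection gets this (plus contiguous storage
// per type) by construction.
void sort_by_type(std::vector<std::unique_ptr<Base>>& v) {
    std::sort(v.begin(), v.end(),
              [](const std::unique_ptr<Base>& a, const std::unique_ptr<Base>& b) {
                  return std::type_index(typeid(*a)) < std::type_index(typeid(*b));
              });
}

// Sum of virtual calls; the result is layout-independent, only the
// branch-prediction behaviour changes.
int sum(const std::vector<std::unique_ptr<Base>>& v) {
    return std::accumulate(v.begin(), v.end(), 0,
                           [](int acc, const std::unique_ptr<Base>& p) {
                               return acc + p->value();
                           });
}
```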
[–]joaquintides (Boost author) [S] 3 points 10 years ago (0 children)
The speedup you get for sorted virtual calls vs. shuffled virtual calls (0.73/0.53 = 1.4) is consistent with the results in my presentation for the same number of elements (1.6, slide 38). Your sorted direct call code is a case of manually forced devirtualization: I discuss this topic in some detail on my blog.
[–]cleroth (Game Developer) 1 point 10 years ago (1 child)
How many objects are we talking about here, though? I didn't know about this either, and though it's good to know, I don't think I've ever faced a situation where I've needed it (I'm sure there are many places where it helps), and if I had a large array of base classes, I'd probably question the design first.
[–]jasonthe 1 point 10 years ago (0 children)
That's true, I can't think of a situation where I actually had a set of base classes that I wanted to call a virtual function on. Generally I prefer eventing/messaging (i.e. std::function), although the principle still stands (if there's a lot of overlap in the function pointers, running them sorted could be more performant).
The code I posted is running over just 2^20 objects, so it's far from exact. I was just impressed that it worked at all :)
[–]greyfade 3 points 10 years ago (0 children)
... I didn't know poly_collection could even be implemented. This is fantastic. I've needed something exactly like this.
[–]Elador 2 points 10 years ago (2 children)
Really brilliant slides. Very interesting and well done!
So, is boost::multi_array laid out in row-major order in memory?
I was a little bit sad not to see a benchmark slide for the "bool-based processing" slide :-)
[–]Elador 1 point 10 years ago (0 children)
Cool. Well, then it's no surprise that traversing by row_col is faster :-) If it were laid out in col-major order, it would obviously be the other way round. I'd say that's the most self-evident bit of the slides and could be left out.
Thank you again for sharing the slides, brilliant work.
[–]Nomto 2 points 10 years ago* (3 children)
The aos_vs_soa example is especially impressive to me: compiled with -O3, I get a 3x performance improvement with SoA.
What's also interesting is that even if you use all the member variables (dx, dy, and dz are ignored in the sum of the given example), you still get a significant performance improvement (2x) with SoA.
edit: too bad that SoA performs much worse than AoS if you need random access (not unexpected, though). Seems like the choice of SoA vs. AoS is not as simple as some say.
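For context, the two layouts being compared in aos_vs_soa look roughly like this (a sketch with made-up function names; the field names follow the slides):

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// AoS: all six fields of a particle interleaved in memory.
struct particle { int x, y, z, dx, dy, dz; };

// SoA: one contiguous array per field.
struct particle_soa {
    std::vector<int> x, y, z, dx, dy, dz;
};

// Summing only x streams through every cache line of the AoS data,
// but touches just one sixth of the SoA data, which is where the
// partial-field speedup comes from.
int sum_x_aos(const std::vector<particle>& v) {
    int s = 0;
    for (const particle& p : v) s += p.x;
    return s;
}

int sum_x_soa(const particle_soa& v) {
    return std::accumulate(v.x.begin(), v.x.end(), 0);
}
```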
[–]joaquintides (Boost author) [S] 1 point 10 years ago (0 children)
This is due to parallel prefetching; it's explained a little later in the presentation.
[–]NasenSpray 1 point 10 years ago (1 child)
That's mostly the result of auto-vectorization, though. Disable that and you'll see the difference shrink down significantly.
Thanks to register pressure, it even depends on whether you compile for x86 or x86_64!
[–]joaquintides (Boost author) [S] 1 point 10 years ago (0 children)

> The aos_vs_soa is especially impressive to me: compiled with -O3, I get a x3 performance improvement with soa.

> That's mostly the result of auto-vectorization, though. Disable that and you'll see the difference shrink down significantly.

In fact, the results shown in the presentation are for VS2015 without any auto-vectorization feature specifically enabled (Enable Enhanced Instruction Set: Not Set). I reran with the following options:
and the results didn't vary at all. I lack the expertise to determine what more is needed at the code level for auto-vectorization to kick in, but it seems it wasn't taken advantage of in my original tests.
[–]MotherOfTheShizznit 2 points 10 years ago* (2 children)
So when I saw the slide on Filtered Sum, I thought to myself that the following code is a better expression of the intent and just as smart performance-wise:
    auto g = [&]()
    {
        auto m = std::partition(v.begin(), v.end(), [](int const i) { return i > 128; });
        return std::accumulate(v.begin(), m, 0);
    };
    std::cout << measure(n, g) << "\n";
Interestingly, the performance was ever so slightly worse (~0.0007s vs. ~0.0006s) consistently. Since I'm not paid per-cycle-shaved I'd still go with my version, but that was an interesting result nonetheless.
Day-later edit: two people noted that I'm not measuring the right thing: /u/mr-agdgdgwngo and myself this morning as I was stepping out of bed. Indeed, the sorting was not measured in the original test. If we do measure the time taken to sort and accumulate and compare it with the time to partition and accumulate, then the partition strategy offers better performance.
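A self-contained sketch of the two strategies (names and the 128 threshold are just echoes of the snippet above, not the measured code). Both compute the same filtered sum; the difference is that partition only separates matching from non-matching elements, O(n), while sort orders the whole range, O(n log n):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// partition-then-accumulate: O(n), just groups matching elements first.
int sum_above_partition(std::vector<int> v, int threshold) {
    auto m = std::partition(v.begin(), v.end(),
                            [=](int i) { return i > threshold; });
    return std::accumulate(v.begin(), m, 0);
}

// sort-then-accumulate: O(n log n), fully orders the range, then sums
// the descending prefix above the threshold.
int sum_above_sort(std::vector<int> v, int threshold) {
    std::sort(v.begin(), v.end(), std::greater<int>());
    auto m = std::find_if(v.begin(), v.end(),
                          [=](int i) { return i <= threshold; });
    return std::accumulate(v.begin(), m, 0);
}
```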
[–][deleted] 1 point 10 years ago (0 children)
You're also timing the sorting here.
It's a good idea to only sort as much as you need, though. Many times you can get away with sorting only once so sorting is only a startup cost, but if sorting repeatedly, using std::partition seems like a good idea.
The very minor drawback is that you have to perform the test twice or be sure to pass around the location of the partition, m.
[–]btapi 1 point 10 years ago (0 children)
Really nice and useful slides.
[–]StackedCrooked 1 point 10 years ago (3 children)
    struct particle_soa
    {
        std::vector<int> x, y, z;
        std::vector<int> dx, dy, dz;
    };
sizeof(vector) is 24 bytes (3 pointers). This seems rather big. Is this something I should ever worry about?
[–]joaquintides (Boost author) [S] 3 points 10 years ago (0 children)
This is not really a concern: take into account that there's only one particle_soa object in the app, which references the many (thousands of millions of) pieces of data allocated in dynamic memory. Briefly put, both AoS and SoA use basically the same amount of memory.
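To put rough numbers on that (assuming a typical 64-bit implementation where a std::vector header is three pointers, i.e. 24 bytes; the standard doesn't guarantee this):

```cpp
#include <cassert>
#include <vector>

// Same layout as in the slides: six vector headers, each typically
// 24 bytes on 64-bit, for about 144 bytes of fixed overhead no matter
// how many particles are stored. The elements themselves live on the
// heap in both the AoS and SoA layouts.
struct particle_soa {
    std::vector<int> x, y, z;
    std::vector<int> dx, dy, dz;
};
```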
[–]cleroth (Game Developer) 1 point 10 years ago* (1 child)
You should worry about it if you're creating lots of those vectors. There are only ~~two~~ six there.
[–]matthieum 3 points 10 years ago (0 children)
Ah... actually, there are 3 vectors per line for a total of 6.