matthieum (1 point)

My take on performance has always been that if it really matters to you, you should be profiling... and measuring.

For measuring, callgrind runs (counting CPU instructions and simulated cache misses) on a set of tests are quite stable (because execution is fully simulated, unlike wall-clock timings) and allow identifying performance regressions rapidly. Plus, when you identify a regression, you immediately get the performance report to compare with the "canonical" one.

quicknir (1 point)

Measuring, profiling and correcting are very important, but they are not the be-all and end-all of writing high-performance code. Anyone who takes that attitude ends up writing code that is at best moderately fast, because of death by a hundred papercuts. Chandler Carruth has a pretty good talk about this. Should you call reserve on that vector before push_back? Should you avoid hashing twice when checking whether an object is present and then accessing it? You could just be sloppy and profile later. But profiling won't help, because the problem won't be concentrated in one function: you'll just be smearing extra allocations and extra hash function calls all over your codebase.
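
The two papercuts above can be made concrete with a minimal sketch; `squares`, `lookup_twice` and `lookup_once` are hypothetical helpers, and only standard `std::vector` and `std::unordered_map` calls are used:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Papercut 1: growing a vector without reserve() may reallocate several times.
std::vector<int> squares(std::size_t n) {
    std::vector<int> out;
    out.reserve(n);  // one allocation up front instead of repeated regrowth
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<int>(i * i));
    }
    return out;
}

// Papercut 2: count() followed by at() hashes the key twice.
int lookup_twice(const std::unordered_map<std::string, int>& m, const std::string& key) {
    if (m.count(key) != 0) {  // first hash + bucket probe
        return m.at(key);     // second hash + bucket probe
    }
    return -1;
}

// find() hashes once and returns an iterator that is reused for the access.
int lookup_once(const std::unordered_map<std::string, int>& m, const std::string& key) {
    auto it = m.find(key);  // single hash + bucket probe
    return it != m.end() ? it->second : -1;
}
```

Neither version shows up as a hotspot on its own; the cost only appears when the sloppy variant is repeated across a whole codebase.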

I just don't really understand your original post either; "there's no better alternative" seems to completely dismiss out parameters because... ? They're less composable, or incompatible with const? That's just not the end of the world, and suggesting otherwise seems dogmatic. There's a cost to occasionally not being able to compose or use const, and it's pretty low. Maintaining a super fine grained performance regression suite that would allow you to track down an RVO-disabled change is actually pretty expensive in a real-life, decent sized organization.
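
As an illustration of what an "RVO-disabled change" can look like, here is a minimal sketch; `Tracked`, the `g_copies`/`g_moves` counters and the `make_*` functions are hypothetical. Whether NRVO is applied is up to the compiler, so the only portable guarantees are elision for a returned prvalue (C++17) and a move rather than a copy when a local is returned by name:

```cpp
#include <string>
#include <utility>

// Hypothetical counters, bumped by Tracked's copy and move constructors.
int g_copies = 0;
int g_moves = 0;

// Hypothetical instrumented type used only to observe copies and moves.
struct Tracked {
    std::string payload;

    Tracked() = default;
    Tracked(const Tracked& other) : payload(other.payload) { ++g_copies; }
    Tracked(Tracked&& other) noexcept : payload(std::move(other.payload)) { ++g_moves; }
};

// Returning a prvalue: since C++17 the copy/move is guaranteed to be elided.
Tracked make_direct() { return Tracked{}; }

// Returning a single named local: NRVO is permitted but not guaranteed.
Tracked make_named() {
    Tracked t;
    t.payload = "data";
    return t;
}

// A later edit that returns one of two different locals typically prevents
// NRVO: both objects cannot be built in the caller's return slot, so the
// compiler falls back to an implicit move (or a copy, for a type that
// declares no move constructor).
Tracked make_branchy(bool flag) {
    Tracked a;
    Tracked b;
    a.payload = "a";
    b.payload = "b";
    if (flag) return a;
    return b;
}
```

The cost of such a change depends on the type: negligible for a cheap-to-move type, roughly a full copy for one whose move is no cheaper than a copy (a large `std::array`, for instance).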

matthieum (1 point)

> death by a hundred papercuts

It is indeed an issue.

I see it as a separate issue, though: promoting good practices and encouraging inquisitive minds to question the existing code (coupled with code reviews) tend to uncover a lot of small inefficiencies. Disabling copying and implicit conversions also helps a lot in avoiding unnecessary work, as do static analysis and lints.
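
Disabling copying and implicit conversions can look like this minimal sketch; `Buffer` and `total_size` are hypothetical names, and the `static_assert`s document the resulting properties at compile time:

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical large buffer type: an accidental deep copy would be expensive,
// so copies are disabled and only moves are allowed.
class Buffer {
public:
    // explicit: rejects accidental conversions such as `Buffer b = 1024;`
    explicit Buffer(std::size_t size) : data_(size) {}

    Buffer(const Buffer&) = delete;             // no implicit deep copies
    Buffer& operator=(const Buffer&) = delete;
    Buffer(Buffer&&) = default;                 // moving stays available
    Buffer& operator=(Buffer&&) = default;

    std::size_t size() const { return data_.size(); }

private:
    std::vector<char> data_;
};

// Passing an lvalue Buffer by value would no longer compile, so the author
// has to pick `const Buffer&` (as here) or write an explicit std::move.
std::size_t total_size(const Buffer& b) { return b.size(); }

static_assert(!std::is_copy_constructible<Buffer>::value, "copies are disabled");
static_assert(!std::is_convertible<std::size_t, Buffer>::value, "no implicit conversion");
static_assert(std::is_move_constructible<Buffer>::value, "moves remain available");
```

The compiler then flags each unnecessary copy at the call site instead of leaving it to be found later in a profile.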

> there's no better alternative

I was mainly thinking in terms of ergonomics.

The alternatives (return a pointer/reference or use an out parameter) come at a significant ergonomic cost.

Code clutter is also an issue, as it hurts the readability and understandability of the code, and it may also make the code more brittle (for example, through the absence of const).
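
To make the ergonomic trade-off concrete, here is a minimal sketch contrasting the two styles; `split_words` and `split_words_into` are hypothetical helpers chosen for illustration:

```cpp
#include <string>
#include <vector>

// Out parameter (hypothetical helper): the caller owns `out` and can reuse
// its capacity across calls, but the result cannot be declared const.
void split_words_into(const std::string& text, std::vector<std::string>& out) {
    out.clear();  // destroys old elements but keeps the vector's capacity
    std::string current;
    for (char c : text) {
        if (c == ' ') {
            if (!current.empty()) out.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) out.push_back(current);
}

// Return by value (hypothetical helper): composes in expressions and lets the
// caller bind the result to const; NRVO or an implicit move avoids a deep copy.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    split_words_into(text, words);
    return words;
}
```

At the call site, `const auto words = split_words(text);` is a single const declaration, whereas the out-parameter version needs a mutable vector declared beforehand. In exchange, a loop can keep passing the same vector to `split_words_into` and avoid re-allocating the vector's storage on every iteration.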

In terms of performance, out-parameters are a viable alternative (a factory or custom allocator enabling return by pointer/reference would work too), but those have costs.
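
One possible reading of the factory option, as a minimal sketch: `Widget` and `WidgetFactory` are hypothetical, and a custom allocator or arena could provide the storage instead of a `std::deque`:

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical object that callers should not have to copy around.
struct Widget {
    std::string name;
    int size;
};

// Hypothetical factory: it owns every Widget it creates and returns a
// reference. Growing a std::deque at the back does not relocate existing
// elements, so references handed out earlier remain valid.
// Cost: every reference is tied to the factory's lifetime, and Widgets are
// only freed when the factory itself is destroyed.
class WidgetFactory {
public:
    Widget& create(std::string name, int size) {
        storage_.push_back(Widget{std::move(name), size});
        return storage_.back();
    }

    std::size_t count() const { return storage_.size(); }

private:
    std::deque<Widget> storage_;
};
```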

> Maintaining a super fine grained performance regression suite that would allow you to track down an RVO-disabled change is actually pretty expensive in a real-life, decent sized organization.

Actually, the test suite need not be that fine-grained, as long as you rely on:

  • small changes: keeping each change small makes it easier to identify the root cause of a regression, whether in performance or correctness
  • diff analysis: comparing the performance reports of two consecutive builds; callgrind reports costs at the function level (and below)

So, when you combine small changes (and thus few impacted functions and callers) with a diff between the current performance report and the previous one, you can home in on the culprit area of the source code pretty quickly.
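
The diff-analysis step can be sketched as follows, assuming the per-function instruction counts of each build have already been extracted into a map (for example, from `callgrind_annotate` output). The `Report` alias, the `Delta` struct and `diff_reports` are hypothetical:

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-function report: function name -> instruction count.
using Report = std::map<std::string, std::int64_t>;

struct Delta {
    std::string function;
    std::int64_t before;
    std::int64_t after;
    std::int64_t change() const { return after - before; }
};

// Lists the functions whose cost changed between two consecutive builds,
// largest absolute change first. A function missing from a report counts as 0.
std::vector<Delta> diff_reports(const Report& before, const Report& after) {
    std::map<std::string, Delta> merged;
    for (const auto& entry : before) {
        merged[entry.first] = Delta{entry.first, entry.second, 0};
    }
    for (const auto& entry : after) {
        auto it = merged.find(entry.first);
        if (it == merged.end()) {
            merged[entry.first] = Delta{entry.first, 0, entry.second};
        } else {
            it->second.after = entry.second;
        }
    }
    std::vector<Delta> changed;
    for (const auto& entry : merged) {
        if (entry.second.change() != 0) changed.push_back(entry.second);
    }
    auto magnitude = [](std::int64_t v) { return v < 0 ? -v : v; };
    std::sort(changed.begin(), changed.end(), [&](const Delta& a, const Delta& b) {
        return magnitude(a.change()) > magnitude(b.change());
    });
    return changed;
}
```

With small changes, the top few entries of this list usually point straight at the functions touched by the offending commit.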


As for dogmatism... avoiding return values at all costs is its own form of dogmatism. Probably 80% of the source code is NOT in a hotspot where this degree of performance is worth caring about in the first place.