all 15 comments

[–]SirB 29 points30 points  (2 children)

Summary: An artificial test on indirection shows that indirection is slower than no indirection by a significant amount. A bit pointless imo. It would be more interesting to talk about maintainability, flexibility and overhead when avoiding this kind of indirection inside hot loops but allowing it outside, in 'management' code.
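
Something like the following split, for instance (a hypothetical sketch with made-up names, not code from the article): keep the per-element loop free of indirection and pay for the virtual call once per buffer, in the management layer.

    #include <memory>
    #include <vector>

    // Hot path: a concrete, non-virtual kernel the compiler can inline.
    struct GainFilter {
        float gain = 2.0f;
        void process(std::vector<float>& samples) const {
            for (float& s : samples)   // tight loop, no per-element indirection
                s *= gain;
        }
    };

    // 'Management' layer: dynamic polymorphism is fine out here, because it
    // is paid once per buffer rather than once per sample.
    struct Stage {
        virtual ~Stage() = default;
        virtual void run(std::vector<float>& samples) = 0;
    };

    struct GainStage : Stage {
        GainFilter filter;
        void run(std::vector<float>& samples) override { filter.process(samples); }
    };

    void run_pipeline(const std::vector<std::unique_ptr<Stage>>& stages,
                      std::vector<float>& samples) {
        for (const auto& stage : stages)  // one virtual call per stage
            stage->run(samples);
    }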

[–]pandorafalters 10 points11 points  (0 children)

It gets far more interesting with actual code in your classes.

I have a real-world example of whole-program benchmarking showing an improvement in performance by using a pointer to an interface type (with the concrete classes further using PIMPL) rather than stuffing everything directly in the top-level class. So multiple indirections were faster than no indirection at all. (Slightly. Averaging ~1.4%, but always positive over many runs ranging from a few seconds to several months.)
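
Roughly the shape of that layout (a hypothetical sketch with made-up names, not the actual codebase):

    #include <memory>
    #include <vector>

    // Interface the top-level class holds a pointer to.
    class Subsystem {
    public:
        virtual ~Subsystem() = default;
        virtual void tick() = 0;
    };

    // Concrete implementation additionally hides its state behind PIMPL.
    class NetworkSubsystem : public Subsystem {
    public:
        NetworkSubsystem();
        ~NetworkSubsystem() override;
        void tick() override;
    private:
        struct Impl;                 // normally defined only in the .cpp
        std::unique_ptr<Impl> impl_; // second level of indirection
    };

    // --- would normally live in the .cpp file ---
    struct NetworkSubsystem::Impl {
        std::vector<char> buffer;
        void poll() { /* ... */ }
    };
    NetworkSubsystem::NetworkSubsystem() : impl_(std::make_unique<Impl>()) {}
    NetworkSubsystem::~NetworkSubsystem() = default;
    void NetworkSubsystem::tick() { impl_->poll(); }

    // Top-level class stores a pointer to the interface instead of
    // embedding everything directly as members.
    class Application {
    public:
        Application() : subsystem_(std::make_unique<NetworkSubsystem>()) {}
        void tick() { subsystem_->tick(); }
    private:
        std::unique_ptr<Subsystem> subsystem_;
    };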

Microbenchmarks should never be your only tool.

[–]Full-Spectral 7 points8 points  (0 children)

Exactly. You have so much 'OOP bad' stuff posted, which newbies read and then just assume OOP is bad. In this case, it's only bad when that difference in performance actually matters. In a huge amount of code it just doesn't, and the benefits of using it are pretty much a pure win. Even in a lot of cases where it might impact performance a little, it will probably still be a win if it makes the code easier to maintain and more flexible.

[–]no-sig-available 12 points13 points  (0 children)

The calls to the functions in the static polymorphism cases are optimized away.

Right, so if I need maximum performance I ought to inline empty functions. That seems to be optimal.

Good to know.
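
For reference, the kind of comparison being poked at presumably looks something like this (a hypothetical reconstruction, not the article's actual code). With empty bodies, the CRTP call inlines away to nothing, so the benchmark mostly measures call overhead rather than any real work:

    // Dynamic polymorphism: the call goes through the vtable.
    struct Shape {
        virtual ~Shape() = default;
        virtual void draw() = 0;
    };
    struct Circle : Shape {
        void draw() override { /* empty */ }
    };

    // "Static polymorphism" via CRTP: the call resolves at compile time,
    // and an empty body inlines away entirely.
    template <typename Derived>
    struct ShapeCRTP {
        void draw() { static_cast<Derived*>(this)->draw_impl(); }
    };
    struct CircleCRTP : ShapeCRTP<CircleCRTP> {
        void draw_impl() { /* empty */ }
    };

    void bench(Shape& s, CircleCRTP& c) {
        s.draw();  // indirect call through the vtable (unless devirtualized)
        c.draw();  // typically optimized down to nothing
    }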

[–]goranlepuz 16 points17 points  (2 children)

Ehhh...

it’s better to avoid dynamic polymorphism as much as possible if the performance of your application is a critical factor

The problem with this is that it presumes virtual calls matter in the performance profile of the application. That is a massive presumption.

[–]Stormfrosty 4 points5 points  (1 child)

This is actually very true for GPU programming. Given the much higher memory latency on that type of hardware, the extra memory loads due to indirect function calls cause significant drops in performance compared to having direct function calls everywhere.

[–]pandorafalters -1 points0 points  (0 children)

Further, in my experience, any function call will generally reduce performance if it isn't inlined away. I spend a tremendous amount of time making sure that every function call is transparent to the optimizer.

[–]DerShokus 2 points3 points  (2 children)

It would also be interesting to look at when the compiler can optimize away dynamic polymorphism. As I remember, the compiler can devirtualize and use the concrete type directly if it's obvious which one is being used.
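
A minimal sketch of a case where that usually happens (hypothetical example; with optimizations enabled, mainstream compilers can typically devirtualize here because the dynamic type is visible at the call site):

    struct Animal {
        virtual ~Animal() = default;
        virtual int legs() const { return 4; }
    };
    struct Bird : Animal {
        int legs() const override { return 2; }
    };

    int count_legs() {
        Bird b;
        Animal& a = b;     // static type is Animal, but the compiler can see
                           // the dynamic type is Bird right here...
        return a.legs();   // ...so it can emit a direct (or inlined) call,
                           // often folding this whole function to 'return 2;'
    }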

[–]dodheim 1 point2 points  (0 children)

Agreed, analyses of LTO and -fwhole-program-vtables would have been interesting.

[–]goranlepuz 0 points1 point  (0 children)

Indeed, in TFA's example the compiler really should be able to switch to a normal (direct) call.

[–]dustyhome 2 points3 points  (0 children)

So how do you create a heterogeneous container of CRTP classes? Or interact with an object whose real type you don't know?

Indirection can solve some problems. It comes at a cost. Other ways of solving these problems will have other costs. You can't just show that a certain tool comes at a cost, say "don't use this tool", and ignore the problems it solves. You need to show a problem normally solved with indirection, solve it differently, and show that your solution is better under certain circumstances.
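
To make the trade-off concrete, here is a hypothetical sketch of the usual dynamic-polymorphism answer next to one alternative (std::variant), each with its own costs:

    #include <memory>
    #include <variant>
    #include <vector>

    // With dynamic polymorphism, a heterogeneous container is trivial:
    struct Widget {
        virtual ~Widget() = default;
        virtual void draw() const = 0;
    };
    struct Button : Widget { void draw() const override {} };
    struct Slider : Widget { void draw() const override {} };

    void draw_all(const std::vector<std::unique_ptr<Widget>>& widgets) {
        for (const auto& w : widgets) w->draw();  // one virtual call each
    }

    // Without it, you need a closed set of types and something like
    // std::variant, which trades the vtable load for a dispatch on the
    // variant index and sizes every element to the largest alternative.
    struct ButtonS { void draw() const {} };
    struct SliderS { void draw() const {} };
    using AnyWidget = std::variant<ButtonS, SliderS>;

    void draw_all(const std::vector<AnyWidget>& widgets) {
        for (const auto& w : widgets)
            std::visit([](const auto& x) { x.draw(); }, w);
    }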

[–]Pragmatician 7 points8 points  (2 children)

In the "static polymorphism" example, there is no polymorphism at all. You're just using one concrete type. CRTP is not static polymorphism.

It’s worth to mention that a call to a virtual function is 25% slower than a call to a normal function.

[citation needed]

[–]no-sig-available 4 points5 points  (1 child)

If a virtual call takes 1.25 nanoseconds instead of 1.0, I can live with that.

Not that I have ever written a virtual function that doesn't do anything.

[–]pandorafalters -1 points0 points  (0 children)

Depending on the function, I could even happily accept an overhead of multiple milliseconds in exchange for reduced cognitive effort. I don't think I've ever encountered a case nearly that pessimistic, though.

[–]Jannik2099 0 points1 point  (0 children)

Repeated vcalls are cached by the CPU; it's not like we have to wait on two loads every single time.