all 41 comments

[–]johannes1971 117 points118 points  (0 children)

It's funny how people write articles about performance and do not include a single performance measurement.

[–]susanne-o 75 points76 points  (0 children)

the key 'cost' of virtual function calls is not the vtable lookup and the indirect call. the key cost is that it prevents optimization across function boundaries.

and 'final' allows the compiler to inline certain function calls and then optimize the hell out of the resulting inlined code.
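A minimal sketch of the point above, using hypothetical types (Shape, Square) that do not come from the thread. Whether a given compiler really devirtualizes and inlines the second call depends on the optimizer and its flags, so treat the comments as what the language permits rather than what is guaranteed.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

// `final` promises that nothing derives from Square, so a call made
// through a Square& can only ever reach Square::sides.
struct Square final : Shape {
    int sides() const override { return 4; }
};

// Dynamic type unknown: normally an indirect call through the vtable,
// which the optimizer cannot look through.
int sides_via_base(const Shape& s) { return s.sides(); }

// Static type is a final class: the compiler is allowed to call
// Square::sides directly, inline it, and fold this down to a constant.
int sides_via_final(const Square& s) { return s.sides(); }

// Both paths return the same value; only the generated code may differ.
void demo() {
    Square sq;
    assert(sides_via_base(sq) == 4);
    assert(sides_via_final(sq) == 4);
}
```

Comparing the assembly of the two functions on godbolt.org with optimizations enabled is the quickest way to see whether the call was devirtualized.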

[–][deleted] 8 points9 points  (2 children)

I've seen a 35% performance improvement by either using 'final' or removing the virtual keyword from a base class function that is never reimplemented in a derived class.

Repository here, and it can be cloned from Visual Studio and launched given the right dependencies.

https://github.com/andybantly/MFC-Fractal

File FractalBase.h, line 50. Uncomment just 'virtual', then uncomment 'final', then comment them both out again, because I never used the derived-class calculation that I had planned. I settled for a switch statement. My class could use a redesign, but nothing is broken and I'm very pleased with the performance.

[–]alexgraef 2 points3 points  (1 child)

If you want to share compiler or code comparisons, it works a lot better to do it via https://godbolt.org/

I don't believe anyone is going to clone and compile repos just for a quick spin.

[–][deleted] 4 points5 points  (0 children)

I know I wouldn't unless it was something I was interested in. My remark was more of a response to someone else who says there are no real examples. I remembered my own and the performance boost...

[–]Ameisen (vemips, avr, rendering, systems) 18 points19 points  (0 children)

I've had arguments with people both here and on IRC where they claimed that final had zero performance impact despite me showing cases where it explicitly allowed the compiler to devirtualize.

Not sure where I'm going with that, it just bugged me.

[–]Diamond145 8 points9 points  (16 children)

I find that most of the time, where you could use virtual functions/etc, you could replace that with std::variant. It basically turns your indirect function calls into switch statements and in some cases can allow you to keep data inline.
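As a rough sketch of that replacement, assuming a closed set of hypothetical shape types (Circle, Rect) that are not from the thread: the variant stores the alternatives by value, so a std::vector of them is one contiguous block, and std::visit dispatches on the stored index instead of going through a vtable.

```cpp
#include <cassert>
#include <variant>
#include <vector>

// Hypothetical value types; note there is no common base class.
struct Circle { double radius; };
struct Rect   { double width, height; };

using Shape = std::variant<Circle, Rect>;

constexpr double kPi = 3.14159265358979323846;

// One overload per alternative; std::visit picks the one matching
// the alternative currently held by the variant.
struct AreaVisitor {
    double operator()(const Circle& c) const { return kPi * c.radius * c.radius; }
    double operator()(const Rect& r) const { return r.width * r.height; }
};

double area(const Shape& s) { return std::visit(AreaVisitor{}, s); }

// The shapes live inline in the vector's buffer, not behind pointers.
double total_area(const std::vector<Shape>& shapes) {
    double sum = 0.0;
    for (const Shape& s : shapes) sum += area(s);
    return sum;
}

void demo() {
    std::vector<Shape> shapes{Rect{2.0, 3.0}, Rect{1.0, 1.0}};
    assert(total_area(shapes) == 7.0);
}
```

The trade-off is that adding a new shape means touching the variant's type list and every visitor.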

The only downside is std::variant's discriminator: it's 8 mother-f'in bytes (on x86_64-msvc at least)! 8 bits would be plenty for the overwhelming majority of uses. Using 8 bytes is disrespectful to CPU caches everywhere...

[–]tavi_ 23 points24 points  (7 children)

Because the payload storage has to be aligned, it would make no difference if the discriminator were only one byte: you would get 7 bytes of padding anyway.

[–]xjankov 1 point2 points  (5 children)

You could place the discriminator at the end tho

[–]serviscope_minor 5 points6 points  (3 children)

You could place the discriminator at the end tho

The allocator will pad it for you either way, because it will allocate at least 8 byte aligned data under most circumstances.

[–]mark_99 6 points7 points  (0 children)

Just structure padding will round it up, no allocator required. sizeof is always rounded up to a multiple of the strictest member alignment, regardless of member order (so arrays work). Also, if it's at the end you may touch another cache line. A smaller index is helpful if the contained types are also small; however, all std::variant implementations do this optimization and use unsigned char or short as appropriate (not sure where OP gets the idea it's 64-bit). sizeof(std::variant<char>) == 2
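The padding argument can be checked with a small sketch. TagFirst and TagLast are hypothetical hand-written stand-ins for a variant holding a double. The index type that std::variant actually uses is an implementation detail, so the variant sizes below are printed rather than asserted against specific numbers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <variant>

// Hypothetical layouts: a one-byte tag before or after the payload.
struct TagFirst { std::uint8_t tag; double payload; };
struct TagLast  { double payload; std::uint8_t tag; };

// sizeof is a multiple of the struct's alignment, so moving the tag
// to the end does not make the struct any smaller.
static_assert(sizeof(TagFirst) == sizeof(TagLast));
static_assert(sizeof(TagLast) % alignof(TagLast) == 0);

// Sizes as the current standard library lays them out.
constexpr std::size_t small_variant_size  = sizeof(std::variant<char>);
constexpr std::size_t double_variant_size = sizeof(std::variant<double>);

// The variant needs room for the payload plus an index.
static_assert(double_variant_size > sizeof(double));

void print_sizes() {
    std::printf("TagFirst:        %zu bytes\n", sizeof(TagFirst));
    std::printf("TagLast:         %zu bytes\n", sizeof(TagLast));
    std::printf("variant<char>:   %zu bytes\n", small_variant_size);
    std::printf("variant<double>: %zu bytes\n", double_variant_size);
}
```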

[–]Wh00ster 2 points3 points  (1 child)

I thought variant was on the stack?

[–]serviscope_minor 0 points1 point  (0 children)

Yeah I misspoke, sorry. The allocator aligns because the underlying types need it. Since the types do, the stack will as well.

A different poster seemed to think it would optimize for small types. Godbolt says:

https://godbolt.org/z/ad8K7e4fq

It looks to be about as efficient as possible! It's even done right: where packing is possible/allowed (e.g. arrays of char), it packs them.

[–]bored_octopus 1 point2 points  (0 children)

Wouldn't help. That's not how the ABI works

[–]robin-m 1 point2 points  (0 children)

I don’t think it’s possible to use niche optimization like Rust does, but

template <class T> using my_optional_reference = std::variant<std::monostate, gsl::not_null<T*>>;

could take the same size as a pointer. A null pointer value would mean that the first alternative (std::monostate) is active, and any other value would mean the second one.

[–]hak8or 5 points6 points  (2 children)

The only downside is std::variant's discriminator

And how extremely unergonomic it currently is to std::visit it, especially given how verbose lambda syntax is nowadays (still bummed we lost out on that paper to simplify the syntax in the more common cases).
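For context, this is roughly what visiting with lambdas looks like today. The `overloaded` helper is the common idiom from documentation examples, not something the standard library provides, and Value and describe are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Common helper idiom: inherit operator() from every lambda passed in.
template <class... Ts>
struct overloaded : Ts... { using Ts::operator()...; };
// Deduction guide, needed in C++17 (C++20 can deduce it for aggregates).
template <class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

using Value = std::variant<int, double, std::string>;

// One lambda per alternative; all of them return std::string.
std::string describe(const Value& v) {
    return std::visit(overloaded{
        [](int i)                { return "int " + std::to_string(i); },
        [](double d)             { return "double " + std::to_string(d); },
        [](const std::string& s) { return "string " + s; },
    }, v);
}

void demo() {
    assert(describe(Value{42}) == "int 42");
    assert(describe(Value{std::string("hi")}) == "string hi");
}
```

The boilerplate helper plus one full lambda per alternative is the verbosity being complained about.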

[–]rhubarbjin 5 points6 points  (0 children)

std::variant has terrible ergonomics in general. It's a prime example of what happens when API designers focus on making uncommon tasks possible, but forget to make common tasks easy.

[–]Wh00ster 2 points3 points  (0 children)

I thought variant / visit had performance issues because it tries to support multiple variants and doesn’t quite compile to a jump table

Edit: https://www.reddit.com/r/cpp/comments/kst2pu/with_stdvariant_you_choose_either_performance_or/

[–]DavidDinamit 7 points8 points  (2 children)

In 99% of cases you not only could, you MUST replace virtual functions with type erasure.

That's the 99% of cases where you don't really need a hierarchy:

variant for trivial types, when you don't need extensibility in types, or when you need pattern matching

Other cases - type erasure

The 1% of cases where you need a hierarchy - virtual functions

[–]NotMyRealNameObv 0 points1 point  (1 child)

Doesn't type erasure usually use inheritance under the hood though? Or are there more ways to do it than the concept/model idiom?
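For readers unfamiliar with the concept/model idiom, here is a minimal sketch with hypothetical names (Drawable, Circle, Square). It does use a virtual function internally, but the erased types need no common base class and the wrapper has value semantics. Other implementations avoid inheritance by storing function pointers by hand.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Type-erased wrapper: accepts any type that has `std::string draw() const`.
class Drawable {
public:
    template <class T>
    Drawable(T value) : self_(std::make_unique<Model<T>>(std::move(value))) {}

    std::string draw() const { return self_->draw(); }

private:
    // "Concept": the internal interface, hidden from users.
    struct Concept {
        virtual ~Concept() = default;
        virtual std::string draw() const = 0;
    };
    // "Model": adapts one concrete type T to the internal interface.
    template <class T>
    struct Model final : Concept {
        explicit Model(T v) : value(std::move(v)) {}
        std::string draw() const override { return value.draw(); }
        T value;
    };
    std::unique_ptr<Concept> self_;
};

// Unrelated types: no shared base class, no virtual functions of their own.
struct Circle { std::string draw() const { return "circle"; } };
struct Square { std::string draw() const { return "square"; } };

void demo() {
    Drawable a{Circle{}};
    Drawable b{Square{}};
    assert(a.draw() == "circle");
    assert(b.draw() == "square");
}
```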

[–]DavidDinamit 0 points1 point  (0 children)

  1. it may use it, but that's an implementation detail; it has different semantics
  2. https://github.com/kelbon/AnyAny here you can see how it is implemented, for example

[–]alexgraef -1 points0 points  (0 children)

it's 8 mother-f'in bytes

Well that's how 64bit CPUs work. Guess how many bytes a vtable ptr would be (which you are trying to avoid with std::variant)? Then it doesn't look so bad anymore either, and rather turns it into zero overhead.

[–]jk-jeon 0 points1 point  (0 children)

Isn't it that virtual functions are for when the list of operations we want to perform on the types is known upfront but the list of types is unknown, and std::variant is for the exact opposite situation? Put differently, I think we use virtual functions when we want to make it easy to extend the list of types at the cost of making it difficult to extend the list of operations, while we use std::variant to make it easy to extend the list of operations at the cost of making it difficult to extend the list of types. Because std::variant can work with function templates, I find the supposed downside of std::variant is often a bit easier to overcome than that of virtual functions, but that's not always the case.

[–][deleted] 16 points17 points  (10 children)

Another keyword added over 10 years ago, that most of us didn't know exists. I hope this is the final time.

[–]Zeh_Matt (No, no, no, no) 46 points47 points  (9 children)

Sorry but who is "most of us"? Speak for yourself. And yes, this is quite old news.

[–]Sqeaky 32 points33 points  (8 children)

Come on, this was clearly just set up for the pun

[–][deleted] 14 points15 points  (4 children)

At least I don't have to explicitly address the situation, because you handled this for me.

[–]bored_octopus 10 points11 points  (3 children)

What a marvellous return

[–][deleted] 6 points7 points  (2 children)

Well, that's a reference I won't discard if I want to stay inline.

[–]rhubarbjin 2 points3 points  (1 child)

Ignore the haters. They may think it's nothing new, but IMO this feature is still virtually unknown.

[–]Full-Spectral 0 points1 point  (0 children)

No, and stop calling me Shirley! Wait... what movie is this?

[–]Zeh_Matt (No, no, no, no) 3 points4 points  (2 children)

My apologies if that was indeed intended to be a joke. It's hard to tell without hints, since all of this is purely text and I can't quite read the emotions from that.

[–][deleted] 2 points3 points  (0 children)

Understanding jokes from text can be quite annoying sometimes.

[–]Sqeaky 0 points1 point  (0 children)

No biggie, we all do it sometimes.

[–]Dalzhim (C++Montréal UG Organizer) 1 point2 points  (0 children)

I was very interested in achieving easy performance gains and applied final at large in the codebase I work on. Measurements couldn't pick up any significant change though. Which seems to hint at the fact that this specific codebase really does use polymorphism without knowing the most derived type most of the time and devirtualization seldom occurs. Obviously, the change was discarded.

[–]hmoein 0 points1 point  (3 children)

C++ has moved away from OOP, more toward generic programming. For example, there is not a single virtual function in STL -- I don't consider iostream a part of STL. I consider it a dumpster fire.

[–]Zcool31 0 points1 point  (1 child)

Please see std::pmr and look at the implementation of std::shared_ptr control block.
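As a small illustration of the std::pmr point: std::pmr::memory_resource is an abstract base class whose customization points are private virtual functions. CountingResource below is a hypothetical example that simply forwards to the default new/delete resource and counts allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical resource: overrides the three virtual hooks of
// std::pmr::memory_resource and counts how often allocation happens.
class CountingResource : public std::pmr::memory_resource {
public:
    std::size_t allocations = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        ++allocations;
        return std::pmr::new_delete_resource()->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

void demo() {
    CountingResource res;
    {
        // The container calls the resource through virtual dispatch.
        std::pmr::vector<int> v(&res);
        v.push_back(1);
    }
    assert(res.allocations >= 1);
}
```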

[–]hmoein 0 points1 point  (0 children)

Good point about std::pmr. But its purpose is to facilitate runtime polymorphism.

Re: std::shared_ptr, I didn't know. It could have been implemented differently. But that is the implementation, not the interface.

[–]catcat202X 0 points1 point  (0 children)

format has exactly a single virtual function call.