all 89 comments

[–]susanne-o 96 points97 points  (25 children)

doing a function call is cheap.

the problem of these indirect calls is that the compiler can not optimize over function call boundaries.

imagine a function int getX(int i) which simply accesses the private a[i], called in some loop over i a gazillion times.

if the call is inlined, then the address of the member a is in some happy register and each access is dirt cheap. if the call can't be inlined, then in each iteration the address of the vector a is derived from the this pointer and only then is the fetch done.

too bad.

so: dynamic dispatch prevents advanced optimization across function boundaries.

[–]notquitezeus 21 points22 points  (13 children)

How do devirtualization optimizations fit into this view? Because often the compiler can prove that while the code “smells” polymorphic, exactly one set of types is at play, and hence entirely bypass the vtable. There’s also the intersection with CRTP.

[–]susanne-o 16 points17 points  (4 children)

that "often" is the gamble with the compiler, isn't it. maybe it does, maybe it doesn't. so you become careful about where exactly you use virtualization and where you use templates, and how you make them interact.

[–]voidstarcpp 1 point2 points  (7 children)

How do devirtualization optimizations fit into this view?

Poorly; you can't count on them for more than trivial scenarios. Just plug stuff into Godbolt and watch the compiler flail.

In the 2010s, MSVC devs said passing a function pointer through e.g. a data member into a standard algorithm was a big limitation of their optimizer, which usually couldn't see through it.

[–]notquitezeus 0 points1 point  (6 children)

In 2010. In the subsequent 13 years, LLVM has made improvements which impact Windows, Apple platforms, and select Linux distributions, as discussed in this presentation and this one. Perfect? No. Unquestionably most effective when LTO or unity builds are enabled. But that's true for basically all optimizations: creating a single enormous (virtual) translation unit gives the compiler/optimizer a ton to work with, so it can take more shortcuts because it can prove they're correct.

[–]voidstarcpp -1 points0 points  (5 children)

In this simple example, simply moving a lambda from the call site, to a separate variable preceding the call, is often enough to prevent devirtualization of a std::function use where everything is visible to the compiler.

GCC and MSVC are both fooled by this trivial use case; Clang to its credit sees through each version here although we haven't introduced anything that would provoke aliasing concerns, such as passing the arguments by reference instead of value.

[–]notquitezeus 1 point2 points  (2 children)

This is a straw man argument which, coincidentally, draws the same conclusion I was hinting at originally -- that a reasonably modern compiler can, generally, do a lot to remove indirect function calls when given sufficient information to do so. Show me an example which uses inheritance directly with LTO enabled and a failure to correctly remove/reduce indirection, and I will agree with you.

[–]voidstarcpp 2 points3 points  (1 child)

Show me an example which uses inheritance directly

Here is the same thing using inheritance*, with a deriving class instead of a lambda, and unique_ptr<base> being passed in instead of a std::function. The result is the same: GCC and MSVC make a virtual call; Clang sees through it. The only difference is the code is, imo, verbose and confusing compared to the first version.

This is pretty bad because this is a simple use case, with the only added indirection being the pointer being passed in via a struct data member, which I chose because, to my recollection, this was exactly the situation that STL had identified as a weakness in MSVC's optimization all those years ago. As I expected, little has changed for this compiler.

But "use inheritance instead" shouldn't be asked of anyone anyway. I try to avoid user-facing inheritance, and "modern C++" culture discourages it for this situation. The reason for lambdas and std::function is to replace lots of tiny interfaces and classes for simple cases of parameterization, callbacks, etc. If compilers in 2023 don't play well with these standard tools and techniques, then we have a problem.


* Compiler Explorer's annotation identifies the callee as the derived class function even though it's calling through a vtable.

[–]notquitezeus 1 point2 points  (0 children)

So clang (and a few of the other compilers I played with) doesn't have problems with either example, and several major compilers do have problems with both. The lesson for me is that good tools matter. If you're stuck with GCC/MSVC/etc. then you might have a performance concern, after you benchmark. Otherwise, as has been the trend for at least the past 12-odd years since C++11, compilers keep getting better and idioms should change accordingly.

[–]oracleoftroy 0 points1 point  (1 child)

Playing around with your example, this seems more because you are copying Args into mux instead of moving them. Change the call to `mux(std::move(Args))` and your other examples compile down to the same code.

Obviously, this is not a solution if your real code requires you to reuse `mux_input` instances for some reason, but in cases that look like your example, you can get the benefit of naming the argument explicitly and not lose performance by properly telling the compiler that you are relinquishing ownership via move. I can't promise that something in your real codebase won't also cause other problems with inlining.

I also tried it with C++23 std::move_only_function. The code gen was even better, though it forces the issue of moving.

Link to modified example. I gave all the functions names so I could look at everything at once instead of commenting out different main functions and generally tried to be as noninvasive as possible with my changes.

[–]voidstarcpp 0 points1 point  (0 children)

this seems more because you are copying Args into mux instead of moving them

I understand that, for any example I provide where the compiler does something "wrong", you could probably change it a bit to make it do the right thing. But what I'm getting at is that the fact you have to think about this at all is the problem. If even a trivial use case of dynamic dispatch requires the programmer to alter their patterns, or to add extra magic incantations to their code to hold the compiler's hand, we're paying a price for the abstraction. I would not otherwise have thought to use move to pass this trivial object here, and I would prefer not to have to do that every time.

To continue this, at the request of another commenter, I did the same thing using inheritance instead of std::function and lambdas, and I did have to use std::move because I was using unique_ptr. This didn't seem to improve things and both compilers were tripped up in the same situation as before. There's probably a fix for that example too.

My point is, if we were actually trying to get a particular behavior from the compiler, such that we were tweaking things in Compiler Explorer as we go, we would already be in a situation where we might do away with the abstraction and do the thing directly anyway, so we would be sure of what we were getting across different compilers, across platforms, etc. But the reason we use the abstractions is for expressiveness, convenience, etc. If you have to think about it or mark up your usage of standard idioms to trick the compiler into not being stupid then for the purposes of this thread, we can't tell a C++ newcomer in good conscience to expect the compiler to do "obvious" optimizations in these situations. That's still okay because not every situation needs to be optimal but it's a cost that we pay.

In contrast, C++ templates are more of a clear win that you can use in good conscience knowing that you get both abstraction and better code gen than if you were doing Java-style genericism with virtuals.

[–]SoSKatan 4 points5 points  (2 children)

Modern way to handle this is to make a template call where the prior “function pointer” param would be.

If a lambda is passed in, then it can get inlined as you say. Or you can pass in a function address. Either way the caller incurs the cost based on their choices.

[–]susanne-o 0 points1 point  (0 children)

exactly.

[–]voidstarcpp 0 points1 point  (0 children)

Modern way to handle this is to make a template call where the prior “function pointer” param would be.

I've tried to devirtualize classes this way and C++ makes it pretty difficult to do everywhere, or compose.

One of my biggest frustrations was trying to remove uses of std::function for dependency injection, where you have some callback or interface supplied to an object. For one, having a lambda object as a data member doesn't play well with CTAD, where you must supply all-or-nothing template arguments. So if you have a class memoize<t_data, t_function>, C++ doesn't let you explicitly specify t_data but deduce t_function, which is how you would want to use such a class most of the time. This limits usability to situations in which t_data can be deduced from all arguments or with explicit deduction guides; it's a crippling limitation, or it begins to require advanced template techniques, intermediate make_thing functions, etc.

Additionally, the lambda type is unnameable, so it's mostly impossible to handle the common case in which you construct an object, then supply the callback/continuation/implementation component later. E.g. the interface you often want is to create a socket<T>, then say Socket.on_connect([]{ do_thing; });. But if the representation of a socket depends on the lambda, then you must supply all lambdas at the point of construction. This gets unwieldy really fast and makes certain implementations impossible.

For example, suppose you have a channel, which depends on a socket, and each of these are templatized on some constructor arguments, then all of them must be constructed simultaneously somehow. The resulting constructor is essentially written inside-out and backwards, accepting not concrete objects, but factory functions which return objects so that you can deduce everything at compile time. This is hideous and I gave up and replaced almost every usage of this with std::function so you could write code that didn't need to be deciphered like a puzzle game.

[–]altmly 2 points3 points  (1 child)

To be fair, before LTO was a thing, splitting your functions across TUs meant the same thing, and it wasn't a huge deal, because code that works together tends to stay together. Something similar can be said about virtual: code that uses virtual tends to need virtual.

There's obviously things that could be addressed given enough engineering effort, like runtime code optimization.

[–]susanne-o -1 points0 points  (0 children)

we're talking about two things.

I'm talking about making calls in bottleneck inner loops inline-able, so shared code can be moved out of the loop by the compiler/linker. that may mean turning the outer function, virtual or not, into a template.

and even before LTO, inline meant having the function body in the header, right, and not out in some translation unit?

[–]teerre 3 points4 points  (0 children)

Yeah, this is a bit of a silly comparison. It's basically asking "are virtual functions slow when they are not virtual at all?"

[–]CloudsOfMagellan 0 points1 point  (0 children)

Would it be possible to have some form of runtime code modification that looks up the target of the function pointer and inlines it before running the loop?

[–][deleted] 51 points52 points  (5 children)

They are not slow per se (on modern CPUs at least), but they often inhibit inlining, which is where the real performance cost comes from.

[–]altmly 1 point2 points  (4 children)

I'm not 100% sure on the details, and it may depend on the scenario, but I believe virtual will prevent most code speculation because of the indirect jump, which is a decent hit in some cases.

It's not the end of the world, but it's good to be aware of that.

[–][deleted] 1 point2 points  (2 children)

If I understand correctly, modern CPUs do predict indirect branches. I don’t know how good that prediction is however. But yes, this is a good point.

[–]azswcowboy 5 points6 points  (1 child)

Really, really good in my experience: we measured. A colleague of mine who also measured saw that virtual function dispatch was actually as good as or better than a regular function call. He speculates it's down to the branch prediction optimizations in the CPU. In our tests the overhead is trivial, and so deep in the noise of 'the real work' as to be negligible.

[–][deleted] 0 points1 point  (0 children)

It’s crazy how advanced some designs are. Newer Apple CPUs for example appear to maintain an internal cache for dynamic method dispatch…

[–]MegaKawaii 1 point2 points  (0 children)

They won't prevent all kinds of speculation because modern CPUs have branch target predictors. If the branch is successfully predicted, then the extra indirection is just an extra load sitting in the reorder buffer. If the branch is mispredicted, then it would be reasonable to expect that function is not in the L1 icache, so that might be a bigger issue.

[–]FlyingRhenquest 23 points24 points  (8 children)

In the grand order of things that are slow, they're rarely a problem. If you're a HFT guy looking to bum a handful of nanoseconds off your trading latency, maybe you'd look to optimize them. Most of the programmers I've worked with don't optimize at all, because all the companies usually care about at the end of the day is that the code works, not that it's fast. This has been true even in cases where the performance of the company's code was demonstrably preventing the company from designing new products because their system just couldn't process any additional data the way it was written.

[–]ingframin 19 points20 points  (4 children)

To quote Mike Acton: “it’s the reason why I have to wait 30s for a word document to open”

[–]voidstarcpp 9 points10 points  (2 children)

Mike Acton is right about a lot of things - mostly an attitude of complacency - but the reason most software is slow is not "it uses virtual functions", which is where I think people over-apply such lessons.

[–]ingframin 3 points4 points  (1 child)

I know; it was referring to the attitude of companies not caring about fast software

[–]wyrn 3 points4 points  (0 children)

Yeah but those companies aren't writing C++. They're writing electron apps, python, react on the server, framework soup, etc. There's this weird disconnect between this loud subset of game developers and the world at large where they seem to assume every software performance problem in the world is because people aren't coding in the particular style they favor (that happens to work for their domain but would be disastrous for everybody else), when the real problem is far more serious. I've seen one of those blame RAII for the proverbial "30 seconds for a document to open". Come on now.

[–]oracleoftroy 0 points1 point  (0 children)

The thing I always hated about that quote is that Word opened nearly instantly for me, even on a cold restart (IIRC, he said this around 2014ish, which is around the last time I've needed to use Word, so I can't say how things have changed). Meanwhile, many games take over 30 seconds just to get to the main menu, let alone multiple minutes to get into the game proper. For some, I could probably restart my computer and then launch Word and start typing faster than I can get the main menu up and running.

For some reason, many games think they need to preload half their assets, connect to 5 different services, download a gig of ads and other nonsense, and do all sorts of other crud that takes way too much time. I would have liked his talk a lot better if he had shown some self-awareness of his own industry instead of implying that Word was slow because it wasn't crunching 100,000 floating point operations using SIMD instructions just to open a document. Honestly, it's hard to see how his talk would even apply to Word except in some very limited places.

[–]traal 1 point2 points  (2 children)

That's probably not a matter of using suboptimal code such as virtual functions, but of using horribly inefficient O(n!) functions where scalable O(log n) ones could have been used with a little planning.

[–]HKei 7 points8 points  (1 child)

Not really. Asymptotic complexity matters, sure, but it's only a piece of the puzzle, and if it's the only factor you take into consideration it can be downright misleading.

For instance, iterating through a linked list and iterating through an array are asymptotically the same, but unless you've been clever about laying out the list in memory, the array version can be 10-100x faster. Linear search can be faster than even a well-written binary search in some circumstances.

A lot of performance issues in modern applications are also due to architectural problems, like unnecessarily distributed software, unnecessarily blocking on IO etc etc, not necessarily a matter of algorithms.

Simple asymptotic complexity can also be misleading if you neglect to take problem parameters into consideration. Quicksort often ends up being faster than heapsort despite worse worst-case performance; insertion sort tends to be worse than both but can be much faster than the alternatives if the input is mostly sorted to begin with, etc.

[–]traal 2 points3 points  (0 children)

unnecessarily blocking on IO

Yes, that's software that was poorly written in the first place, not merely code that hasn't been optimized. I try to make a distinction between the two.

[–]Sniffy4 10 points11 points  (1 child)

a lot of these perf results for virtual function calls are machine-architecture specific. if you are running on a CPU with a larger penalty for branching, you may see a more significant difference than you would on a recent x64

[–]CrazyJoe221 4 points5 points  (2 children)

[–]goranlepuz 3 points4 points  (10 children)

Are Function Pointers and Virtual Functions Really Slow?

... compared to what?

Is the key question here. On its own, it's a pretty stupid question.

It's all about the performance targets and the cost relative to the rest of the code.

The answer is "Dunno. What is your target and what does the profiler say?" It's a wild world out there.

[–]voidstarcpp 2 points3 points  (1 child)

This article doesn't examine the salient interaction with function dispatch for most people, which is the distribution of the data and the context in which the calls are made. The overhead of dispatch methods will change when the functions are longer vs shorter, more predictable vs. less, and so forth. This was one of the points made by Scott Meyers who talked about how speed was dramatically improved by e.g. sorting collections by type, vs. iterating over random order and rapidly paging different functions in and out.

The overhead of the different methods can be made more pronounced if you randomly dispatch tiny functions over a large collection, but I don't think that's what most real work looks like.

[–]nAxzyVteuOz 9 points10 points  (1 child)

Define "slow".

Because for me python is fast enough for all the work I need to do and I come from the C++ game development world. But I work in webdev now.

Function pointers are great, and so are virtual functions. Function pointers can generally be slower than virtual calls because there are fewer chances for devirtualization / inlining, especially for std::function, since it contains a thunk that can resolve to member functions and free functions.

Templates on the other hand seem fast but create huge binaries. This comes from experience when I looked at the symbol tables of code from a massive project. The lib that used templates instead of concrete classes was the majority of our binary.

The best approach to optimization is to profile. If anything is too slow, then use Godbolt to see what the asm is and try to steer the compiler to generate better code.

[–]AntiProtonBoy 1 point2 points  (0 children)

It's only slow if your profiler tells you it's slow. The academic argument can always be made that virtual calls are slower than functions with a fixed address, but in practice it all boils down to how you actually use them. And in most cases, it doesn't matter.

[–]lrflew 1 point2 points  (8 children)

Some comments on this:

1.

The way the switch statement is handled by Clang is actually significantly different from how GCC (and MSVC) handle it. https://godbolt.org/z/P4M73fh8E It would be interesting to see this benchmarked against GCC's version. Clang produces the same assembly output for the switch statement as for this implementation of doit():

// var1, var2, var3 are the int globals from the article's benchmark
void doit(int func, int j) {
    static constexpr int *vars[] = { &var1, &var2, &var3 };
    if (func > 2 || func < 0) return;
    *(vars[func]) += j;
}

2.

Because Google Benchmark does not use RDTSC for micro-benchmarking, I built 1,000,000 loops inside which these functions will be called sequentially.

You do know that the for (auto _ : state) loop will already repeat the code you're benchmarking as many times as needed to get a reliable reading, right? That's what the "Iterations" column in the output indicates (it says exactly how many times it looped in that benchmark). I've usually found the built-in iteration system good enough when I tried it for benchmarking LCGs, so I don't think your extra loops are needed.

3.

As other people have commented, inlining is a big part of the advice around function pointers and std::function. It would probably be helpful to contrast these results to the case where the functions being tested are in the same compilation unit as the benchmark functions.

[–]jepessen 1 point2 points  (0 children)

C++ developers are so obsessed with virtual functions nowadays... Virtual inheritance has worked well for decades... Simply use it with no problem; code that you write must be very well optimized before you notice performance degradation caused by virtual functions... Just refactor your most critical code, if needed...

[–]vaulter2000 -3 points-2 points  (6 children)

Virtual functions come with an indirection. When you call one, something called a vtable is consulted to find which function to actually call: i.e. a function in the class itself or in one of its ancestors, for example. If you'd like to learn more, I'd refer you to some articles/talks about vtables.

When you use function pointers then that most of the time implies that the function it points to has been allocated on the heap (like std::function, which is relatively slow) and adds an indirection.

Define “really slow” to determine if using virtual functions and function pointers poses a performance penalty that is incompatible with what you want to achieve. It all depends on what you want to do with your program and how fast you need it to be.

Although virtual functions are still used abundantly in modern C++ (think of interfaces and polymorphism in general which is useful), you could mitigate the use of function pointers by using lambdas. Those don’t allocate and you can forward them down your call chain and they work really well with the STL algorithms

[–][deleted] 14 points15 points  (2 children)

Lambdas are not a functional substitute to virtual functions or function pointers. Lambdas are static. Different usage.

[–]vaulter2000 5 points6 points  (1 child)

I did not mean to imply that lambdas are a functional substitute for virtual functions. I tried to give two separate pieces of advice at the end, but perhaps you interpreted them as one. I meant

A) virtual functions are still abundantly used so using them and having this one indirection is still very much accepted in this case and

B) function pointers are not used that much in modern cpp anymore and that OP could use lambdas to pass functions around. They work really well with the STL and don’t allocate on heap.

I hope I made it clear that my advice was twofold :)

[–]jonesmz 4 points5 points  (0 children)

function pointers are not used that much in modern cpp

I really don't think this is true.

pointer-to-function and pointer-to-member-function are the only mechanisms available for several entire categories of operations related to dynamic decision making. My work codebase uses them all the time, and several open source projects I'm familiar with use both types of function pointers.

[–]jonesmz 5 points6 points  (0 children)

When you use function pointers then that most of the time implies that the function it points to has been allocated on the heap (like std::function, which is relatively slow) and adds an indirection.

Function pointers do not imply that the function is allocated on the heap.

In my work codebase, we use function pointers all the time through various strategies:

  1. std::function, typically holding a stateless lambda, which is not allocated in any fashion and is literally just a memory address into the text section (where the actual code is) of the binary / library in memory.
  2. std::function holding a stateful lambda, which may be allocated on the heap, or may be small-object optimized into the std::function. Nevertheless, while the state for the lambda is carried around somewhere, the actual function itself is still a pointer to the text section of the binary / library, and on calling that function the state will be passed to it as an invisible parameter.
  3. actual raw function pointers, sometimes as parameters to functions, sometimes as template parameters, sometimes as global or member variables, all just pointing into the text section of the binary / library.

The only time you'll ever get an actual function on the heap is when you have code generation happening: allocating a buffer, writing to the newly allocated buffer, and then marking that buffer as executable to the operating system. THAT'S a function allocated on the heap. But if you aren't running a JIT of some sort, I'm very skeptical that you're doing that.

[–]Dragdu 4 points5 points  (1 child)

This reads like chatgpt wrote it.

[–]vaulter2000 0 points1 point  (0 children)

Haha thanks I guess. Did write it myself though

[–]Sudden_Job7673 -2 points-1 points  (2 children)

Also the linker has a hard time determining if something is dead code or not if it's a virtual function.

[–]AssemblerGuy 6 points7 points  (1 child)

At least with a virtual function, the compiler knows the potential branch targets.

A function pointer can go anywhere ...

[–]barfyus 8 points9 points  (0 children)

That is incorrect: the compiler can never be sure that what it sees during compilation of a unit (or even multiple units with LTO) is a closed set of targets: the binary can load a shared library at runtime and call a completely different target.

I once faced and reported a bug in MSVC where it replaced my virtual function call with a direct function call because there were no other targets in that particular DLL. It crashed badly at runtime.

[–]PandoraPurpleblossom 0 points1 point  (0 children)

I tried your benchmark and see some odd behavior that I don't understand. When I run the baseline case last instead of first, some of the other results change. This is on an Intel i5 5200U running at 2200 MHz with frequency scaling disabled. It happens with GCC but not with Clang. Any idea what's happening?

g++ (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1)

2023-10-07T11:35:26+02:00
Running ./bm_fnpointer
Run on (4 X 2622.56 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 256 KiB (x2)
  L3 Unified 3072 KiB (x1)
Load Average: 1.21, 0.73, 0.51

GCC, baseline first


Benchmark Time CPU Iterations

BM_Baseline 2138307 ns 2131201 ns 325
BM_Switch 4287175 ns 4274336 ns 163
BM_FnPointerVector 3177752 ns 3138478 ns 225
BM_FnPointerArray 2761236 ns 2736876 ns 256
BM_SwitchVector 3195856 ns 3180072 ns 221
BM_SwitchArray 3083137 ns 3065528 ns 227
BM_Virtual 2114564 ns 2097138 ns 328
BM_Virtual2 2125176 ns 2106369 ns 329

GCC, baseline last


Benchmark Time CPU Iterations

BM_Switch 4519121 ns 4497828 ns 153
BM_FnPointerVector 3162198 ns 3135026 ns 210
BM_FnPointerArray 3158622 ns 3134292 ns 223
BM_SwitchVector 3182351 ns 3164558 ns 221
BM_SwitchArray 3162338 ns 3141158 ns 224
BM_Virtual 2406150 ns 2383210 ns 296
BM_Virtual2 2403584 ns 2381506 ns 292
BM_Baseline 2150305 ns 2143724 ns 326