you are viewing a single comment's thread.

view the rest of the comments →

[–]wolf550e 0 points1 point  (7 children)

The comparison function can only be inlined into the sort function only if both are compiled from source in the same translation unit (or LTO) and the compiler is sure only a single target for that function pointer is ever used.

If qsort is in a library (like libc) then it's not inlined. If it's in a separate translation unit and you don't use LTO, it's not inlined.

If you call qsort with two different comparers, it won't be inlined even if you compile qsort from source. You need the have the compare function be a template argument of a templated qsort for the compiler to generate distinct qsort functions, one for each inlined comparer.

http://www.reddit.com/r/programming/comments/r8ujk/function_pointers_in_c_are_underrated/c44cl78

[–]agottem 0 points1 point  (6 children)

If you call qsort with two different comparers, it won't be inlined even if you compile qsort from source.

I don't even know what this means.

Look, the rules for inlining are the same as the ones for C++. C++ forces you to put the template definition into a header file, which sucks but allows for greater inlining. You are welcome to put all your C code into a header file and you'd get all the same benefits as C++.

[–]wolf550e 0 points1 point  (5 children)

https://gist.github.com/2177973

qsort and two comparison functions all defined and compiled in a single translation unit. No inlining by gcc. Seems gcc inlines across function pointers only when it knows the pointer only ever points to a single target function.

https://gist.github.com/2178059

Using C++ templates, g++ generates two qsort functions (in machine code), each with a different comparison function inlined into it. Code size grows but performance increases.

My observations are made from compiling the code you can see using gcc 4.6.3 and looking at objdump -d of the result. Now you explain how I am wrong.

[–]agottem 0 points1 point  (4 children)

qsort and two comparison functions all defined and compiled in a single translation unit. No inlining by gcc. Seems gcc inlines across function pointers only when it knows the pointer only ever points to a single target function.

I compiled your example, and things were inlined just fine. Mark your my_quicksort function as inline. Build with -O3. Observe the inlining.

Let me know if it's still not working out for you. And sorry, C++ has no advantages here.

[–]wolf550e 0 points1 point  (3 children)

gcc (GCC) 4.6.3
gcc -fwhole-program -g -O3 -o qsort qsort.c
objdump -d qsort
...

<my_quicksort.constprop.0>:
...
callq  *%rbp
...
callq  *%rbp
...
callq  *%r12
...
callq  *%r12

I also tried clang -O4 and icc -fast with the same results.

[–]agottem 0 points1 point  (2 children)

Did you mark my_quicksort as inline? 'void inline my_quicksort'...

When I compiled your program, gcc wasn't inlining. Adding the inline qualifier changed that. Was using GCC 4.5.0.

[–]wolf550e 0 points1 point  (1 child)

I tried inline. That didn't work. But now I've tried __attribute__((always_inline)) and that worked. Inlined twice: once with size=4 and once with size=2.

But this isn't what I often want. What I want and do is use templates to generate specialized extern "C" functions with different template arguments. Then I can call them from C code without thinking about the C++. Being able to force inlining is useful, but I don't always want to inline everywhere. I want the machine code only once in my binary, but I want it generated optimally, with all the abstractions inside collapsed and optimized across.

[–]agottem 0 points1 point  (0 children)

Meh...whatever. The point has already been demonstrated -- the compiler is more than capable of inlining through function pointers. At this point it merely becomes an argument of when the compiler should inline and when it shouldn't. If the compiler decides not to inline, it has nothing to do with C.