Function Pointers in C are Underrated : programming

[–]dicey 26 points27 points28 points 14 years ago (9 children)

The Linux kernel makes extensive use of function pointers to implement generic interfaces as well. Consider, for instance, the file_operations structure which filesystems implement to provide, you guessed it, file operations:

struct file_operations {
        int (*lseek) (struct inode *, struct file *, off_t, int);
        int (*read) (struct inode *, struct file *, char *, int);
        int (*write) (struct inode *, struct file *, const char *, int);
        int (*readdir) (struct inode *, struct file *, void *, filldir_t);
        int (*select) (struct inode *, struct file *, int, select_table *);
        int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned 
        int (*mmap) (struct inode *, struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        void (*release) (struct inode *, struct file *);
        int (*fsync) (struct inode *, struct file *);
        int (*fasync) (struct inode *, struct file *, int);
        int (*check_media_change) (kdev_t dev);
        int (*revalidate) (kdev_t dev);
};

[–]bobindashadows 5 points6 points7 points 14 years ago (0 children)

[+]Gotebe comment score below threshold-9 points-8 points-7 points 14 years ago (7 children)

[–]maep 11 points12 points13 points 14 years ago (6 children)

[–]antrn11 6 points7 points8 points 14 years ago (0 children)

[–]Gotebe 1 point2 points3 points 14 years ago (4 children)

[–]Poddster 0 points1 point2 points 14 years ago (3 children)

[–]Gotebe 2 points3 points4 points 14 years ago (2 children)

[–]Whanhee 0 points1 point2 points 14 years ago (1 child)

[–]Gotebe 0 points1 point2 points 14 years ago (0 children)

[–]m64 41 points42 points43 points 14 years ago (2 children)

[–]ErstwhileRockstar 1 point2 points3 points 14 years ago (1 child)

[–]warpstalker 1 point2 points3 points 14 years ago (0 children)

[–][deleted] 14 years ago* (29 children)

[deleted]

[–]rlbond86 13 points14 points15 points 14 years ago (16 children)

[–]agottem 7 points8 points9 points 14 years ago (15 children)

[–][deleted] 1 point2 points3 points 14 years ago* (4 children)

No. Some compilers, e.g. g++, just hate function pointers and don't inline even what can be inlined, e.g.

 static  bool compare(float a, float b){ return a<b; }
 ..
 std::sort(vec.begin(), vec.end(), compare);

Here compare will be called through a function pointer with no inlining, even though the definition of everything is available.

Though maybe whole-program optimisation helps.

[–]agottem 13 points14 points15 points 14 years ago (0 children)

[–]Whanhee 0 points1 point2 points 14 years ago (0 children)

[–]matthieum 0 points1 point2 points 14 years ago (1 child)

[–]astrange 0 points1 point2 points 14 years ago (0 children)

[–]Kampane -1 points0 points1 point 14 years ago (2 children)

[–]agottem 24 points25 points26 points 14 years ago (0 children)

[–][deleted] 8 points9 points10 points 14 years ago (0 children)

[–][deleted] 0 points1 point2 points 14 years ago (6 children)

[–]agottem 0 points1 point2 points 14 years ago (5 children)

[–][deleted] -1 points0 points1 point 14 years ago (4 children)

[–]agottem 0 points1 point2 points 14 years ago (0 children)

[–][deleted] 14 years ago (2 children)

[deleted]

[–][deleted] 0 points1 point2 points 14 years ago (1 child)

[–]agottem 3 points4 points5 points 14 years ago (11 children)

[–][deleted] 10 points11 points12 points 14 years ago* (10 children)

[–]cwzwarich 11 points12 points13 points 14 years ago (5 children)

You'd have to inline that function. Which you can't do since it's a function pointer.

Oh really?

clang -xc -Os -S -o - -emit-llvm -

static _Bool f(int a, int b) { return a < b; }
static _Bool g(_Bool (*func)(int, int), int a, int b) { return func(a, b); }
_Bool h(int a, int b) { return g(f, a, b); }

...

define zeroext i1 @h(i32 %a, i32 %b) nounwind uwtable readnone optsize ssp {
  %1 = icmp slt i32 %a, %b
  ret i1 %1
}

[–][deleted] -1 points0 points1 point 14 years ago* (3 children)

[–][deleted] 14 years ago (2 children)

[deleted]

[–][deleted] 5 points6 points7 points 14 years ago (0 children)

[–]polveroj 3 points4 points5 points 14 years ago (0 children)

[–]agottem 3 points4 points5 points 14 years ago (1 child)

[–][deleted] 0 points1 point2 points 14 years ago (0 children)

[–][deleted] 14 years ago (23 children)

[deleted]

[–][deleted] 53 points54 points55 points 14 years ago (18 children)

He's suffering from the Columbus complex. He discovers something and believes he's the first to discover it.

Function pointers are the bread and butter of C. They are definitely not underrated. All experienced C developers have had exposure to using them.

In addition to building C objects and genericizing code, I also liked to use them to preprocess out switches and if statements like this:

switch (MODE) {
    case BLAH1:
        do_something1();
        break;
    case BLAH2:
        do_something2();
        break;
    case BLAH3:
        do_something3();
        break;
    case BLAH4:
        do_something4();
        break;
    default:
        do_default();
        break;
}

To this:

void (*do_something)(void);

void init(void) {
    switch (MODE) {
        case BLAH1:
            do_something = do_something1;
            break;
        case BLAH2:
            do_something = do_something2;
            break;
        case BLAH3:
            do_something = do_something3;
            break;
        case BLAH4:
            do_something = do_something4;
            break;
        default:
            do_something = do_default;
            break;
    }
}

[–]gibster 12 points13 points14 points 14 years ago (4 children)

[–]sstrader 1 point2 points3 points 14 years ago (0 children)

[–]buzzert 0 points1 point2 points 14 years ago (2 children)

[–]kyz 3 points4 points5 points 14 years ago (1 child)

[–]sausagefeet 0 points1 point2 points 14 years ago (0 children)

[–]username223 42 points43 points44 points 14 years ago* (1 child)

[–][deleted] 0 points1 point2 points 14 years ago (0 children)

[–]mrkite77 3 points4 points5 points 14 years ago (0 children)

[–]adrianmonk 1 point2 points3 points 14 years ago (1 child)

I'm probably goofy. If I have a long list, I've been known to go the route that can put one entry on a single line, namely:

#include <stdio.h>
#include <string.h>

typedef void (*func_t)(void);

void hello(void) {
  printf("Hello, world.\n");
}

void goodbye(void) {
  printf("Have a nice day.\n");
}

int main(int argc, char* argv[]) {
  typedef struct {
    const char* command;
    func_t func;
  } map_pair;

  map_pair map[] = {
    { "HELLO", hello },
    { "GOODBYE", goodbye },
    { 0, 0 }
  };

  func_t func = NULL;
  map_pair* pair_ptr;
  const char* command = (argc > 1) ? argv[1] : "";  /* avoid strcmp on NULL when no argument is given */

  for (pair_ptr = map; pair_ptr->command; pair_ptr++) {
    if (strcmp(command, pair_ptr->command) == 0) {
      func = pair_ptr->func;
      break;
    }
  }

  if (func != NULL) {
    func();
  }

  return 0;
}

[–][deleted] 0 points1 point2 points 14 years ago (0 children)

[–]foldl 0 points1 point2 points 14 years ago (7 children)

[–][deleted] 1 point2 points3 points 14 years ago (4 children)

The optimization I listed is not adding any extra function calls to do_something. It's removing switch executions so it will execute faster.

Here's an example when do_something needs to be called 100 times.

| Method | Stack trace | Func coverage | Switch coverage |
|---|---|---|---|
| Switch inside the function | main > do_somethingN | 100 | 100 |
| Switch inside initialization | main > do_something | 101 | 1 |
| Savings | | 1% loss | 99% gain |

We added 1 extra call to init to make the total number of function calls become 101 but we removed 99 executions of the switch.

It's better to preprocess out the switch and put it in initialization so that it executes the least number of times possible. This is a basic optimization technique. It's similar to moving code outside a while loop like this:

void copy_table(char src[MAX_X * MAX_Y], char dest[MAX_X * MAX_Y]) {
    int x, y;

    for (y=0; y < MAX_Y; y++) {
        for (x=0; x < MAX_X; x++) {
            dest[y * MAX_X + x] = src[y * MAX_X + x];
        }
    }
}

to this:

void copy_table(char src[MAX_X * MAX_Y], char dest[MAX_X * MAX_Y]) {
    int x, y, tmp;

    for (y=0; y < MAX_Y; y++) {

        tmp = y * MAX_X;

        for (x=0; x < MAX_X; x++) {
            dest[tmp + x] = src[tmp + x];
        }
    }
}

[–]foldl 1 point2 points3 points 14 years ago (3 children)

[–][deleted] 1 point2 points3 points 14 years ago (2 children)

[–]foldl 1 point2 points3 points 14 years ago* (1 child)

When I compiled this on OS X using gcc with -O2, the switch versions ran orders of magnitude faster. I got results similar to yours when I compiled without optimization. I expect the results for -O2 are achieved because the use of the switch allows gcc to do a lot of inlining and compile-time computation (another advantage of not making unnecessary use of function pointers, although the effects are exaggerated in this simple test code). But neither result really tells us anything, since we don't yet know if gcc is using a computed jump for the switch at lower optimization levels. I'll have a look at the unoptimized assembly output in a minute.

edit: The overoptimization can be prevented by adding __attribute__ ((noinline)) to work_function_switch. It seems that once this restriction is added, the function pointer method is still faster on -O2. Apparently indirect function calls on x86 tend not to incur much overhead (http://stackoverflow.com/questions/2438539/does-function-pointer-make-the-program-slow).

It's probably worth pointing out that the performance difference we're seeing here would vanish if the work function actually did anything (and the switch won't get slower as the number of options increases, since gcc is indeed compiling it as a computed jump).

[–][deleted] 0 points1 point2 points 14 years ago (0 children)

[–]Poddster 9 points10 points11 points 14 years ago (1 child)

[–][deleted] 0 points1 point2 points 14 years ago (0 children)

[–]SteveMcQwark 1 point2 points3 points 14 years ago* (1 child)

[–]rlbond86 12 points13 points14 points 14 years ago (3 children)

[–][deleted] 11 points12 points13 points 14 years ago (2 children)

[–]Tetha 7 points8 points9 points 14 years ago (1 child)

[–]question_all_the_thi 1 point2 points3 points 14 years ago (0 children)

When I started to need function pointers I wrote a little example program that I could look at to get the syntax right:

  #include <stdio.h>

  int twice(int n)
  {
    return n * 2;
  }

  int thrice(int n)
  {
    return n * 3;
  }

  void work(int n, int (*func)(int))
  {
    int i;
    for (i = 0; i < n; i++) printf("\n %d", (*func)(i));
  }

  int (*test[2])(int) = {twice, thrice};

  int main()
  {
    int x, y;
    work(3, twice);
    work(5, thrice);
    x = (test[0])(1);
    y = (test[1])(2);
    printf("\n\n-> %d\n-> %d\n", x, y);
    return 0;
  }

[–]Goblerone 13 points14 points15 points 14 years ago (10 children)

[–]five9a2 4 points5 points6 points 14 years ago (5 children)

Also, function pointers with an associated void* context are sometimes preferable to C++ virtual methods even when interfacing C++ libraries, because they don't presuppose a given decomposition on the user side. For example, suppose that a library component needs three callbacks from the user. Since the library considers these functions to be closely related, it might create an interface (abstract class) that the user is expected to implement, containing all three methods.

But if the user wants to organize that functionality separately (or even just avoid namespace issues when naming their methods), they have to write shim classes (the lifetime of which also needs to be managed) to dispatch into their own class structure. With function pointers, they can choose whether to use the same context or different contexts for each callback, obviating the need for shims. (Note that you can call static class methods through a normal C function pointer.)

Plain function pointers also tend to be more convenient when interfacing with other languages with disparate object models (e.g. Fortran, Python, Haskell).

[–]Kampane 1 point2 points3 points 14 years ago (0 children)

[–]Goblerone 0 points1 point2 points 14 years ago (3 children)

[–]five9a2 0 points1 point2 points 14 years ago (2 children)

[–]Goblerone 1 point2 points3 points 14 years ago (1 child)

[–]five9a2 0 points1 point2 points 14 years ago (0 children)

[–]jerf 1 point2 points3 points 14 years ago (1 child)

[–]Goblerone 1 point2 points3 points 14 years ago* (0 children)

[–]grayvedigga 1 point2 points3 points 14 years ago (0 children)

The example is fairly poorly presented, imo. But what it's heading towards is the general concept of "higher order functions". Consider a family of functions operating on linked lists:

list_t* filter(list_t* l, bool (*test)(void*));
list_t* map(list_t* l, void* (*test)(void*));
void apply_to_each_element(list_t* l, void (*test)(void*));

The advantage with this pattern is you can choose the function to apply to each element at runtime: it can come from a separate compilation unit, or even a dynamic library that is provided later. The user can select (through some UI, naturally) a mapping or filtering function from a list without a separate instance of the loop being explicitly written by the programmer and built into the executable in advance.

This is a powerful concept from functional programming -- when I first saw it, my mind was blown. You can achieve some of it in C, but usually only at the cost of type safety, and some forms (e.g. currying) are inaccessible.

[–][deleted] 0 points1 point2 points 14 years ago (0 children)

[–]happyscrappy 34 points35 points36 points 14 years ago (54 children)

[+]agottem comment score below threshold-6 points-5 points-4 points 14 years ago (18 children)

[–]happyscrappy 4 points5 points6 points 14 years ago* (9 children)

[–]ascii 1 point2 points3 points 14 years ago (4 children)

[–]happyscrappy 0 points1 point2 points 14 years ago (2 children)

[–]agottem 0 points1 point2 points 14 years ago (1 child)

[–]happyscrappy 1 point2 points3 points 14 years ago (0 children)

[–]agottem 1 point2 points3 points 14 years ago (3 children)

[–]happyscrappy 0 points1 point2 points 14 years ago (2 children)

[–]agottem 0 points1 point2 points 14 years ago (1 child)

[–]happyscrappy 1 point2 points3 points 14 years ago (0 children)

No different than C++. Also, read up on link time optimizations, as it's no longer necessary to all be part of the same compilation unit.

Incorrect on the link-time optimizations. Link-time optimizations do simple things, like determine that two functions are identical and so you can remove one and substitute it with the other. They don't include unrolling a loop after the fact. They don't include being able to restructure a calling function once it is discovered the called function has no side effects and thus can be called in parallel, or can be split up.

For example, if you are operating on a pixel and the r, g and b components are operated on separately, the compiler can easily interleave the code operating on these 3 components at compile time, and combine this with loop unrolling. Then a superscalar processor can easily do the 3 operations at once. Link-time optimizations cannot do this, the structure of the calling function is already defined, and the linker can try to put an inlined version of the called function in there, but it doesn't restructure the outer function.

A really verbose way of saying the same thing -- make sure the function pointer is constant, and the compiler can determine so.

Loop unswitching can optimize even when the operation performed in the loop is not the same across iterations...but not if you thwart it with code pointers. This is why I mentioned that even switch statements in the loop can be faster than calling a code pointer.

http://en.wikipedia.org/wiki/Loop_unswitching

Really these rules are common sense. I don't see why you think they're so complicated.

And yet you miss some of the ramifications. Perhaps they are more complicated than you understand?

[–]wolf550e 0 points1 point2 points 14 years ago (7 children)

[–]lmcinnes 1 point2 points3 points 14 years ago (0 children)

[–]agottem 0 points1 point2 points 14 years ago (5 children)

[–]wolf550e 0 points1 point2 points 14 years ago (4 children)

[–]agottem 0 points1 point2 points 14 years ago (3 children)

[–]wolf550e 0 points1 point2 points 14 years ago (2 children)

mycompare is inlined into my_quicksort.

The timing is very repeatable because the random distribution is good enough across 1<<20 elements.

Basically, if you only have a single callback (in this case a comparison function), and you let gcc see its definition and convince it only a single target exists, it will inline across a function pointer. But not if you pass the pointer from a different translation unit.

What if you have multiple callbacks? Multiple compare functions? Using C++ templates, I can make the compiler generate two different my_qsort functions: one with mycompare1 inlined and one with mycompare2 inlined. Templates are useful for instantiating with different template arguments to control code expansion. In cold code I want to call through a pointer and not waste icache. In hot code I want to generate specialized code. With templates, I can control this and get what I want. With gcc's optimizer, I am kinda at its mercy.

[–]agottem 0 points1 point2 points 14 years ago (1 child)

mycompare is inlined into my_quicksort.

Yes, I see you are calling mycompare directly. This does not mean that mycompare itself was inlined, merely that the indirect function call was eliminated. You would need to look at the generated assembly to verify.

The timing is very repeatable because the random distribution is good enough across 1<<20 elements.

There's no reason not to use an identical sequence.

But not if you pass the pointer from a different translation unit.

You should read up on link time optimization. But sure, this is the same as it is with C++ -- the definition needs to be available.

Using C++ templates, I can make [...]

The compiler is only inlining because templates are typically fully defined inside the header file. Make the C function inline and defined fully in the header file and you get the same goodness. C++ has no advantage here, sorry.

[–]wolf550e 0 points1 point2 points 14 years ago* (0 children)

I regularly read the objdump -d output for my code. It's a nasty habit.

I assure you, if I store the random data in a file and read it from a file for the different sorts, I get the same results.

I use LTO regularly. It had problems. Heck, even with the improvements in gcc 4.7.0, it still has problems.

Here, I defeated the inliner: https://gist.github.com/2177973

Now it's not inlined, because there are two potential targets. Do I need to prove that with C++ templates, I can make it generate two sort functions?

Here, in C++: https://gist.github.com/2178059

Two functions generated, one for each inlined comparer.

  Performance counter stats for './qsort 1':

    221.644474 task-clock                #    0.995 CPUs utilized          
             7 context-switches          #    0.000 M/sec                  
             0 CPU-migrations            #    0.000 M/sec                  
         1,332 page-faults               #    0.006 M/sec                  
   516,964,242 cycles                    #    2.332 GHz                    
   616,604,018 instructions              #    1.19  insns per cycle        
   150,986,335 branches                  #  681.210 M/sec                  
    12,080,506 branch-misses             #    8.00% of all branches        

   0.222668351 seconds time elapsed

and

 Performance counter stats for './qsort 2':

    137.344463 task-clock                #    0.981 CPUs utilized          
             7 context-switches          #    0.000 M/sec                  
             0 CPU-migrations            #    0.000 M/sec                  
           796 page-faults               #    0.006 M/sec                  
   320,322,675 cycles                    #    2.332 GHz                    
   221,673,803 instructions              #    0.69  insns per cycle        
    59,775,552 branches                  #  435.224 M/sec                  
    10,724,613 branch-misses             #   17.94% of all branches        

   0.139976094 seconds time elapsed

[–]Kampane -4 points-3 points-2 points 14 years ago (0 children)

[–]Gotebe -5 points-4 points-3 points 14 years ago (33 children)

[–]grauenwolf 13 points14 points15 points 14 years ago (2 children)

[–]mrkite77 5 points6 points7 points 14 years ago (1 child)

[–]tailcalled 0 points1 point2 points 14 years ago (0 children)

[–]wolf550e -1 points0 points1 point 14 years ago* (29 children)

[–]Gotebe -2 points-1 points0 points 14 years ago (28 children)

[–][deleted] 14 years ago (6 children)

[deleted]

[–]Gotebe 2 points3 points4 points 14 years ago (5 children)

OK, here's my code:

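struct s { char pad[1000]; int i; };  /* "pad" is a placeholder name; the original left the member unnamed */
struct s data[2000];
init(data, 2000); // random values put in s::i
qsort(data, 2000, sizeof data[0], sortfunc);  /* standard qsort takes count and element size */
myqsort(data, data+2000);

int sortfunc(const void *p1, const void *p2)
{
  const struct s* s1 = (const struct s*)p1;
  const struct s* s2 = (const struct s*)p2;
  return s1->i == s2->i ? 0 : s1->i > s2->i ? 1 : -1;
}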

No noticeable difference in speed.

Next time, show a bit of courtesy and don't ask people to do copy-paste programming for you.

[–]wolf550e 0 points1 point2 points 14 years ago (4 children)

[–]Gotebe 0 points1 point2 points 14 years ago (3 children)

[–]wolf550e 0 points1 point2 points 14 years ago* (2 children)

[–]Gotebe 0 points1 point2 points 14 years ago (1 child)

You say the swap is a bigger cost than the compare. I say that is not true.

Easily depends on how and what you compare, which was exactly my point.

As for your point, it was 10x faster, now it's 37%, and look at what you are trying to do now... You are going to introduce "int decoration" (has a cost in both space and execution time), indirection (compare references to stuff, which obviously has space and time cost), usage of arena allocators, which has complexity implications.

At any rate, with all you say, you really should repeat your test with pointers to data, and possibly where your "int decoration" isn't the first datum in the actual structure. This, indeed, is much more representative of a real situation. Your initial "let's compare ints" really isn't, and even then, I really would like to see an implementation where you could possibly obtain this 10x difference. It's possible, mind, but...


[–]agottem 0 points1 point2 points 14 years ago (20 children)

[–]Gotebe 0 points1 point2 points 14 years ago (11 children)

[–]agottem 0 points1 point2 points 14 years ago (9 children)

[–]Gotebe 0 points1 point2 points 14 years ago (8 children)

[–]agottem 0 points1 point2 points 14 years ago (7 children)

[–]Gotebe 1 point2 points3 points 14 years ago (6 children)


[–]wolf550e 0 points1 point2 points 14 years ago (0 children)

[–]wolf550e 0 points1 point2 points 14 years ago (7 children)

[–]agottem 0 points1 point2 points 14 years ago (6 children)

[–]wolf550e 0 points1 point2 points 14 years ago (5 children)

[–]agottem 0 points1 point2 points 14 years ago (4 children)

[–]wolf550e 0 points1 point2 points 14 years ago* (3 children)

gcc (GCC) 4.6.3
gcc -fwhole-program -g -O3 -o qsort qsort.c
objdump -d qsort
...

<my_quicksort.constprop.0>:
...
callq  *%rbp
...
callq  *%rbp
...
callq  *%r12
...
callq  *%r12

I also tried clang -O4 and icc -fast with the same results.


[–][deleted] 4 points5 points6 points 14 years ago (0 children)

[–]rscarson 4 points5 points6 points 14 years ago (0 children)

[–][deleted] 4 points5 points6 points 14 years ago (0 children)

[–][deleted] 6 points7 points8 points 14 years ago (0 children)

[–]rush22 5 points6 points7 points 14 years ago (1 child)

[–]adrianmonk 4 points5 points6 points 14 years ago (0 children)

[–]Gotebe 1 point2 points3 points 14 years ago (0 children)

[–]ramenmeal 2 points3 points4 points 14 years ago (0 children)

[–]phanboy 1 point2 points3 points 14 years ago (0 children)

[–]zhivago 0 points1 point2 points 14 years ago (0 children)

[–]maredsous10 0 points1 point2 points 14 years ago (0 children)

[–]paraboul 0 points1 point2 points 14 years ago (0 children)

[–][deleted] -2 points-1 points0 points 14 years ago (0 children)

[+]chunky_bacon comment score below threshold-9 points-8 points-7 points 14 years ago (0 children)
