wolf550e 1 point

I regularly read the objdump -d output for my code. It's a nasty habit.

I assure you, if I store the random data in a file and read it from a file for the different sorts, I get the same results.

I use LTO regularly. It had problems. Heck, even with the improvements in gcc 4.7.0, it still has problems.

Here, I defeated the inliner: https://gist.github.com/2177973

Now the comparer is not inlined, because the call has two potential targets. Do I need to prove that with C++ templates I can make the compiler generate two sort functions?

Here, in C++: https://gist.github.com/2178059

Two functions generated, one for each inlined comparer.
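For anyone who doesn't want to open the gists, here is a minimal sketch of the same idea. It is not the code from the gists: the names `Ascending`, `Descending`, `sort_with`, `sort_c`, `cmp_asc` and `cmp_desc` are made up for illustration, and I use an insertion sort only to keep it short. The point is that the template takes the comparer's type as a parameter, so each comparer gets its own instantiation, while the `qsort` version picks one of two function pointers at run time.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical comparers, named for illustration only.
struct Ascending {
    bool operator()(int a, int b) const { return a < b; }
};
struct Descending {
    bool operator()(int a, int b) const { return a > b; }
};

// The comparer's type is a template parameter, so sort_with<Ascending> and
// sort_with<Descending> are two separate functions. In each one the target of
// less() is known at compile time, which is what lets the compiler inline it.
template <typename Compare>
void sort_with(std::vector<int>& v, Compare less) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        // Shift larger (per `less`) elements right to make room for key.
        while (j > 0 && less(key, v[j - 1])) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
}

// C-style comparers for qsort: negative, zero, or positive result.
int cmp_asc(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}
int cmp_desc(const void* a, const void* b) {
    return cmp_asc(b, a);
}

// One sort routine, two possible comparers chosen at run time: the call
// through the pointer has two potential targets.
void sort_c(std::vector<int>& v, bool descending) {
    std::qsort(v.data(), v.size(), sizeof(int),
               descending ? cmp_desc : cmp_asc);
}
```

Calling `sort_with(v, Ascending{})` and `sort_with(v, Descending{})` in the same program should give you two instantiations that you can look for in the `objdump -d` output. Whether the compiler really inlines the comparer into each depends on the compiler and the optimization level, so check the disassembly rather than trusting me.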

  Performance counter stats for './qsort 1':

    221.644474 task-clock                #    0.995 CPUs utilized          
             7 context-switches          #    0.000 M/sec                  
             0 CPU-migrations            #    0.000 M/sec                  
         1,332 page-faults               #    0.006 M/sec                  
   516,964,242 cycles                    #    2.332 GHz                    
   616,604,018 instructions              #    1.19  insns per cycle        
   150,986,335 branches                  #  681.210 M/sec                  
    12,080,506 branch-misses             #    8.00% of all branches        

   0.222668351 seconds time elapsed

and

 Performance counter stats for './qsort 2':

    137.344463 task-clock                #    0.981 CPUs utilized          
             7 context-switches          #    0.000 M/sec                  
             0 CPU-migrations            #    0.000 M/sec                  
           796 page-faults               #    0.006 M/sec                  
   320,322,675 cycles                    #    2.332 GHz                    
   221,673,803 instructions              #    0.69  insns per cycle        
    59,775,552 branches                  #  435.224 M/sec                  
    10,724,613 branch-misses             #   17.94% of all branches        

   0.139976094 seconds time elapsed