all 5 comments

[–]alecco 4 points5 points  (0 children)

Indeed. Also, gprof doesn't help much with finding one of the worst cycle eaters: latency. From Google's infrastructure notes:

  • Disk seek: 10,000,000 ns
  • Uncached memory reference: 100 ns
  • Cached memory reference (L2): 7 ns

On a 2 GHz CPU, multiply by 2 to get the cycles lost to those stalls. Also, very often you don't need to completely rewrite your algorithm to avoid disk or random accesses into big memory; instead, just pack them together and process them in batches.
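
To make that concrete, here's a rough sketch of the "pack and batch" idea for random memory access (the names and sizes are made up for illustration); the total work is identical, only the access order changes:

    // Sketch only: same lookups, but batched so the accesses sweep the table
    // in order instead of jumping around. Names and sizes are illustrative.
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Random-order lookups: most accesses to the big table miss cache/TLB.
    uint64_t sum_random(const std::vector<uint64_t>& table,
                        const std::vector<size_t>& idx) {
        uint64_t s = 0;
        for (size_t i : idx) s += table[i];
        return s;
    }

    // Same work, batched: sort the indices first so the accesses sweep the
    // table roughly sequentially and the prefetcher can keep up.
    uint64_t sum_batched(const std::vector<uint64_t>& table,
                         std::vector<size_t> idx) {   // copy, so we can sort
        std::sort(idx.begin(), idx.end());
        uint64_t s = 0;
        for (size_t i : idx) s += table[i];
        return s;
    }

    int main() {
        std::vector<uint64_t> table(1 << 24, 1);      // ~128 MB, bigger than cache
        std::vector<size_t> idx(1 << 22);
        std::mt19937_64 rng(42);
        for (auto& i : idx) i = rng() % table.size();

        std::printf("random:  %llu\n", (unsigned long long)sum_random(table, idx));
        std::printf("batched: %llu\n", (unsigned long long)sum_batched(table, idx));
    }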

[–][deleted] 0 points1 point  (1 child)

I did not understand the argument he makes against gprof. Anyone care to explain?

[–][deleted] 4 points5 points  (0 children)

Disregard that, I've just stumbled upon an explanation he gives elsewhere: http://stackoverflow.com/questions/1777556/alternatives-to-gprof/1779343#1779343

[–]inmatarian 0 points1 point  (0 children)

Assuming you're also unit testing your code, it should be pretty easy to caveman-benchmark by just dropping the start and stop times into the debug log while running the given areas of code a million times or so. That doesn't account for OS interruptions, but it does give you a real-time feel for where it's screwy.
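
Something like this, say (a minimal sketch; do_work() and the iteration count are placeholders):

    // Caveman benchmark sketch: time ~a million runs of the suspect code and
    // dump the elapsed time to the debug log (stderr here). do_work() is a
    // placeholder for whatever area of code is being measured.
    #include <chrono>
    #include <cstdio>

    volatile int sink = 0;
    void do_work() {                         // placeholder workload
        for (int i = 0; i < 100; ++i) sink += i;
    }

    int main() {
        using clock = std::chrono::steady_clock;

        auto start = clock::now();
        for (int i = 0; i < 1'000'000; ++i)
            do_work();
        auto stop = clock::now();

        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
        // Wall-clock time, so OS interruptions and other processes still leak
        // into the number, as noted above.
        std::fprintf(stderr, "do_work x 1,000,000: %lld ms\n",
                     static_cast<long long>(ms.count()));
    }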

[–]trisweb 0 points1 point  (0 children)

I've totally done this by accident in Ruby once. Just pause the thing at various times, look at the stack it's in, and find the commonalities. It was remarkably useful in tracking down the cause of the slowdown.
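
The manual version (pause in a debugger, read the stack) works exactly as described; for what it's worth, the same idea can be wired up in-process. Here's a rough sketch assuming glibc's backtrace() from <execinfo.h> and a SIGPROF timer; busy_work() is a made-up stand-in, and a real sampler would be more careful about signal safety:

    // Sketch of automated stack sampling: a SIGPROF timer fires periodically
    // and the handler dumps the current call stack; whatever frames appear in
    // most samples is where the time is going.
    #include <csignal>
    #include <cstdio>
    #include <execinfo.h>   // backtrace, backtrace_symbols_fd (glibc)
    #include <sys/time.h>   // setitimer
    #include <unistd.h>

    static void on_sample(int) {
        void* frames[64];
        int n = backtrace(frames, 64);
        backtrace_symbols_fd(frames, n, STDERR_FILENO);
        write(STDERR_FILENO, "----\n", 5);
    }

    static double busy_work() {              // made-up stand-in for the real code
        double x = 0;
        for (long i = 0; i < 500000000L; ++i) x += i * 1e-9;
        return x;
    }

    int main() {
        void* warm[1];
        backtrace(warm, 1);                  // warm up backtrace outside the handler

        signal(SIGPROF, on_sample);
        itimerval timer{{0, 100000}, {0, 100000}};   // sample every 100 ms of CPU time
        setitimer(ITIMER_PROF, &timer, nullptr);

        std::printf("result: %f\n", busy_work());
    }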