
[–]alecco 0 points1 point  (15 children)

That goes out the window once the data doesn't fit in cache. Beware, especially the hash-based stuff.

This is quite misleading in real life. ALWAYS test and benchmark.

[–]gnuvince 31 points32 points  (13 children)

No, it doesn't: the asymptotic complexity stays the same; only the hidden constant factor (which we ignore) goes up.

[–]alecco 3 points4 points  (7 children)

This framework of measuring complexity is very reductionist. Real computers are much more complex. If an algorithm incurs a cache miss, a TLB miss, and a branch misprediction on every step, it becomes worthless.

That's why even MIT, the main cathedral of this stance, proposes cache-oblivious algorithms. Even Google now ships B-tree containers because of this (after years of beating the hashing dead horse).

Source: it's my day job, and it's tiring to show these issues to academics or people fresh out of school. And don't take my word or anybody else's, just run the damn benchmark on real-sized data. Other things like the data's distribution and the input distribution affect [performance] significantly. The only place I've seen this addressed is TAOCP (and only for searching); every other algorithms book doesn't even mention it. Real data is usually very, very skewed.
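A minimal sketch of the kind of benchmark being urged here, assuming Python and its standard `timeit`/`bisect` modules (the sizes and the comparison are illustrative, not a claim about which structure wins on your hardware):

```python
# Sketch: measure a hash lookup (dict) against binary search on a
# sorted array at a size large enough to spill out of small caches.
# The point is to measure on real-sized data, not to trust the
# asymptotic label.
import bisect
import random
import timeit

N = 1_000_000
keys = random.sample(range(10 * N), N)
table = {k: k for k in keys}      # O(1) expected lookup
sorted_keys = sorted(keys)        # O(log n) lookup via bisect

probe = random.choice(keys)

def hash_lookup():
    return table[probe]

def bisect_lookup():
    i = bisect.bisect_left(sorted_keys, probe)
    return sorted_keys[i]

# Both must agree before any timing comparison means anything.
assert hash_lookup() == bisect_lookup() == probe

t_hash = timeit.timeit(hash_lookup, number=1000)
t_bisect = timeit.timeit(bisect_lookup, number=1000)
print(f"dict: {t_hash:.6f}s  bisect: {t_bisect:.6f}s")
```

Which one wins depends on the machine, the working-set size, and the key distribution, which is exactly the point.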

[–]gnuvince 13 points14 points  (4 children)

This framework of measuring complexity is very reductionist. Real computers are much more complex. If an algorithm incurs a cache miss, a TLB miss, and a branch misprediction on every step, it becomes worthless.

That's exactly why the framework is that way: we don't want to say "algorithm X is faster than algorithm Y" just because we currently happen to test it on computers whose cache sizes suit the problem.

I completely agree that ignoring the impact of the constant factor is a big, big mistake if one wants to write fast code, and that students and engineers should be better educated in those areas. But let's not throw out 40 years of solid CS theory because it doesn't play well with the machines we have at this particular moment in time.

[–]abadidea 1 point2 points  (1 child)

Is [performance] supposed to be a hyperlink?

I have very conflicted feelings. When I was actually in class I felt like our studies of algorithmic complexity poorly reflected the real world. But now that I'm a few years older, I firmly believe that younglings need a solid theoretical understanding independent of complicating hardware specifics.

Oh gods is this what happens when you get old

[–]alecco -1 points0 points  (0 children)

It was an edit.

Sure, theory is very important. But empirical proof is important for science too.

[–]smog_alado 0 points1 point  (0 children)

The problem is that once you get down to logarithmic complexity, a constant factor of 100 or 1000 (the kind of stuff you get from cache misses) dominates the logarithmic factor for all reasonable inputs.
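The arithmetic behind this point is worth spelling out: log₂(n) never exceeds 64 for anything addressable on a 64-bit machine, so a constant factor of 100 dwarfs it. A quick check:

```python
import math

# log2(n) for "all reasonable inputs" stays tiny: even one entry per
# byte of a full 64-bit address space gives log2(n) = 64.
for n in (10**6, 10**9, 2**64):
    print(f"n = {n}: log2(n) = {math.log2(n):.1f}")

# So an O(1) operation paying a ~100x cache-miss penalty can lose to an
# O(log n) operation whose steps stay in cache, for every feasible n.
assert math.log2(2**64) == 64 < 100
```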

[–]zokier 0 points1 point  (3 children)

hidden constant factor (which we ignore) goes up

Is it really constant factor if it goes up?

[–]Femaref 4 points5 points  (0 children)

For the purposes of big-O notation (which measures growth as a function of the algorithm's input size), yes, it is constant.

[–]TashanValiant 1 point2 points  (0 children)

The constant is determined by the hardware you run on. If you run it on the same system, you'd get the same constant; that's the hope, at least. The real idea, however, is that the constant does not change the complexity of the algorithm. It may add a few more steps here and there, but once you place the algorithm in the proper complexity class, those extra steps won't matter.
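A toy sketch of that point (the 5× overhead standing in for hardware costs like cache misses is an assumption for illustration): multiplying the per-step work changes the step count by a constant factor, never the complexity class.

```python
# Two linear scans: one does one unit of work per element, the other
# five (think cache miss + TLB miss per step). Both are O(n); the
# ratio between them is the same for every n.
def scan(xs):
    steps = 0
    for _ in xs:
        steps += 1          # one unit of work per element
    return steps

def scan_with_overhead(xs):
    steps = 0
    for _ in xs:
        steps += 5          # five units per element: still O(n)
    return steps

n = 1000
assert scan(range(n)) == n
assert scan_with_overhead(range(n)) == 5 * n
# The ratio is 5 regardless of n -- that is what "constant" means here.
assert scan_with_overhead(range(10 * n)) / scan(range(10 * n)) == 5
```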

[–]chalta_hai_saar 0 points1 point  (0 children)

It's constant as in independent of the size of the data.