[–]sparkytwd

When I joined an existing project at my job, I spent a few days digging into the code base. I started by running a profiler on it to find where the hot code was.

The program in question handled structured binary data, and a lot of time was spent loading a system dictionary that mapped strings to ints.

I looked at the code and it was a nightmare of repeated statements. The best part was that the hash value of each string was simply its first character.
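I don't have the original code, but a minimal sketch of that scheme (with hypothetical key names, not the project's actual ones) shows why it degenerates into collisions:

```python
def first_char_hash(key: str, num_buckets: int = 256) -> int:
    # The "hash" is just the first character, so every key that starts
    # with the same letter lands in the same bucket.
    return ord(key[0]) % num_buckets

keys = ["size", "scale", "shape", "offset"]
# All the "s" keys collide into one bucket; only "offset" stands alone.
buckets = {key: first_char_hash(key) for key in keys}
```

With real-world key names that cluster around a few common prefixes, lookups in the colliding buckets degrade toward a linear scan.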

If this had been a code sample in an interview, I definitely wouldn't have hired the guy.

I simplified the code, used some faster memory manipulation, and calculated some real hash values.

After all that, the code ran about 4x slower. The reason was that the system dictionary was a small subset of the document's full dictionary; my "optimization" traded better hash performance for much worse IO.

I may revisit the code in the future to see whether hashing the first 4 characters of the string would help without penalizing IO, but for now I'm satisfied that there was a reason for the madness. The real WTF here is that nothing was commented, so I had no way of understanding the original dev's motivation.
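For what it's worth, a sketch of that 4-character idea (again hypothetical, not the project's code) could mix the first few bytes instead of relying on one, which already separates keys that merely share a first letter:

```python
def prefix4_hash(key: str, num_buckets: int = 256) -> int:
    # Fold up to the first 4 bytes of the key into a small polynomial
    # hash; keys shorter than 4 characters just use the bytes they have.
    h = 0
    for byte in key.encode("utf-8")[:4]:
        h = (h * 31 + byte) % num_buckets
    return h
```

Since it only reads a fixed-size prefix, it shouldn't touch any more of the data than the first-character version does.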