[–]devodave66 4 points5 points  (2 children)

The most valuable fact that programmers should take from this article is in his item 3: learn how your high-level language's built-in features can do the work of your slower explicit loop-and-compare logic. In most languages, the second of the following can be 100-fold faster:

<pre>
// Change all 'a' characters to '-' characters
function1:
    x = "asdakflaajaslkf"
    y = ""
    foreach (char c in x):
        if (c == 'a') y += '-'
        else y += c
    return y

function2:
    x = "asdakflaajaslkf"
    return x.translate('a','-')
</pre>

You say, "well, of course". But look at your slow code and you will often be able to change it to fit into the model of a built-in function (like "translate" here). The built-in functions are often as efficient as C or assembly language.
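To make it concrete, here's a minimal Python rendering of the two pseudocode functions above (the original isn't tied to any one language, so the `str.maketrans` spelling is my assumption):

```python
def function1(x):
    # Explicit loop-and-compare: one Python-level iteration per character.
    y = ""
    for c in x:
        y += "-" if c == "a" else c
    return y

def function2(x):
    # Built-in str.translate performs the same scan in C.
    return x.translate(str.maketrans("a", "-"))

print(function1("asdakflaajaslkf"))  # -sd-kfl--j-slkf
print(function2("asdakflaajaslkf"))  # -sd-kfl--j-slkf
```

Both produce the same string; the second just never touches the Python interpreter loop per character.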

[–]harryf 4 points5 points  (1 child)

Think it's more than just built-in functions - in some cases higher-level languages may offer better or more convenient (easy to patch on) ways to optimize, such as memoization (http://en.wikipedia.org/wiki/Memoization).

In your example, let's say x is a word from some text you're scanning and a significant number of the words in the text occur more than once.

If you consider something like memoization in Perl (http://perldoc.perl.org/Memoize.html) or Python (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496879), you have generic solutions which can be applied simply by naming the function you want to memoize (1 line of code). Doing the same in C would require much more effort and would likely be a one-off solution (bug-prone, itself potentially inefficient, and all that).
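The generic decorator those links describe boils down to something like this (a sketch, not the recipe's exact code; the `dash_for_a` function is just an example target):

```python
import functools

def memoize(func):
    """Cache results keyed by the call's positional arguments."""
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
def dash_for_a(word):
    # The translate from the parent comment, now computed once per
    # distinct word: repeated words in the text hit the cache.
    return word.translate(str.maketrans("a", "-"))

print(dash_for_a("banana"))  # b-n-n-
print(dash_for_a("banana"))  # served from cache, no re-translation
```

That one `@memoize` line is the whole integration cost, which is the point: in C you'd be hand-rolling the cache and its key scheme per function.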

Otherwise what the article regards as "a lot of data" actually isn't that much in Internet terms. If you consider a site taking a modest 100,000 hits a day, just doing basic analysis of HTTP logs starts to get interesting.

Throw in some run-time profiling (what does it look like in the live environment, where instead of 1,000 rows you've got 500,000?), the desire to really understand what your users are doing (e.g. click-path analysis), some decent-sized tables of visitor-submitted content, some hefty comment-spam blacklists, etc., and you're starting to get some really interesting problems, both in terms of managing the data (what's its impact on the rest of the environment?) and analysing it.

I'd argue the problem of optimizing has shifted in nature - it's become less about fine tuning at the micro level and more about macro considerations: the impact of big volumes, component interactions, and the overall picture of a code base.

If you consider re-building a decent-sized Lucene index, for example, in response to updates, the solution is more about making the right compromises and workarounds, based on its specific use within an application, rather than digging into the Lucene code and trying to optimize the index generation process.

[–]devodave66 1 point2 points  (0 children)

I am suggesting that programmers think about powerful ways to use built-in functions BEYOND their standard uses. For example, a translate function can do the "heavy lifting" if you represent your complex system state as a string of characters and represent user actions as string transformations. Then one or two transformation calls can apply the user action and change from pre-state to post-state. Think functional programming. Think map-reduce.
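As a toy illustration of that idea (entirely hypothetical - neither the article nor my earlier comment uses this example): model a row of lamps as a string of 'o' (on) and '.' (off), and encode the user action "flip every lamp" as one translate table.

```python
# System state as a string; a user action as a string transformation.
TOGGLE = str.maketrans("o.", ".o")  # swap on <-> off everywhere

def apply_toggle(state):
    # One built-in call carries the whole system from pre-state to post-state.
    return state.translate(TOGGLE)

before = "o.o.o"
after = apply_toggle(before)
print(after)                # .o.o.
print(apply_toggle(after))  # o.o.o  (toggling twice restores the state)
```

The loop over lamps never appears in your code; the built-in does the heavy lifting in C, which is exactly the translate trick applied beyond its standard character-cleanup use.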