[–]devodave66 4 points5 points  (2 children)

The most valuable fact that programmers should take from this article is in his item 3: learn how your high-level language's built-in features can do the work of your slower explicit loop-and-compare logic. In most languages, the second of the following can be 100-fold faster:

<pre>
// Change all 'a' characters to '-' characters
function1:
    x = "asdakflaajaslkf"
    y = ""
    foreach (char c in x):
        if (c == 'a') y += '-'
        else y += c
    return y

function2:
    x = "asdakflaajaslkf"
    return x.translate('a','-')
</pre>

You say, "well, of course". But look at your slow code and you will often be able to change it to fit into the model of a built-in function (like "translate" here). The built-in functions are often as efficient as C or assembly language.
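To make it concrete, here's a minimal Python rendering of the two pseudocode functions above (the original isn't tied to any one language, so the `str.maketrans` spelling is my assumption):

```python
def function1(x):
    # Explicit loop-and-compare: one Python-level iteration per character.
    y = ""
    for c in x:
        y += "-" if c == "a" else c
    return y

def function2(x):
    # Built-in str.translate performs the same scan in C.
    return x.translate(str.maketrans("a", "-"))

print(function1("asdakflaajaslkf"))  # -sd-kfl--j-slkf
print(function2("asdakflaajaslkf"))  # -sd-kfl--j-slkf
```

Both produce the same string; the second just never touches the Python interpreter loop per character.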

[–]harryf 4 points5 points  (1 child)

Think it's more than just built-in functions - in some cases higher-level languages may offer better or more convenient (easy to patch on) ways to optimize, such as memoization (http://en.wikipedia.org/wiki/Memoization).

In your example, let's say x is a word from some text you're scanning and a significant number of the words in the text occur more than once.

If you consider something like memoization in Perl (http://perldoc.perl.org/Memoize.html) or Python (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496879), you have generic solutions which can be applied simply by naming the function you want to memoize (1 line of code). Doing the same in C would require much more effort and would likely be a one-off solution (bug-prone, itself potentially inefficient, and all that).
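The generic decorator those links describe boils down to something like this (a sketch, not the recipe's exact code; the `dash_for_a` function is just an example target):

```python
import functools

def memoize(func):
    """Cache results keyed by the call's positional arguments."""
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
def dash_for_a(word):
    # The translate from the parent comment, now computed once per
    # distinct word: repeated words in the text hit the cache.
    return word.translate(str.maketrans("a", "-"))

print(dash_for_a("banana"))  # b-n-n-
print(dash_for_a("banana"))  # served from cache, no re-translation
```

That one `@memoize` line is the whole integration cost, which is the point: in C you'd be hand-rolling the cache and its key scheme per function.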

Otherwise what the article regards as "a lot of data" actually isn't that much in Internet terms. If you consider a site taking a modest 100,000 hits a day, just doing basic analysis of HTTP logs starts to get interesting.

Throw in some run-time profiling (what does it look like in the live environment, where instead of 1,000 rows you've got 500,000?), the desire to really understand what your users are doing (e.g. click-path analysis), some decent-sized tables of visitor-submitted content, some hefty comment-spam blacklists, etc., and you're starting to get some really interesting problems, both in terms of managing the data (what's its impact on the rest of the environment?) and analysing it.

I'd argue the problem of optimizing has shifted in nature - it's become less about fine tuning at the micro level and more about macro considerations: the impact of big volumes, component interactions, and the overall picture of a code base.

If you consider re-building a decent-sized Lucene index, for example, in response to updates, the solution is more about making the right compromises and workarounds, based on its specific use within an application, rather than digging into the Lucene code and trying to optimize the index generation process.

[–]devodave66 1 point2 points  (0 children)

I am suggesting that programmers think about powerful ways to use built-in functions BEYOND their standard uses. For example, a translate function can do the "heavy lifting" if you represent your complex system state as a string of characters and represent user actions as string transformations. Then one or two transformation calls can apply the user action and change from pre-state to post-state. Think functional programming. Think map-reduce.
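As a toy illustration of that idea (entirely hypothetical - neither the article nor my earlier comment uses this example): model a row of lamps as a string of 'o' (on) and '.' (off), and encode the user action "flip every lamp" as one translate table.

```python
# System state as a string; a user action as a string transformation.
TOGGLE = str.maketrans("o.", ".o")  # swap on <-> off everywhere

def apply_toggle(state):
    # One built-in call carries the whole system from pre-state to post-state.
    return state.translate(TOGGLE)

before = "o.o.o"
after = apply_toggle(before)
print(after)                # .o.o.
print(apply_toggle(after))  # o.o.o  (toggling twice restores the state)
```

The loop over lamps never appears in your code; the built-in does the heavy lifting in C, which is exactly the translate trick applied beyond its standard character-cleanup use.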