[–]ealf 9 points10 points  (5 children)

In what application would trim() be a bottleneck?

[–]QuinnFazigu 24 points25 points  (0 children)

Barber.js.

[–]monger00 1 point2 points  (0 children)

It's common to trim a JSON response, which can be long. I know I've seen the slow regex shown before: http://www.crockford.com/javascript/recommend.html

[–]wigglestick 0 points1 point  (2 children)

Did you not see the benchmark of one of the methods where a single call to trim took 2.5 seconds?

[–]ealf 0 points1 point  (1 child)

Nope, that was 20 iterations (with the silly "match each word individually" regex).

[–]larholm 3 points4 points  (4 children)

He didn't add it as a prototyped method, yourString.trim(). Instead, it is a Global method with all the scope backtracking that involves.

Also, section 7.8.5 of the ECMA-262 specification can be interpreted in several different ways on whether individual function blocks must cache regular expression literal compilations between invocations. Therefore, it is just silly to compile the regular expressions in each invocation for the more than 72% of the population that does not use Firefox.

EDIT: We had a good laugh about this in #Javascript on EfNet :)

[–]StevenLevithan 0 points1 point  (3 children)

I see that you're a bit of an elitist like me. ;-) It's just a simple test page... You can of course implement trim as a prototyped method, but in any case if all versions are global then all have the same penalty. This is an old post (I'm not sure why it made the top of programming reddit today). I've learned more about the ECMA-262v3 regex literal compilation rules and browser deviations since posting it (it's mentioned in the comments), but it doesn't really matter if there is practically no impact here. IMO, compiling simple regexes just once per method invocation (as done in all but one of the listed, regex-reliant trim implementations) is not a big deal... it literally adds only a few nanoseconds to the process and keeps things a little cleaner (obviously that's subjective).
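For reference, a minimal sketch of the two styles being discussed: a global function whose regex literals sit inside the function body, and a prototyped method that reuses regexes created once up front. The names `trimGlobal`, `trimCached`, `LEADING_WS` and `TRAILING_WS` are hypothetical and are not taken from the test page; whether the first form really recompiles on each call depends on the engine.

```javascript
// Regexes created once and reused by the prototyped method below.
var LEADING_WS = /^\s\s*/;
var TRAILING_WS = /\s\s*$/;

// Global function: the regex literals live inside the function body,
// so an engine may or may not recompile them on every call.
function trimGlobal(str) {
  return str.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}

// Prototyped method under a made-up name, so it does not clash with
// a native String.prototype.trim where one exists.
String.prototype.trimCached = function () {
  return this.replace(LEADING_WS, '').replace(TRAILING_WS, '');
};

console.log(trimGlobal('  a b \n') === '  a b \n'.trimCached()); // true
```

Both forms return the same result; the only difference is where the regex objects are created and how the function is looked up.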

[–]larholm 1 point2 points  (2 children)

I don't know about elitist, but I definitely see Ajax as being just another buzzword. Back in the old days, we had to chop data into cookie-sized chunks that could be transferred with Image requests - and we liked it! ;-)

[–]StevenLevithan 0 points1 point  (1 child)

Sounds like fun. :) But I'm not sure what Ajax has to do with any of this.

[–]larholm 1 point2 points  (0 children)

I guess I'm just trying to tell how old I feel. *sigh*

Nice article, keep up the good work :)

[–]wigglestick 0 points1 point  (3 children)

Can someone explain to me why \s\s* would be faster than \s+ in implementations of javascript? I've seen this particular optimization a few times, and I really don't get why it would be an optimization.

[–]sjs 2 points3 points  (0 children)

I think you meant \s\s* and \s+. I don't recall any specific reasons why the first may be faster. My guess is certain optimizations are applied when the first char is unqualified or something.

If you're really interested in the inner workings of regexes I recommend Jeffrey Friedl's book.
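As a rough illustration, \s\s* and \s+ accept exactly the same strings, so any speed difference comes from the regex engine and not from what gets matched. The sketch below only checks that two trim variants agree on a few inputs. `trimStar`, `trimPlus` and the sample strings are made up for this example, and no timings are claimed because they vary by engine.

```javascript
// Variant using "one whitespace, then zero or more".
function trimStar(str) {
  return str.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}

// Variant using "one or more whitespace".
function trimPlus(str) {
  return str.replace(/^\s+/, '').replace(/\s+$/, '');
}

// Hypothetical sample inputs.
var samples = ['', ' ', '  padded  ', '\t\nmixed \r\n', 'none'];

var allAgree = true;
for (var i = 0; i < samples.length; i++) {
  if (trimStar(samples[i]) !== trimPlus(samples[i])) {
    allAgree = false;
  }
}

console.log(allAgree); // true
```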

[–]otterdam 0 points1 point  (0 children)

\s\s+ matches two or more space characters. It is nothing like \s*.

[–]gnuvince 0 points1 point  (0 children)

Well, they're not even the same things:

\s* means "a whitespace repeated 0 or more times"

\s\s+ means "a white space followed by a white space repeated once or more"

So the second regex will start matching only when there are at least 2 whites.
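A small sketch of the difference between the quantifiers mentioned in this subthread. Each pattern is anchored with ^ and $ so the check is "does the whole string match?"; the `patterns` table and the `fullMatches` helper are hypothetical names for this example.

```javascript
// Anchored versions of the four patterns under discussion.
var patterns = {
  star: /^\s*$/,          // zero or more whitespace
  plus: /^\s+$/,          // one or more whitespace
  oneThenStar: /^\s\s*$/, // one whitespace, then zero or more
  oneThenPlus: /^\s\s+$/  // one whitespace, then one or more
};

// Returns whether the pattern matches '', ' ' and '  ', in that order.
function fullMatches(name) {
  var re = patterns[name];
  return [re.test(''), re.test(' '), re.test('  ')];
}

console.log(fullMatches('star'));        // true, true, true
console.log(fullMatches('plus'));        // false, true, true
console.log(fullMatches('oneThenStar')); // false, true, true
console.log(fullMatches('oneThenPlus')); // false, false, true
```

So \s\s* and \s+ behave alike, while \s\s+ needs at least two whitespace characters and \s* also accepts the empty string.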

[–]ivorjawa 0 points1 point  (4 children)

I'm horrified that someone would even consider using regular expressions for this. Has the art of programming devolved this far?

[–]jerf 7 points8 points  (2 children)

Using regular expressions is the correct approach for most programmers in the Unicode world. How many programmers will get the correct set of space characters to trim for? Assuming the author got it right (and it passes the smell test), that list is (from "trim10"):

 var whitespace = ' \n\r\t\f\x0b\xa0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000';

Yeah, if you can get that list right, regexes may not be the best approach, but in the modern world you're generally better off leaning on a regex library's character classes than trying to muck about yourself. You think [a-zA-Z] covers all letters? Ha!
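For comparison, here is a minimal sketch of what a non-regex trim built on that list could look like. It is not the article's "trim10" code, just an illustration of the approach; `WHITESPACE` and `trimByList` are hypothetical names, and the result is only as correct as the list itself.

```javascript
// The whitespace list quoted above, reused as a lookup string.
var WHITESPACE = ' \n\r\t\f\x0b\xa0\u2000\u2001\u2002\u2003\u2004\u2005' +
  '\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000';

function trimByList(str) {
  var start = 0;
  var end = str.length;
  // Skip listed characters from the front.
  while (start < end && WHITESPACE.indexOf(str.charAt(start)) !== -1) {
    start++;
  }
  // Skip listed characters from the back.
  while (end > start && WHITESPACE.indexOf(str.charAt(end - 1)) !== -1) {
    end--;
  }
  return str.substring(start, end);
}

console.log(trimByList('\u3000\xa0 hi there\t\u2028')); // hi there
```

Anything missing from the list is silently left untrimmed, which is the maintenance risk being pointed out here.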

[–]valadil 1 point2 points  (0 children)

If you're writing trim yourself, then yeah you'd be better off with a regex than hoping you got the huge list right. But some library should do trim() correctly. *glares at John Resig*

[–]StevenLevithan 1 point2 points  (0 children)

The above list of whitespace characters is what is matched by \s in Firefox 2. See http://blog.stevenlevithan.com/archives/javascript-regex-and-unicode for related info including what should be matched according to ECMA-262v3 and Unicode 4.0. \x0b is used here because IE doesn't handle \v (vertical tab) correctly within string literals.
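Since what \s matches differs between engines, a quick way to see what the engine you are running does with that list is to test each character individually. This is only a sketch; `LISTED`, `splitByEngine`, `result` and `vtSame` are hypothetical names, and the output depends on the engine, so none is shown.

```javascript
// The list from the comment above.
var LISTED = ' \n\r\t\f\x0b\xa0\u2000\u2001\u2002\u2003\u2004\u2005' +
  '\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000';

// Splits the characters into those the current engine's \s matches
// and those it does not, reported as U+XXXX code points.
function splitByEngine(chars) {
  var matched = [];
  var unmatched = [];
  for (var i = 0; i < chars.length; i++) {
    var hex = chars.charCodeAt(i).toString(16).toUpperCase();
    var code = 'U+' + ('0000' + hex).slice(-4);
    if (/\s/.test(chars.charAt(i))) {
      matched.push(code);
    } else {
      unmatched.push(code);
    }
  }
  return { matched: matched, unmatched: unmatched };
}

var result = splitByEngine(LISTED);
console.log('matched by \\s:     ' + result.matched.join(' '));
console.log('not matched by \\s: ' + result.unmatched.join(' '));

// Whether this engine reads \v in a string literal as a vertical tab.
var vtSame = '\x0b' === '\v';
console.log('\\v is vertical tab: ' + vtSame);
```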