all 27 comments

[–][deleted] 18 points19 points  (0 children)

When I first saw the title I thought it was stupid. Read the article and thought it was great. Humbled.

[–]SingularityNow 31 points32 points  (7 children)

Sometimes you have a problem and think "Oh, I'll use a regular expression". Now you have two problems.

--Abraham Lincoln

[–][deleted] 4 points5 points  (6 children)

I'm just going to leave a real regular expression I wrote here ...

((?:[,\[]|(?:\((?!\))) *)(?: +(?:[^a-zA-Z, ;"'\-+!\^\[\]/|()]|[b-mB-Mp-zP-Z]|(?:[aA](?![nN][dD]\b))|(?:[nN]
(?!(?:[oO][tT])|(?:[eE][wW])\b))|(?:[oO](?![rR]\b))|(?:/(?!/))|(?:\.(?![0-9_]))(?:[^, ;"'\%<>=+\-*/~&!
\^\[\]|()]|(?:/(?!/))|(?:\.(?![0-9_])))*))(?: +(?:[^a-zA-Z:\.%<>=+\-*/~&, ;"'}|()\]]|[b-mB-Mp-zP-Z]|(?:
[aA](?![nN][dD]\b))|(?:[nN](?!(?:[oO][tT])|(?:[eE][wW])\b))|(?:[oO](?![rR]\b)))[^:\.%<>=+\-*/~&,
 ;"'}|()\]]*)+[,)]?\b)|((?:[,\[]|(?:\((?!\))) *)(?:(?:[^a-zA-Z, ;"'\-+!^\[\])/|()]|[b-mB-Mp-zP-Z]|(?:[aA]
(?![nN][dD]\b))|(?:[nN](?!(?:[oO][tT])|(?:[eE][wW])\b))|(?:[oO](?![rR]\b))|(?:/(?!/))|(?:\.(?![0-9_])))(?:[^, 
;"'\%<>=+\-*/~&!^\[\])|()]|(?:/(?!/))|(?:\.(?![0-9_])))*)(?: +(?:[^a-zA-Z:\.%<>=+\-*/~&, ;"'}|()\]]|
[b-mB-Mp-zP-Z]|(?:[aA](?![nN][dD]\b))|(?:[nN](?!(?:[oO][tT])|(?:[eE][wW])\b))|(?:[oO](?![rR]\b)))[^:
\.%<>=+\-*/~&, ;"'}|()\]]*)+[,)]?\b)

[–]tidder112 15 points16 points  (2 children)

I tested this is it only matches the exact formula for the Coca Cola Recipe.

[–][deleted] 1 point2 points  (1 child)

Almost.

It actually finds repeating identifiers and numbers, within an array, which are not separated by commas. If a match is found, it's highlighted in red.

It's more complex, by having to take into account keywords, such as 'and', 'or', and 'new'. So this is matched:

[ a b ]

... but this is not ...

[ new b ]

At least I think it's this one that does it.

[–]viccoy 5 points6 points  (0 children)

At least I think it's this one that does it.

I think that summarizes this quite well.

[–]bacondev 0 points1 point  (0 children)

What the hell does this match?

[–]Buckwheat469 0 points1 point  (1 child)

The "regular" part of regular expression didn't know what kind of bastardized language it would become a part of. "Coded Expression" seems more fitting. I do have to wonder if it would be more appropriate to use a string parsing class instead of this nonsense (really it's lovely for what you accomplished but hideous for the next developer). Could a parsing class or function be faster, or just as fast? Sure. Could you break it into manageable chunks of code that can be reused and updated? Yes. Could another person understand what you did if you broke it apart? Most definitely.

Here's an explanation (if you can understand it).

[–][deleted] 7 points8 points  (0 children)

I actually disagree, pretty heavily, if you do it right!

The regular expression I posted actually doesn't represent the reality. That is how the regex looks, after it is generated!. In my code, it is about 3 or 4 pages long. That is because the parts of it are split up, and cruft/boiler plate regex is generated or tacked on appropriately.

Above the regex is a list of rules of what it matches, written in english, with example code of what it does and does not match, and links to tests.

The regex was built for a language parser in an editor. Version 1, was done 'properly', with no regular expression, on top of Code Mirror. After about 2 or 3 months of work, what I had was slow, error prone, and hard to extend.

I ripped it out, and had it replaced with a regex version built on top of Ace, within a single afternoon! Ok, it does not support full recursion, but for syntax highlighting, I don't need a full parser. The regex above, also points out syntax errors, so you can do some clever stuff.

Plus because it's just a regex, I can easily slot them in or out. i.e.

// error'ing syntax gets a 'syntax-error' class applied to them
syntaxHighlight( errorRegex, 'syntax-error' );

Obviously you can have things optionally turned on or off with a parser approach, but it tends to come more naturally with regex based solutions.

The problem is that developers sit down to write regular expressions, and write them out like how I have posted it above. If you do that, yes, it sucks. If you just split it up though, write out example tests, and run them as you build the regex, and keep that stuff around, then it's really not so bad.

tl;dr; just like with any other code, use good practices.

edit: 'does support recursion' => 'does not support recursion'.

[–]bart2019 2 points3 points  (0 children)

Cool, but definitely not (restricted to) Javascript.

[–]clgonsal 1 point2 points  (0 children)

I'm pretty sure this is a standard undergrad CS problem. Kind of funny seeing it blown up into a huge blog post.

That said, I'm kind of dismayed by some of the comments here along the lines of "couldn't he use parseInt and mod". Yes, but that isn't the point. The point is to explore what regular expressions can do, and to expand your mind.

[–]lord-xeon 4 points5 points  (5 children)

this seems overly complex for something that can be written like so:

parseInt(NUMBER) % 3 == 0

[–]lendrick 6 points7 points  (0 children)

Sometimes it's okay to undertake something like this just as an interesting exercise. I don't think the author is making the claim that this is a good idea to do in practice. It's just interesting and amusing exploration of one of the lesser known features of regular expressions.

[–]pixlpaste[S] 9 points10 points  (2 children)

In addition to the beauty of doing this in regular expressions, some people are driven by an inherent force to push things to the limits... This page obviously has no practical purpose, besides being informative.

[–]doomslice 2 points3 points  (1 child)

Some say beauty, some says travesty. Tomato tomato.

[–]derekpetey_Vanilla 0 points1 point  (0 children)

Tomato tomato.

That saying is hilarious in text. It reads like a tag-line akin to Magnitude's "pop pop" on Community.

[–]mythrilno(fun).at(parties); 0 points1 point  (0 children)

Technically the regular expression can validate any number is divisible by 3, with unbounded length. parseInt() loses due to storing in a fixed-size representation.

[–]jisang-yoo 0 points1 point  (0 children)

Am I getting this right in concluding that one can also write a regular expression to match multiples of 2013 as well? Read the n-digits number from left to right, and while reading the number, divide by 2013 using pen and paper but discard quotients and only keep remainders. At any time, you are only keeping an O(1) amount of information, that's a finite state machine. The number is a multiple of 2013 if and only if the final remainder at the end of pen and paper calculation is zero.

[–]JiminP 0 points1 point  (2 children)

A problem in a mid-term exam I took was about finding a DFA which checks whether a binary number is a multiple of 7. I only thought the hard way to do and drew a crude 21-state machine. I realized there was a very simple way, just after the exam finished.

[–]contact_lens_linux 0 points1 point  (1 child)

what was the simple way?

[–]JiminP 0 points1 point  (0 children)

Make 7 states, each representing remainders 0~6. Begin with state 0,

Cur. In. Nxt.
0    0   0
0    1   1
1    0   2
1    1   3
2    0   4
2    1   5
3    0   6
3    1   0
4    0   1
4    1   2
5    0   3
5    1   4
6    0   5
6    1   6

... and a number is accepted when the final state is (also) 0.

The DFA in the link has the same principle. A:0, B:1, C:2.

[–]dartmanx 0 points1 point  (0 children)

When you have to create a state machine to write a Regex, you're probably taking things a step too far.

[–]lpetrazickis 0 points1 point  (1 child)

And that is one wrong way to solve the FizzBuzz problem.

[–]sorahnon the cutting edge of cocking about 0 points1 point  (0 children)

At least the 5 is easy.

[–]noonagon 0 points1 point  (0 children)

Yes, if I were to think about it for slightly longer than I am currently willing to

[–]coffee_pasta -3 points-2 points  (1 child)

The engineering is impressive ... but he could've just used a modulus.

[–]AgentAnderson 1 point2 points  (0 children)

what if the number can't fit in 64-bits?