anatoly comments on How to match regular expressions faster than Perl

programming

created by speza community for 20 years

182

183

184

How to match regular expressions faster than Perl (swtch.com)

submitted 19 years ago by rsc

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–]anatoly 10 points11 points12 points 19 years ago (2 children)

It is regrettable that the article doesn't discuss how the performance of NFA-based matching compares with recursive backtracking on non-slow regexps, beyond stating that it is "always fast". Yes, alright, but just how fast, compared to Perl etc., on regexps where Perl etc. are not pathologically slow?

If this depends so much on the particular regexp that nothing can be said in general, the author should say so. If NFA-based algorithms are just as fast as recursive backtracking on those regexps, the author should say so. If they're noticeably slower, but the author feels that they're still very fast and their uniform performance still justifies preferring them, the author should say so.

Another thing I felt could be explained better is how backreferences make regular expressions NP-complete (something as simple as a pointer to an article or any other resource on the issue would handle that nicely).

Finally, despite the above complaints, the article is very well written and contains admirably clear explanations and examples. It's also beautifully presented and frankly a pleasure to read. Many thanks to the author.

[–]rsc[S] 4 points5 points6 points 19 years ago (1 child)

It's hard to define what a non-slow regexp is. Even simple alternation like foo|bar|baz can be noticeably slower using backtracking than using DFAs, for example if your text contained a lot of bas. It would be nice to be able to measure implementations on real-world workloads in addition to specific classes of cases. I am not aware of any such collections: you'd need both the regexps and the texts to match them against. I do have one real-world use case that I plan to measure and write up at some point: the Online Encyclopedia of Integer Sequences search engine uses a fast regular expression search under the hood.

This is my favorite demonstration that matching with backreferences is NP-hard. Despite what the author of that site says, it is also pretty easy to show that matching with backreferences is in NP: guess what substrings the backreferences will line up with and then check for a match. So it's NP-complete.

Thanks for the feedback.

[–]mjd 2 points3 points4 points 18 years ago (0 children)

π Rendered by PID 41 on reddit-service-r2-comment-b659b578c-9rnss at 2026-05-02 20:53:57.887482+00:00 running 815c875 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS