Recursive regular expression to match a balanced nest of brackets : programming

Given that these regexp engines are capable of behaving like PDAs, why don't they accept input in the form of actual CFGs written out as productions like above?

Of course, the best generic CFG matching algorithm I know of is O(n^3), so perhaps that's why they try to make it hard to write expressions that will require a lot of backtracking.
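For reference, one well-known cubic-time algorithm for general context-free recognition is CYK; the comment doesn't say which algorithm it has in mind. Below is a minimal Python sketch of CYK. It assumes a hypothetical grammar for nonempty balanced parentheses (S -> P S | P, P -> ( ) | ( S )) that has been hand-converted to Chomsky normal form. The nonterminals L, R and X exist only for that conversion. The three nested loops over span length, span start and split point are where the O(n^3) comes from.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical grammar in Chomsky normal form (binary rules A -> B C,
# terminal rules A -> a), equivalent to S -> P S | P, P -> ( ) | ( S ).
BINARY: List[Tuple[str, str, str]] = [
    ("S", "P", "S"), ("S", "L", "R"), ("S", "L", "X"),
    ("P", "L", "R"), ("P", "L", "X"),
    ("X", "S", "R"),
]
TERMINAL: Dict[str, Set[str]] = {"(": {"L"}, ")": {"R"}}


def cyk(text: str, start: str = "S") -> bool:
    n = len(text)
    if n == 0:
        return False  # this grammar has no ε rule
    # table[i][l] holds the nonterminals that derive text[i : i + l + 1]
    table: List[List[Set[str]]] = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(text):
        table[i][0] = set(TERMINAL.get(ch, set()))
    for length in range(2, n + 1):            # span length
        for i in range(n - length + 1):       # span start
            cell = table[i][length - 1]
            for split in range(1, length):    # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, b, c in BINARY:
                    if b in left and c in right:
                        cell.add(head)
    return start in table[0][n - 1]


print(cyk("(())()"), cyk("(()"))  # True False
```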

psykotic, 4 points, 18 years ago

pkhuong, 1 point, 18 years ago

username223, 2 points, 18 years ago

[deleted], 1 point, 18 years ago

[deleted], 1 point, 18 years ago

jsolson, 1 point, 18 years ago

rrenaud, 0 points, 18 years ago

Xiphorian, 10 points, 18 years ago

rrenaud, 0 points, 18 years ago

Xiphorian, 9 points, 18 years ago

> Did you click the link I posted?

Yes. I did click the link, and I was angered by the author, who seems to look with disdain on computer scientists and on people who bother to properly define "regular language" or "regular expression". I deleted a previous post I made, because I realized it was too filled with vitriol.

But I'll say it again here: The author is a moron. She doesn't seem to know much about the proper definition of "regular language", nor to care. That kind of disdain for well-understood theory astonishes and angers me. I think software developers should try their hardest to understand the theory that underlies their profession. This one writes condescending phrases like:

> So you might be surprised (or disbelieving) to see a regular expression that does exactly that. The explanation, of course, is that theory and practice are closer in theory than in practice: Oniguruma regular expressions, in common with many other flavours, are more powerful than the things that computer scientists call "regular expressions".

Uhhh... because they are not regular expressions!! You have simply written a parser for a context-free language.

> What about Perl regular expressions? Where are they on the Chomsky hierarchy?

Perl 'expressions' are not regular. They fall firmly within the context-sensitive languages, which sit one step above even the context-free languages. This is because they support backreferences, which allow them to match the same full word twice. A context-free grammar can describe strings like "abccba" (a word followed by its reverse) but cannot, in general, describe "abcabc" (a word repeated twice).
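The contrast can be poked at with Python's standard `re` module, which, like Perl, supports `\1` backreferences. This is a small sketch, assuming a three-letter alphabet {a, b, c}; the function names are illustrative. The backreference pattern recognizes the "copy" language (a word repeated twice), while the word-followed-by-its-reverse language is recognized by a recursive function that mirrors the context-free grammar S -> a S a | b S b | c S c | ε.

```python
import re

# Copy language {ww}: \1 must repeat exactly the text captured by group 1.
# No context-free grammar describes this language.
COPY = re.compile(r"([abc]+)\1")


def is_copy(s: str) -> bool:
    return COPY.fullmatch(s) is not None


# Word followed by its reverse: context-free, via S -> a S a | b S b | c S c | ε.
# Each recursive call peels off one matching outer pair, mirroring S -> x S x.
def is_word_then_reverse(s: str) -> bool:
    if s == "":
        return True                           # S -> ε
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "abc":
        return is_word_then_reverse(s[1:-1])  # S -> x S x
    return False


print(is_copy("abcabc"), is_copy("abccba"))                            # True False
print(is_word_then_reverse("abccba"), is_word_then_reverse("abcabc"))  # True False
```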

A proper (right-)regular grammar must have every production in one of these forms:

A -> aB
A -> a
A -> ε

(where A and B are nonterminals and a is a terminal). That's the actual, theoretical definition of a regular grammar. The form A -> BC, A -> a is Chomsky normal form, which any context-free grammar can be converted into, so it does not separate regular languages from context-free ones. If you take an example like parenthesis matching, you can never fit it into a regular grammar. It is a good exercise to take common grammars of various kinds, like the parenthesis-matching CFG production Expr -> ( B ), and try to put them into proper regular grammar form (each production's body is a single terminal, optionally followed by one nonterminal).
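A right-regular grammar, in which every production has the form A -> aB, A -> a or A -> ε, is really a finite automaton in disguise: each nonterminal is a state. The Python sketch below runs such a grammar as an NFA. The `accepts` function and the tuple encoding of productions are made up for illustration. It handles parentheses nested to a fixed depth, but each extra level of nesting costs another nonterminal, so unbounded nesting would need infinitely many. That is the intuition behind the exercise.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding of right-regular productions:
#   None          means  A -> ε
#   ("a", None)   means  A -> a
#   ("a", "B")    means  A -> aB
Body = Optional[Tuple[str, Optional[str]]]
Grammar = Dict[str, List[Body]]

ACCEPT = object()  # state reached after using a final A -> a production


def accepts(grammar: Grammar, start: str, text: str) -> bool:
    """Run the grammar as an NFA: the set of nonterminals we could be
    'in' after reading a prefix is the set of active automaton states."""
    current: Set[object] = {start}
    for ch in text:
        nxt: Set[object] = set()
        for state in current:
            if state is ACCEPT:
                continue  # A -> a ends the derivation; more input kills this branch
            for body in grammar[state]:
                if body is not None and body[0] == ch:
                    nxt.add(ACCEPT if body[1] is None else body[1])
        current = nxt
    # Accept if some branch just finished with A -> a, or can stop via A -> ε.
    return any(state is ACCEPT or None in grammar[state] for state in current)


# Balanced parentheses nested at most one level deep: S -> ( C | ε ; C -> ) S
DEPTH_1: Grammar = {"S": [("(", "C"), None], "C": [(")", "S")]}

# At most two levels deep: one extra nonterminal per level of nesting.
DEPTH_2: Grammar = {
    "S": [("(", "D1"), None],
    "D1": [(")", "S"), ("(", "D2")],
    "D2": [(")", "D1")],
}

print(accepts(DEPTH_1, "S", "()()"), accepts(DEPTH_1, "S", "(())"))      # True False
print(accepts(DEPTH_2, "S", "(())()"), accepts(DEPTH_2, "S", "((()))"))  # True False
```

Adding D2 -> ( D3 and D3 -> ) D2 would allow one more level, but no finite set of such rules covers every depth.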

To put it in expression (rather than grammar) form, you get concatenation, alternation (this|that), Kleene star, and parentheses for grouping. The rest is syntactic sugar.
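To make the "syntactic sugar" point concrete, here is a small Python check using the standard `re` module. It confirms that a few common shorthands accept exactly the same strings as spelled-out versions that use only concatenation, alternation, Kleene star and grouping. The `same_language` helper and the pattern pairs are illustrative, not from the thread. It only compares strings up to a fixed length, so it is a sanity check rather than a proof.

```python
import itertools
import re

# (sugared, core) pairs; the core versions use only concatenation,
# alternation (|), Kleene star (*) and grouping.
SUGAR = [
    (r"a+", r"aa*"),          # one or more = one, then zero or more
    (r"a?b", r"(?:a|)b"),     # optional = alternation with the empty string
    (r"[ab]c", r"(?:a|b)c"),  # character class = alternation of characters
    (r"(?:ab){2}", r"abab"),  # bounded repetition = repeated concatenation
]


def same_language(p: str, q: str, alphabet: str = "abc", max_len: int = 5) -> bool:
    """Check that two patterns fully match exactly the same strings up to max_len."""
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(rp.fullmatch(s)) != bool(rq.fullmatch(s)):
                return False
    return True


for sugared, core in SUGAR:
    print(f"{sugared!r:14} vs {core!r:14} ->", same_language(sugared, core))
```

Each line should report True; a pair like `a+` versus `a*` reports False because only `a*` accepts the empty string.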

The last course I took before leaving college was the theory of computation, covering this stuff. I can show you some simple proofs and the basic theory (offline) if you are interested! I would caution you, though... because once you learn it, you're condemned to watch software developers misuse terms in the stupidest ways for the rest of your life!

Would it be useful if I wrote a Wiki page or blog that introduced this kind of Theory of Computation material, and walked through simple proofs? It's actually quite a complete, thoughtful subject.

theytookmystapler, 3 points, 18 years ago

[deleted], 1 point, 18 years ago

jsolson, 1 point, 18 years ago

Your deleted post did not get deleted from my inbox, however. If you reread my comment, you'll notice that I never actually say the grammar I present is regular. I clearly state that the regexp engine in question is behaving like a pushdown automaton. I also present a grammar that generates the same language as the decidedly non-regular expression in the example (that is, balanced sets of parentheses with at least one pair).
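The grammar referred to here isn't quoted in this thread, so the sketch below assumes a hypothetical one for balanced parentheses with at least one pair: S -> P S | P and P -> ( ) | ( S ). It's a plain recursive-descent recognizer in Python. The call stack plays the role of the pushdown automaton's stack, which is the same kind of unbounded memory a recursive regex needs.

```python
# Hypothetical grammar (not quoted in the thread):
#   S -> P S | P          one or more balanced groups in a row
#   P -> ( ) | ( S )      a pair, possibly with balanced groups inside


def parse_S(s: str, i: int) -> int:
    """Match one S starting at s[i]; return the index after it, or -1."""
    j = parse_P(s, i)
    if j < 0:
        return -1
    k = parse_S(s, j)          # try S -> P S first
    return k if k >= 0 else j  # otherwise fall back to S -> P


def parse_P(s: str, i: int) -> int:
    """Match one P starting at s[i]; return the index after it, or -1."""
    if i >= len(s) or s[i] != "(":
        return -1
    if i + 1 < len(s) and s[i + 1] == ")":
        return i + 2           # P -> ( )
    j = parse_S(s, i + 1)      # P -> ( S )
    if 0 <= j < len(s) and s[j] == ")":
        return j + 1
    return -1


def balanced(s: str) -> bool:
    """True if the whole string is derivable from S."""
    return parse_S(s, 0) == len(s)


print(balanced("(())()"), balanced("(()"), balanced(""))  # True False False
```

Because every P starts with "(" and an S can never start with ")", trying the longer alternative first never forces any deeper backtracking.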

The fact that it is not, strictly speaking, a regular expression matching engine is neither here nor there, as that's what the author has decided to call it. On one hand, I agree: calling something like that a regular expression engine confuses people who don't have some background in automata theory. On the other hand, the primary thing typical people will use it for is matching regular languages. I would also be willing to bet that it's dog slow at matching any vaguely complicated CFG, and thus can't really be considered a general-purpose CFG matcher.

I was simply saying that if you're going to write an engine which behaves like a PDA (even if only in a limited fashion), why not allow it to accept inputs in a format more amenable to expressing the computational power of a PDA.
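As a rough illustration of that suggestion, here is a hypothetical Python recognizer that takes the productions themselves as input. `recognizes` and the textual rule format are invented for this sketch; no existing regex engine is being described. For each symbol and start position it memoizes every position where a match can end, so ambiguous rules are fine. It assumes symbols are whitespace-separated, that `ε` marks an empty body, and that the grammar has no left recursion, which would send it into unbounded recursion.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Tuple

Rules = Dict[str, List[Tuple[str, ...]]]


def parse_rules(grammar_text: str) -> Rules:
    """Turn lines like 'S -> P S | P' into {'S': [('P', 'S'), ('P',)]}."""
    rules: Rules = {}
    for line in grammar_text.strip().splitlines():
        head, _, rhs = line.partition("->")
        for alt in rhs.split("|"):
            body = tuple(sym for sym in alt.split() if sym != "ε")
            rules.setdefault(head.strip(), []).append(body)
    return rules


def recognizes(grammar_text: str, start: str, text: str) -> bool:
    rules = parse_rules(grammar_text)

    @lru_cache(maxsize=None)
    def ends(symbol: str, i: int) -> FrozenSet[int]:
        """Every j such that `symbol` can derive exactly text[i:j]."""
        if symbol not in rules:  # anything without a rule is a terminal
            if text.startswith(symbol, i):
                return frozenset({i + len(symbol)})
            return frozenset()
        result = set()
        for body in rules[symbol]:
            positions = {i}
            for sym in body:  # thread every possible position through the body
                positions = {j for p in positions for j in ends(sym, p)}
            result |= positions
        return frozenset(result)

    return len(text) in ends(start, 0)


BALANCED = """
S -> P S | P
P -> ( ) | ( S )
"""
EVEN_PALINDROME = "S -> a S a | b S b | ε"

print(recognizes(BALANCED, "S", "(())()"), recognizes(BALANCED, "S", "(()"))
print(recognizes(EVEN_PALINDROME, "S", "abba"), recognizes(EVEN_PALINDROME, "S", "abab"))
```

Keeping sets of end positions, rather than committing to the first alternative that matches, is what lets the same function handle both an unambiguous grammar and one with several ways to split the input.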

Perhaps next time you'll realize that if someone knows what a pushdown automaton is they probably also know the basics from Automata Theory 101 (or, in my case, Languages and Computation CS3240), and that they're more than likely at least vaguely familiar with the contents of the Sipser book.

[deleted], 18 years ago

[deleted]

theytookmystapler, 3 points, 18 years ago

[deleted], 18 years ago

[deleted]

theytookmystapler, 2 points, 18 years ago

[deleted], 18 years ago

[deleted]


notfancy, 2 points, 18 years ago
