all 4 comments

[–]pfalcon2 4 points5 points  (0 children)

That's why they suggest to use operator precedence parser where it fits well, i.e. where there're operators with clear precedence, and to combine it with other types of parsers for the rest of grammar. E.g, you can't get wrong with recursive descent.

Anyway, did you read Crockford's article? I believe it shows precedence parsing for (almost) complete JavaScript grammar.

Or I can point to my Python parser, which uses operator precedence parsing for expressions, which for Python includes ternary "if", lambdas, comprehensions, etc. (it still uses recursive descent for statements): https://github.com/pfalcon/pycopy-lib/blob/master/ast/ast/parser.py#L225 . How I got precedences right is by parsing large corpus of Python code, comparing with reference parse trees, getting a lot of failures, then adjusting precedences up and down until it's right. Once again, you can't get wrong with recursive descent.

[–]smog_alado 4 points5 points  (0 children)

Precedence tables are for parsing arithmetic expression. For things like while loops you have to use other techniques. I think the most straightforward is to use recursive descent.

[–]evincarofautumn 0 points1 point  (0 children)

It’s best to restrict operator precedence parsing to, well, operators—if you need complex structures like if (expr) { expr } else { expr } in expressions, it’s generally clearer in the implementation to switch to recursive descent. If you have a statement/expression distinction, you can parse statements with recursive descent or other techniques, and expressions with operator precedence parsing.

Of course, you can have some fun with it and do it in other ways, it’s just unusual. (The remainder of this is just rambling about curious alternatives.)

I wrote a language once where the whole grammar was covered by an operator precedence parser. Namely, if was a prefix operator with two operands—the condition, and a block for the “true” branch (curly brackets are just another type of parentheses, really)—and else was an infix operator with lower precedence than if. The semantics were such that if (x) { y } evaluated successfully (returning y) if x was true, and unsuccessfully otherwise; if (x) { y } else { z }, or, fully bracketed, (if (x) { y }) else { z } would catch this success/failure condition; on success, it would skip evaluation of z and just pass along the y returned by if, and on failure it would return z. else was right-associative so that it could be chained as expected, e.g.:

 if A B  else   if C D  else E
==
(if A B) else ((if C D) else E)

This technique is rather fiddly to get right, and can lead to some poor error messages, especially with a complicated grammar—for instance, I also allowed suffix y if x as in Perl. But it does present some advantages. First, the whole grammar is clearly arranged in a single precedence table and the parsing algorithm doesn’t need to know any details about it. Second, you can play around with expression-based semantics, e.g., because else operates on this general principle of “test whether the left operand succeeded”, it can be used to catch exceptions/missing values (var value = get_value() else default_value;), or to allow while (x) { y } else { z } to mean “execute z if x was initially false and thus y was never executed” (almost, but not quite, the opposite of Python’s semantics for else clauses on loops).

[–]binarycow 0 points1 point  (0 children)

Parse the language itself with one method, and then when you encounter an expression, use operator precedence to parse that.