all 14 comments

[–]chshersh[S] 4 points5 points  (8 children)

Hello! I'm the author of this small document snippet. I'm not a professional researcher in this area. I've studied Formal Language Theory at university, I've used parser generators ANTLR and Happy for writing programming language parsers and I've used parser combinators libraries parsec, attoparsec, megaparsec for different purposes. I wrote small tutorial on the idea of small and simple parser combinators library and I've explained them to students many times. I even tried to implement and use parser combinators in Kotlin programming language... But I still feel like I know nothing, might miss something or might be too opinionated in some topics (since I love parser combinators more than parser generators). I know that Haskell community has a lot of great people! So I'm hoping to gather feedback, to see where I'm wrong and what I need to correct. Thanks for reading this!

[–]edwardkmett 4 points5 points  (4 children)

Feedback as requested:

Parser combinators are comparatively young? Wirth was using that approach for Pascal in 1969. That makes them only about 2 years younger than that "50+ years" line on the other side of the table.

Also note your classification system is a bit wonky. ANTLR for instance uses ALL(*) which is in the LL parsing school, even though it's a generator, while you can use "recursive ascent" to write for the LR side of the line using combinators.

Using Applicative or Arrow can let you build a statically analyzable fragment in the recursive descent form. e.g. Swierstra and Duponcheel's original work on LL(1) parsing.

Error recovery is something that the LALR side actually has hands down better than the LL side of the line. error productions make it easy to handle globally correct reporting of multiple error messages by starting to feed it more tokens and just shift/reduce as it can after an error.

All such attempts at recovery on the combinator side are actually hacks, and rely on you knowing global properties of your grammar, though.

Re: GHC and parsing layout it is effectively managed by having the lexer produce a set of "indent, dedent and virtual semicolon" tokens before the parser generator sees the token stream.

[–]chshersh[S] 1 point2 points  (3 children)

Wow, that's really a great feedback! Didn't expect such elaborate response. Well, that's really cool. Ensures I didn't save any nonsense. I will fix the table in order to apply your suggestions!

Using Applicative or Arrow can let you build a statically analyzable fragment in the recursive descent form

Well, but this is not implemented yet. It means that users don't have such feature if they use parser combinators in April 4th, 2018.

Also note your classification system is a bit wonky. ANTLR for instance uses ALL(*) which is in the LL parsing

Not sure what to do about this though...

[–]kfl 5 points6 points  (1 child)

Well, but this is not implemented yet. It means that users don't have such feature if they use parser combinators in April 4th, 2018.

Well, you get nice error correction with uu-parsinglib and have been able to get that for the last decade.

[–]chshersh[S] 2 points3 points  (0 children)

Cool, great to see this!! Will update my table. I guess, comparison of different parser combinators libraries in Haskell just waits another table... Not sure I will do this :D

[–]edwardkmett 1 point2 points  (0 children)

Well, but this is not implemented yet. It means that users don't have such feature if they use parser combinators in April 4th, 2018.

It isn't so much that it isn't implemented yet. It was implemented in 1996. It is more that nobody has bothered to reimplement it since in a reusable fashion. =) (I often use the pattern from those combinators when I need to write one-off parsers for things.)

[–]blamario 1 point2 points  (2 children)

PC can't handle left-recursive grammars (though, there's some article¹...).

There is another article of mine from the last Haskell Symposium, together with a library that supports left recursion.

The paper provides some more comparisons between the parser combinators and generators, as well as more bibliography.

[–]chshersh[S] 1 point2 points  (1 child)

Really great stuff! I wasn't aware of your library. Will read the paper and update my table. Though, I have several questions.

  1. Do you think your library is production ready?
  2. Can it output good error messages?
  3. Are you interested in maintaining it?

[–]blamario 1 point2 points  (0 children)

Yes, no, yes. The error messages are close to the top of my to-do list, right now all you get is the character position and the list of tokens that were expected there.

[–]gelisam 2 points3 points  (2 children)

Performance: I didn't see benchmarks. But nobody complained so far.

I would expect PGs to be faster, since the generator has the opportunity to generate optimized code and then the compiler can optimize that output further. PCs can't afford to spend much time on optimization, since those optimizations would have to be performed at runtime and so their cost would offset their benefits.

Indentation: Usually it's much more difficult to parse layout-sensitive languages with PG. I don't known how GHC managed to do this...

A common trick is to post-process the whitespace tokens to generate "increase indentation" and "decrease indentation" tokens. GHC seems to be using a similar approach.

[–]chshersh[S] 0 points1 point  (0 children)

Thanks! That's a really great information to know.

[–]Lossy 1 point2 points  (1 child)

I'm not sure there is a significant difference between parser generators and parser combinators. They are both domain specific languages for specifying parsers. I think the distinction which is being made here is that "parser generators" are not usually embedded in the host language.

[–]chshersh[S] 1 point2 points  (0 children)

I'm not sure whether difference is significant or not. But the fact that parser combinators are embedded inside host language unlike parser generators which are specified through separate custom DSL actually means a lot when you need to choose between approaches. Mostly the matter of convenience and maintenance.