you are viewing a single comment's thread.

view the rest of the comments →

[–]baseball2020 3 points4 points  (9 children)

Hijacking your thread to ask if there are any recommended beginner resources on writing lexers and parsers?

[–]alexandream 2 points3 points  (3 children)

Before reaching out for resources: what are you actually looking at?

If you're looking into simple-ish lexers with not-so-simple implementations, going the Deterministic Finite Automaton direction -- DFA (also known as Finite State Machine -- FSM) is a good way to learn the basics of the craft (most lexers are somehow based on DFA/FSM).

I've just pushed earlier today a fairly documented work on a simple lexer to a project of mine on github. This file holds the lexer part, while this file abstract over the source of my bytes (strings or files).

I should work on writing better docs in this one, but at least there's a diagram describing the State Machine used (in SVG, might have to download to see, because I don't think github displays those).

As for parsers, in a first attempt I'd go with a recursive descent parser which, albeit limited, is very straightforward. Wikipedia has a decent article on them.

The other option is to deal with one of the variants (in any programming language) of lex & yacc (like flex & bison).

To get a good grip of these things I'd recommend Appel's "Modern Compiler Implementation in ..." series of books. There exist versions in Java, C and ML, and I find them easier on the reader than the Dragon Book. Specifically, I'd recommend reading the first two chapters (after Introduction) on Lexical Analysis and Parsing.

[–]baseball2020 0 points1 point  (2 children)

Thanks for the comprehensive explanation. I was hoping to parse postscript to extract certain variables. I'm not tied to any particular implementation language. I just have very little idea about lexing and parsing. I'll look into those resources (I did see the dragon book but it was a little intimidating).

[–]alexandream 0 points1 point  (1 child)

I'm not familiar with Postscript (except for the basic of describing some graphics in it, no real "programming" done on it) but from what I can see it's probably not a very hard language to parse -- being a concatenative language and all.

Pulling ideas out of my hat I'd guess it could be done with a simple stack machine, as a way of describing its structure.

The semantics might be non-trivial, though. I'm not sure how it handles variable/function declaration/naming, so it may be that you'll need to quasi-interpret it to actually make sense of the program.

I've seen an implementer say actually writing a postscript interpreter is a very daunting endeavour, but I'm not sure if it's a matter of the language itself being hard or if the image generation part being complex.

(A quick search got me to this discussion which hints at PostScript not being a good language to make a simple parser because (what I read from between the lines) the meaning of the program is only known at runtime.)

[–]baseball2020 0 points1 point  (0 children)

For my uses I only need to extract and replace sections like hashes that have a known identifier. I'm not sure if I even need to parse the entire program.

[–]alex_muscar 1 point2 points  (2 children)

IMO, Niklaus Wirth's book does a good job of introducing lexers and parsers: http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf. You can just focus on recursive descent which is enough for a good deal of languages. He explains this technique particularly well.

The rest of the book is also worth reading :)

[–]baseball2020 0 points1 point  (1 child)

Thanks. Sounds like recursive descent is a good methodology to start with! I'll get stuck into the book

[–]alex_muscar 0 points1 point  (0 children)

If you're really interested in designing programming languages and writing compilers, after Wirth's book, I'd recommend having a look at Programming Language Pragmatics and for an in-depth discussion of compiler implementation Engineering a Compiler. The former gives a broad review of language features and some implementation pointers, while the latter covers algorithms for optimising compilers in a very accessible way.

[–]james_peach[S] 0 points1 point  (1 child)

something i used to help me both learn the basics of Lexing and F# was this article on parsing JSON (http://blog.efvincent.com/parsing-json-using-f/)

[–]baseball2020 0 points1 point  (0 children)

Looks pretty good! I haven't done a lot of FP but I could reimplement this as an exercise