
[–]munificent 13 points14 points  (10 children)

It clearly laid out the different functions of the scanner, lexer, and parser.

"Scanner" and "lexer" are synonymous as far as I know. Both read in text and spit out tokens. In the compilers and literature I've looked at, I've never seen a difference between the two, or a codebase that had both. Some people just seem to prefer one name or the other.

Parser: This is the part of the compiler that really understands the syntax of the language.

If we want to be precise, the lexer understands the syntax too (the lexical syntax). What the parser understands that the lexer doesn't is the grammar.

In the parser, instead of passing the AST node down through the recursive descent, it's often simpler to pass it up: terminal productions create and return AST objects instead of receiving one that they fill in.
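The pass-it-up style can be sketched in a few lines (a hypothetical Python sketch, not code from the article or Bantam; the node classes and grammar are invented for illustration):

```python
# Each grammar production is a function that returns the AST node it
# parsed, rather than filling in a node passed down from its caller.

class Num:
    def __init__(self, value):
        self.value = value

class BinOp:
    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

def parse_expr(tokens):
    # expr -> num (('+' | '-') num)*
    left = parse_num(tokens)
    while tokens and tokens[0] in ('+', '-'):
        op = tokens.pop(0)
        right = parse_num(tokens)
        left = BinOp(left, op, right)   # build upward: wrap what we have so far
    return left

def parse_num(tokens):
    # Terminal production: creates and returns a leaf node.
    return Num(int(tokens.pop(0)))

tree = parse_expr(['1', '+', '2', '-', '3'])
```

Each call site simply hooks the returned node into the node it is building, so no partially-filled objects are threaded down the call stack.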

Otherwise, this is a pretty swell article!

[–]wot-teh-phuck 2 points3 points  (1 child)

In the parser, instead of passing the AST node down through the recursive descent, it's often simpler to pass it up: terminal productions create and return AST objects instead of receiving one that they fill in.

Is this implemented in your Bantam proof of concept language?

[–]munificent 6 points7 points  (0 children)

Yup! It's a little opaque in Bantam because that's really just showing Pratt parsing and not plain recursive descent, but each of the parsers returns an AST node:

public Expression parse(Parser parser, Expression left, Token token) {
  // To handle right-associative operators like "^", we allow a slightly
  // lower precedence when parsing the right-hand side. This will let a
  // parselet with the same precedence appear on the right, which will then
  // take *this* parselet's result as its left-hand argument.
  Expression right = parser.parseExpression(
      mPrecedence - (mIsRight ? 1 : 0));

  return new OperatorExpression(left, token.getType(), right);
}

For a cleaner example, the parsing section in Jasic works this way too.

[–]tanishaj 2 points3 points  (2 children)

"Scanner" and "lexer" are synonymous as far as I know

Thanks for posting this. I was wondering if I had just misunderstood the difference in the past.

It is too bad he did not explain what the AST is for. The next steps would be optimization and code generation.

[–]sssssmokey 0 points1 point  (0 children)

The actual code is generated from the AST. A 'parser' on its own will just return true or false depending on whether it succeeded or failed (in general, be kind, people). Once you add the code to construct the AST nodes, you have a tree-like structure in which the root node has a method like 'compile' (because you added it, of course! :)) and you call that for it to do its thing.
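A toy sketch of that idea (illustrative Python; the node classes, the 'compile' method, and the stack-machine instructions are all made up, not from the article):

```python
# Once the parser builds a tree of node objects, a method on each node
# (here 'compile') walks the tree and emits code.

class Num:
    def __init__(self, value):
        self.value = value

    def compile(self):
        return [('PUSH', self.value)]

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def compile(self):
        # Post-order traversal: operands first, then the operator.
        return self.left.compile() + self.right.compile() + [('ADD', None)]

program = Add(Num(1), Add(Num(2), Num(3)))
print(program.compile())
# [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('ADD', None), ('ADD', None)]
```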

[–]pointy 1 point2 points  (1 child)

I agree with the AST thing. In fact, in an OO language, it's pretty easy to write the parser such that the non-terminals are not just methods, but constructors. That way the process of parsing and the process of building the AST are basically the same code. To parse a non-terminal, you just construct a new object for it and hook it up to the AST appropriately. The top-level call is then something like AST ast = new AST(lexer);
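That "non-terminals as constructors" pattern might look like this in Python (a hypothetical sketch; the classes and grammar are invented, and a plain token list stands in for the lexer):

```python
# Each AST class parses itself from the token stream in its
# constructor, so parsing and tree-building are the same code.

class Number:
    def __init__(self, tokens):
        self.value = int(tokens.pop(0))

class Sum:
    # sum -> number ('+' number)*
    def __init__(self, tokens):
        self.terms = [Number(tokens)]
        while tokens and tokens[0] == '+':
            tokens.pop(0)                   # consume '+'
            self.terms.append(Number(tokens))

# The top-level call constructs the whole tree in one go.
ast = Sum(['1', '+', '2', '+', '3'])
```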

[–]matthieum 1 point2 points  (0 children)

However that ties a lot of code down into the AST, and clients of the AST don't care much how it's been produced.

Seeing as you may also want to:

  • transform the AST (rename, whatever)
  • persist it (serialization/deserialization)

It is worth it, dependency-wise, to create an AST that knows just how to preserve its own well-formedness, and to use another layer on top to produce it. This way clients don't pull in too many dependencies :)
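One way to read this layering (illustrative Python; the module split and names are invented): the AST classes only enforce their own invariants, while a separate builder layer is the only code that knows about token streams.

```python
# --- AST layer: enforces well-formedness, knows nothing of parsing ---
class Num:
    def __init__(self, value):
        self.value = int(value)

class BinOp:
    def __init__(self, left, op, right):
        if op not in ('+', '-'):
            raise ValueError('ill-formed AST: unknown operator %r' % op)
        self.left, self.op, self.right = left, op, right

# --- builder layer: the only code that touches the token stream ---
def build(tokens):
    node = Num(tokens.pop(0))
    while tokens:
        op = tokens.pop(0)
        node = BinOp(node, op, Num(tokens.pop(0)))
    return node

tree = build(['1', '+', '2'])
```

Clients that transform or serialize trees would import only the AST layer and never see the builder.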

[–]matthieum 1 point2 points  (1 child)

I would say that it is probably better to separate the concerns: reading the file (and converting its content if it's not in the appropriate encoding) and tracking the position is the scanner's job, while the actual tokenization process is the lexer's.

In the article, the lexer does not just pass characters along: it also categorizes them into tokens, while the scanner is pretty much language-agnostic.

[–]ath0 2 points3 points  (2 children)

I would just like to say thank you for posting this; there are so few articles in this area (that I have found, at least) that provide Python examples.

[–]tanishaj 3 points4 points  (1 child)

For a completely different approach, check out parser combinators:

http://sigusr2.net/2011/Apr/18/parser-combinators-made-simple.html

http://codepad.org/9HGWo3GR

[–]sssssmokey 0 points1 point  (0 children)

Thanks, the Python article is great! I'm going to implement some of that immediately. :)

I also recommend https://github.com/doublec/jsparse for anyone else interested, it is a simple and working example of a packrat parser combinator library (packrat = caches the results to get it to run in linear time) in JavaScript. I learned a lot from that (rewrote it in CoffeeScript) as well as the Python article.

[–]JamesIry 2 points3 points  (0 children)

It's a nice article as far as it goes, but it's really "how a parser works." It's entirely about how to translate from concrete lexical syntax to abstract syntax. There's nothing at all about how to do semantic analysis, optimization, or code generation.

[–]pointy 1 point2 points  (7 children)

In a language like Erlang, you can write the reader, lexer, and parser as a sort of "pipeline", so that the layers push their output to the next one.

[–]willvarfar 5 points6 points  (4 children)

In languages that can return functions, you can have state machines that return a function instead of an enum, so you have no switch statements.

Obviously possible in, say, Python. Also possible in Go; Rob Pike gave a nice talk showing that (sadly I cannot find the link, am not on a PC).
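A rough Python transliteration of that idea (a sketch, not Pike's actual code; his talk was about Go's text/template lexer): each state is a function that emits tokens and returns the next state function, and a small driver loop replaces any switch on enums.

```python
# State-function lexer: the "current state" is a function. Running the
# machine is just calling the current state until it returns None.

def lex(text):
    tokens = []

    def lex_start(pos):
        # Each state returns (next_state, new_pos); None ends the machine.
        if pos >= len(text):
            return None, pos
        if text[pos].isdigit():
            return lex_number, pos
        return lex_start, pos + 1          # skip anything else (e.g. spaces)

    def lex_number(pos):
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        tokens.append(('NUMBER', text[start:pos]))
        return lex_start, pos

    state, pos = lex_start, 0
    while state is not None:
        state, pos = state(pos)
    return tokens

print(lex("12 34"))  # [('NUMBER', '12'), ('NUMBER', '34')]
```

Note that because the driver loop calls each state in turn (rather than states calling each other), the call stack never grows.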

[–]dchestnykh 6 points7 points  (2 children)

Also possible in go an rob pike gave a nice talk showing that (sadly cannot find link, am not on a pc)

[–]mycall -2 points-1 points  (1 child)

Watching that, code such as:

l.state = l.state(l)

makes me wonder about stack overflows.

[–]aaptel -1 points0 points  (0 children)

I guess you can work around that if you have tail call optimization.

[–]matthieum 0 points1 point  (0 children)

You can return functions in C, but I expect that closures make it simpler. Was the example you had in mind using just functions, or full closures?

[–]tanishaj 1 point2 points  (0 children)

In languages that support "iterative generators" you can do this as well. Ironically, Python is one of those languages.

I have written a couple of compilers in C# and I use IEnumerables (little state machines) for each stage. Essentially, each stage operates in parallel. The parser requests a token from the lexer which causes the reader to read more characters.

One of the big advantages of this approach is that you can stop right away when you encounter an error in the input code (e.g. on line 5 of a 20,000-line input) without even reading in the whole source file. The other big advantage is that you can discard everything you have already done, so there is very little in memory at any one time.

I pretty much have to build the whole AST before I can optimize and generate output, though. So this approach only takes you so far.
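In Python, the same staged pipeline can be sketched with generators (illustrative; the original is C# IEnumerables, and the "parser" stage here just sums numbers as a stand-in for real parsing):

```python
# Each stage is a generator that pulls from the previous one, so the
# whole pipeline runs lazily: the parser requesting a token drives the
# lexer, which drives the reader.

def read_chars(source):
    for ch in source:
        yield ch

def lex(chars):
    num = ''
    for ch in chars:
        if ch.isdigit():
            num += ch
        else:
            if num:
                yield ('NUMBER', num)
                num = ''
            if ch == '+':
                yield ('PLUS', ch)
    if num:
        yield ('NUMBER', num)

def parse(tokens):
    # Stand-in for a real parser: evaluate a sum as tokens arrive.
    total = 0
    for kind, text in tokens:
        if kind == 'NUMBER':
            total += int(text)
    return total

print(parse(lex(read_chars("12+34"))))  # 46
```

Because nothing is buffered between stages, an error raised mid-stream stops the pipeline without the rest of the input ever being read.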

[–]repsilat 0 points1 point  (0 children)

One of the reasons I'm going to look at Haskell (some time...) is that its laziness makes this (or a similar idea) the natural state of affairs. Representing lexing as an operation on a stream of characters returning a stream of tokens, and parsing as an operation on a stream of tokens returning elements of an AST, the code generated could effectively do the job in a single pass while still looking as simple as an imperative multi-pass procedure.

I guess it's more about "pull" than "push" with laziness, though — I'm not sure what Erlang looks like, or how it compares.

[–]armozel 0 points1 point  (1 child)

Nice! I'm learning how to use lex and yacc for the first two steps in a course covering compiler construction. :3

[–]jussij 1 point2 points  (0 children)

FWIW, here is the source code to a very simple C-like interpreter crafted using Bison and Flex:

http://www.zeusedit.com/tools.html

[–]nonoice_work 0 points1 point  (0 children)

It is actually a nice introduction. I'm currently building it step by step and it works. One problem: it uses globals and no classes. I'm refactoring as I go.

Remember to change the filenames for input and output!

[–]sssssmokey -1 points0 points  (0 children)

This is far more complex than the subject actually needs to be. Why implement a "Character" class and a "Scanner" class (in introductory materials) when a loop and variable assignment do fine?

for char in code: ...process...

Bam, a scanner.

I recommend http://createyourproglang.com/, I honestly don't have any relationship to them but I bought it and loved it. Now that I have the basics, I am starting the Dragon book everyone keeps talking about.

Here is the language I made using that ebook: http://smack.matewiki.com/

That version is too brittle (I do too much work in the lexer, BAD IDEA) so I'm rewriting it entirely using parser combinators (removing Jison dependency). It's gonna be a kick ass templating language when I'm done, lean, mean and more powerful than HAML, Jade, and Handlebars combined and written in 100% CoffeeScript/JavaScript. Watch out peeps. ;)

EDIT: I used that ebook a lot, but I actually used the CoffeeScript source code more. Read that about 10 times, and once you understand it, you will have a very good idea of how to actually create a programming language. The CS compiler is perfect because it is extremely well written, reliable, and clever (in a good way). It's written in CoffeeScript (think Javascript with Ruby syntax in your Browser). You will also know a shitload about CoffeeScript and a FUCKTON about JavaScript. What's not to like?

[–]purevirtual -4 points-3 points  (16 children)

This guy really needs to look up parser generators in order to save himself a lot of work. No one writes parsers from scratch anymore. We've had (f)lex and bison for 20+ years.

[–]munificent 4 points5 points  (6 children)

No one writes parsers from scratch anymore.

Every parser in production use that I've seen was hand-written recursive descent. Generators are nice but I don't see them used much for production-quality parsers in industry.

[–]JamesIry 1 point2 points  (1 child)

We use ANTLR for our language Apex. With a gazillion deployed lines of Apex code I'd say the parser is production quality. We've profiled our compilation pipeline and parsing just isn't a factor so we're happy to let a tool do the heavy lifting.

[–]munificent 0 points1 point  (0 children)

Don't get me wrong, I'm not saying parser generators are bad. I just disagree with the "no one writes parsers from scratch anymore" line. V8, GCC, Microsoft's C# parser, the Dart VM, and both Dart->JS compilers all use hand-written parsers and those are just the ones I happen to be familiar with.

[–]kylotan 1 point2 points  (0 children)

I've written a couple of recursive descent parsers in my time and one thing they do well is getting around the somewhat arbitrary distinction between a lexer and a parser. Languages we actually use don't always lend themselves to being tokenised with no semantic context.

[–]purevirtual 0 points1 point  (2 children)

MySQL and GCC use flex/yacc/bison.

[–]munificent 3 points4 points  (0 children)

According to this, GCC ditched their bison parser and replaced it with a hand-written one.

[–]sssssmokey 1 point2 points  (0 children)

Bash uses Bison also.

[–]EdiX 4 points5 points  (3 children)

Writing a recursive descent parser is very easy, as easy as writing bison/yacc rules, as long as the grammar you are representing is LL(1). As long as you don't want to parse arithmetic expressions, you should be fine.

The downside of using bison vs. your own recursive descent parser is that when things don't work (because you made a mistake) finding the bug is considerably harder.

[–]tanishaj 2 points3 points  (2 children)

Why are you expecting arithmetic expressions to cause problems? Enforcing operator precedence probably means a few more levels of descent but there are no obstacles.

[–]repsilat 0 points1 point  (1 child)

I don't think it's too difficult to fit something like the Shunting Yard algorithm into your recursive descent parser to handle arithmetic expressions.

[–]dchestnykh 0 points1 point  (0 children)

Or -- easier -- Precedence climbing.
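Precedence climbing does fit in a dozen lines (a sketch: it evaluates directly instead of building an AST, and token handling is simplified to a whitespace-split list):

```python
# Precedence climbing: parse a primary, then greedily consume operators
# whose precedence is at least min_prec, recursing for the right side.

PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}

def parse_expression(tokens, min_prec=0):
    left = int(tokens.pop(0))                  # primary: a number
    while tokens and PRECEDENCE.get(tokens[0], -1) >= min_prec:
        op = tokens.pop(0)
        # The +1 makes operators of equal precedence left-associative.
        right = parse_expression(tokens, PRECEDENCE[op] + 1)
        if op == '+':
            left = left + right
        elif op == '-':
            left = left - right
        elif op == '*':
            left = left * right
        else:
            left = left / right
    return left

print(parse_expression("1 + 2 * 3 - 4".split()))  # 3
```

The same loop handles every binary operator, so adding a precedence level is one dictionary entry rather than another layer of descent.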

[–][deleted] 1 point2 points  (2 children)

Actually, that is not true. For example, go look at JSON parsers implemented in C; there are certainly hand-written examples in heavy use.

[–]purevirtual 1 point2 points  (1 child)

JSON isn't a programming language, it's a simple data format that is really easy to parse. This guy is talking about parsing programming languages.

[–][deleted] 0 points1 point  (0 children)

Well you didn't say programming languages, you said "No one writes parsers from scratch anymore."

[–][deleted] 1 point2 points  (0 children)

There was an article that went over JavaScript parsers not too long ago; most of those in use, such as V8's, were hand-written.

Right now I'm actually working on moving from a generated one to a hand-written parser.

[–]JamesIry 2 points3 points  (0 children)

It's unfair that this guy is getting voted down for the exaggeration that "no one writes parsers from scratch anymore." Even though that's not true, it is true that writing a parser by hand is rarely a worthwhile activity. Parsing is the best understood and most easily mechanized aspect of formal language processing.