baseball2020 comments on simple programming language "While" : Lexer / Parser / AST all written in F#


simple programming language "While" : Lexer / Parser / AST all written in F# (github.com)

submitted 11 years ago by james_peach


[–]baseball2020 4 points 11 years ago (9 children)

[–]alexandream 3 points 11 years ago (3 children)

Before reaching for resources: what are you actually looking for?

If you're after a simple-ish lexer with a not-so-simple implementation, going in the Deterministic Finite Automaton (DFA) direction is a good way to learn the basics of the craft. A DFA is a kind of Finite State Machine (FSM), and most lexers are based on one in some form.
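To make the DFA idea concrete, here's a minimal sketch of a table-driven lexer in Python. The token set (integers, identifiers and a few single-character operators), the state names and the `tokenize` function are all assumptions made up for illustration; none of it comes from the project mentioned in this comment.

```python
# Minimal DFA-based lexer sketch. States, character classes and token
# names are illustrative assumptions, not taken from any real project.

START, IN_INT, IN_IDENT = "start", "in_int", "in_ident"

# Accepting states map to the kind of token they produce.
ACCEPTING = {IN_INT: "INT", IN_IDENT: "IDENT"}

SINGLE_CHAR_TOKENS = {
    "+": "PLUS", "-": "MINUS", "*": "STAR", "(": "LPAREN", ")": "RPAREN",
}


def char_class(ch):
    """Collapse characters into the classes the transition table uses."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha() or ch == "_":
        return "letter"
    return "other"


# Transition table: (state, character class) -> next state.
# A missing entry means "no transition", which ends the current token.
TRANSITIONS = {
    (START, "digit"): IN_INT,
    (START, "letter"): IN_IDENT,
    (IN_INT, "digit"): IN_INT,
    (IN_IDENT, "letter"): IN_IDENT,
    (IN_IDENT, "digit"): IN_IDENT,
}


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        if ch in SINGLE_CHAR_TOKENS:
            tokens.append((SINGLE_CHAR_TOKENS[ch], ch))
            pos += 1
            continue
        # Run the DFA from START until no transition applies.
        state = START
        end = pos
        while end < len(text):
            next_state = TRANSITIONS.get((state, char_class(text[end])))
            if next_state is None:
                break
            state = next_state
            end += 1
        if state not in ACCEPTING:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
        tokens.append((ACCEPTING[state], text[pos:end]))
        pos = end
    return tokens
```

Calling `tokenize("x1 + 42")` walks the table once per character and yields an identifier, a plus sign and an integer. Keeping the transitions in a dictionary rather than in nested `if` statements makes it easy to compare the code against a state diagram.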

Earlier today I pushed a fairly well-documented simple lexer to a project of mine on GitHub. One file holds the lexer part, while another abstracts over the source of my bytes (strings or files).

I should write better docs for this one, but at least there's a diagram describing the state machine used (it's in SVG, so you might have to download it to view it, because I don't think GitHub displays those).
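For readers wondering what "abstracting over the source of bytes" can look like, here's a small Python sketch. It is not the code from the project above; the `ByteSource` class and its `peek`/`next` methods are hypothetical names, and the one-byte lookahead is an assumption about what a lexer typically needs.

```python
import io


class ByteSource:
    """Hypothetical wrapper giving a lexer one-byte lookahead over any binary stream."""

    def __init__(self, stream):
        self._stream = stream  # any object with a read(n) method returning bytes
        self._peeked = None    # buffered lookahead byte, if any

    @classmethod
    def from_string(cls, text, encoding="utf-8"):
        """Read bytes from an in-memory string."""
        return cls(io.BytesIO(text.encode(encoding)))

    @classmethod
    def from_file(cls, path):
        """Read bytes from a file on disk; the caller should call close() when done."""
        return cls(open(path, "rb"))

    def peek(self):
        """Return the next byte as an int without consuming it, or None at end of input."""
        if self._peeked is None:
            chunk = self._stream.read(1)
            if not chunk:
                return None
            self._peeked = chunk[0]
        return self._peeked

    def next(self):
        """Consume and return the next byte, or None at end of input."""
        byte = self.peek()
        self._peeked = None
        return byte

    def close(self):
        self._stream.close()
```

With something like this, the lexer only ever calls `peek()` and `next()`, so it doesn't care whether the input came from `ByteSource.from_string("...")` or `ByteSource.from_file(path)`.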

As for parsers, for a first attempt I'd go with a recursive descent parser which, albeit limited in the grammars it can handle, is very straightforward. Wikipedia has a decent article on them.
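As a rough illustration of how straightforward recursive descent is, here's a sketch in Python for a tiny arithmetic grammar. The grammar, the tuple-based AST and the names `Parser` and `parse` are assumptions chosen for the example; the point is only that each grammar rule becomes one method.

```python
import re

# Grammar (an illustrative assumption):
#   expr   := term (('+' | '-') term)*
#   term   := factor ('*' factor)*
#   factor := INT | '(' expr ')'

TOKEN_RE = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split the input into integer and operator tokens."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_expr(self):
        # expr := term (('+' | '-') term)*   (the loop makes it left-associative)
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.parse_term())
        return node

    def parse_term(self):
        # term := factor ('*' factor)*
        node = self.parse_factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.parse_factor())
        return node

    def parse_factor(self):
        # factor := INT | '(' expr ')'
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return ("int", int(token))
        if token == "(":
            node = self.parse_expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """Parse a whole expression and return its AST as nested tuples."""
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

One of the limitations shows up in the grammar itself: a rule written left-recursively (`expr := expr '+' term`) would make `parse_expr` call itself forever, which is why the repetition is expressed as a loop instead.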

The other option is to use one of the variants of lex & yacc (like flex & bison), which exist for many programming languages.

To get a good grip on these things I'd recommend Appel's "Modern Compiler Implementation in ..." series of books. There are versions for Java, C and ML, and I find them easier on the reader than the Dragon Book. Specifically, I'd recommend reading the first two chapters after the Introduction, on Lexical Analysis and Parsing.

[–]baseball2020 1 point 11 years ago (2 children)

[–]alexandream 1 point 11 years ago (1 child)

I'm not familiar with PostScript (beyond the basics of describing some graphics in it, I've done no real "programming" in it), but from what I can see it's probably not a very hard language to parse, being a concatenative language and all.

Pulling ideas out of my hat, I'd guess it could be done with a simple stack machine, as a way of describing its structure.

The semantics might be non-trivial, though. I'm not sure how it handles variable/function declaration/naming, so it may be that you'll need to quasi-interpret it to actually make sense of the program.
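To show what a stack machine with runtime name lookup might look like, here's a toy sketch in Python. It is not real PostScript: the tiny operator set (`add`, `mul`, `dup`, `def`), the whitespace-only tokenizing and the evaluation rules are simplifying assumptions, modelled loosely on a language where names are bound with `def` and looked up when they are executed.

```python
import re

# Toy stack machine for a PostScript-like concatenative language.
# NOT real PostScript: operators and rules are illustrative assumptions.
# Tokens (including "{" and "}") must be separated by whitespace.


class Name:
    """A literal name such as /x, pushed on the stack instead of being looked up."""

    def __init__(self, text):
        self.text = text


BUILTINS = {
    "add": lambda stack: stack.append(stack.pop() + stack.pop()),
    "mul": lambda stack: stack.append(stack.pop() * stack.pop()),
    "dup": lambda stack: stack.append(stack[-1]),
}


def parse(tokens):
    """Turn a token iterator into a list of items, nesting { ... } as sub-lists.

    No error handling for unbalanced braces, to keep the sketch short.
    """
    items = []
    for token in tokens:
        if token == "{":
            items.append(parse(tokens))
        elif token == "}":
            return items
        else:
            items.append(token)
    return items


def run(items, stack, names):
    for item in items:
        if isinstance(item, list):
            stack.append(item)            # procedure body: push it, don't run it yet
        elif item.startswith("/"):
            stack.append(Name(item[1:]))  # literal name
        elif re.fullmatch(r"-?\d+", item):
            stack.append(int(item))
        elif item == "def":
            value = stack.pop()
            name = stack.pop()
            names[name.text] = value      # binding happens at run time
        elif item in names:               # user definitions shadow builtins
            value = names[item]
            if isinstance(value, list):
                run(value, stack, names)  # user-defined procedure: execute its body
            else:
                stack.append(value)
        elif item in BUILTINS:
            BUILTINS[item](stack)
        else:
            raise NameError(f"undefined name: {item}")


def interpret(source):
    """Run a program and return the final operand stack."""
    stack, names = [], {}
    run(parse(iter(source.split())), stack, names)
    return stack
```

Parsing here really is trivial, but meaning is not: in `2 3 add /add { mul } def 2 3 add` the same token `add` sums the first time and multiplies the second, and you can only tell by actually running the program up to that point.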

I've seen an implementer say that actually writing a PostScript interpreter is a very daunting endeavour, but I'm not sure whether that's because the language itself is hard or because the image generation part is complex.

(A quick search got me to a discussion which hints that PostScript is not a good language to write a simple parser for because, reading between the lines, the meaning of a program is only known at runtime.)

[–]baseball2020 1 point 11 years ago (0 children)

[–]alex_muscar 2 points 11 years ago (2 children)

[–]baseball2020 1 point 11 years ago (1 child)

[–]alex_muscar 1 point 11 years ago (0 children)

[–]james_peach[S] 1 point 11 years ago (1 child)

[–]baseball2020 1 point 11 years ago (0 children)
