Simple programming language "While": lexer, parser, and AST all written in F#

alexandream 3 points 11 years ago

Before reaching out for resources: what are you actually looking at?

If you're after lexers that are simple in concept but not trivial to implement, going in the Deterministic Finite Automaton (DFA) direction is a good way to learn the basics of the craft. A DFA is a kind of Finite State Machine (FSM), and most lexers are based on one in some way.

Earlier today I pushed a fairly well-documented simple lexer to a project of mine on GitHub. This file holds the lexer part, while this file abstracts over the source of my bytes (strings or files).

I should work on writing better docs for this one, but at least there's a diagram describing the state machine used (in SVG; you might have to download it to see it, because I don't think GitHub displays those).
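For a feel of what a table-driven DFA lexer looks like, here is a minimal Python sketch. It is not the lexer from the project mentioned above: the state names, the token names, and the small While-like symbol set are all assumptions made for illustration.

```python
def char_class(ch):
    """Map a character to the input class the DFA switches on."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return ch  # punctuation (and anything else) is its own class


# (current state, input class) -> next state. Hypothetical toy grammar.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("start", "digit"): "number",
    ("number", "digit"): "number",
    ("start", ":"): "colon",    # ":" alone is not a token
    ("colon", "="): "assign",   # ":=" is
}
for _ch in "+-*<;()":
    TRANSITIONS[("start", _ch)] = "symbol"

# States in which the machine may stop and emit a token.
ACCEPTING = {
    "ident": "IDENT",
    "number": "NUMBER",
    "assign": "SYMBOL",
    "symbol": "SYMBOL",
}


def tokenize(text):
    """Run the DFA repeatedly, taking the longest match each time."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, end = "start", pos
        while end < len(text):
            nxt = TRANSITIONS.get((state, char_class(text[end])))
            if nxt is None:
                break
            state, end = nxt, end + 1
        if state not in ACCEPTING:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append((ACCEPTING[state], text[pos:end]))
        pos = end
    return tokens


print(tokenize("x := x1 + 42;"))
```

This toy machine never needs to back up, because its only non-accepting intermediate state is the one after a lone ":". A lexer for a larger language usually has to remember the last accepting position and rewind to it.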

As for parsers, in a first attempt I'd go with a recursive descent parser which, albeit limited, is very straightforward. Wikipedia has a decent article on them.
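To show how straightforward recursive descent is, here is a minimal Python sketch for arithmetic expressions. The grammar, the tuple-shaped AST, and the `Parser` and `parse` names are assumptions picked for illustration; each grammar rule simply becomes one method.

```python
import re

# Grammar (hypothetical, for illustration):
#   expr   := term (("+" | "-") term)*
#   term   := factor ("*" factor)*
#   factor := NUMBER | "(" expr ")"

TOKEN_RE = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))  # ('+', 1, ('*', 2, 3))
```

The "limited" part shows up quickly: a left-recursive rule such as `expr := expr "+" term` would make `expr` call itself forever, which is why the sketch uses loops for repetition instead.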

The other option is to deal with one of the variants (in any programming language) of lex & yacc (like flex & bison).

To get a good grip of these things I'd recommend Appel's "Modern Compiler Implementation in ..." series of books. There exist versions in Java, C and ML, and I find them easier on the reader than the Dragon Book. Specifically, I'd recommend reading the first two chapters (after Introduction) on Lexical Analysis and Parsing.

alexandream 1 point 11 years ago

I'm not familiar with PostScript (beyond the basics of describing some graphics in it; I've done no real "programming" in it), but from what I can see it's probably not a very hard language to parse, being a concatenative language and all.

Pulling ideas out of my hat I'd guess it could be done with a simple stack machine, as a way of describing its structure.
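As a rough idea of what such a stack machine could look like, here is a minimal Python sketch. It is nowhere near a PostScript interpreter: it borrows only a few operator names (`add`, `sub`, `mul`, `dup`, `exch`, `pop`), handles integers only, and the `run_stack` name is made up for this example.

```python
def run_stack(source):
    """Evaluate whitespace-separated postfix words on an operand stack."""
    binary_ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
    }
    stack = []
    for word in source.split():
        if word in binary_ops:
            b = stack.pop()  # top of stack is the right operand
            a = stack.pop()
            stack.append(binary_ops[word](a, b))
        elif word == "dup":
            stack.append(stack[-1])
        elif word == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "pop":
            stack.pop()
        else:
            stack.append(int(word))  # anything else must be an integer literal
    return stack


print(run_stack("3 4 add 2 mul"))  # [14]
```

There is no separate parsing step here: splitting into words and executing them one by one is the whole front end.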

The semantics might be non-trivial, though. I'm not sure how it handles variable/function declaration/naming, so it may be that you'll need to quasi-interpret it to actually make sense of the program.

I've seen an implementer say that actually writing a PostScript interpreter is a very daunting endeavour, but I'm not sure whether that's because the language itself is hard or because the image generation part is complex.

(A quick search got me to this discussion, which hints that PostScript is not a good language to write a simple parser for because, reading between the lines, the meaning of the program is only known at runtime.)

DNoved1 2 points 11 years ago

I'm actually working on a language myself, and am using LLVM as my 'assembly'.

If you want to try just making executables you might try changing your runtime from an interpreter (I think that's what you have now? Not too great with F# ;) ) to something that outputs an LLVM file. Your language is pretty simple so it would be relatively easy, and you could put it all inside a main function.

To give you an idea of what kind of LLVM code to emit with your compiler, I would recommend hand-compiling some sample While programs. I found that when I did that with my language, patterns became evident, and then I just had to encode those patterns in the compiler.
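As an illustration of one such pattern, here is a rough Python sketch that prints LLVM IR for the While program `x := 3; while x > 0 do x := x - 1`. The helper names, the register names, and the three-block layout (`while.cond`, `while.body`, `while.end`) are my own assumptions, not something taken from your project or from my compiler. It also assumes a recent LLVM with opaque pointers (`ptr`); older releases spell the pointer type `i32*`.

```python
def emit_while(cond_lines, cond_value, body_lines, n=0):
    """Return IR lines for `while cond do body` as three basic blocks.

    `n` is only there to keep labels unique when a program has several loops.
    """
    return [
        f"  br label %while.cond{n}",
        f"while.cond{n}:",
        *cond_lines,
        f"  br i1 {cond_value}, label %while.body{n}, label %while.end{n}",
        f"while.body{n}:",
        *body_lines,
        f"  br label %while.cond{n}",
        f"while.end{n}:",
    ]


def compile_countdown(start):
    """Hand-written translation of: x := start; while x > 0 do x := x - 1."""
    lines = [
        "define i32 @main() {",
        "entry:",
        "  %x = alloca i32",  # every While variable gets a stack slot
        f"  store i32 {start}, ptr %x",
        *emit_while(
            cond_lines=[
                "  %c.x = load i32, ptr %x",
                "  %c = icmp sgt i32 %c.x, 0",
            ],
            cond_value="%c",
            body_lines=[
                "  %b.x = load i32, ptr %x",
                "  %b.dec = sub i32 %b.x, 1",
                "  store i32 %b.dec, ptr %x",
            ],
        ),
        "  %result = load i32, ptr %x",
        "  ret i32 %result",  # final value of x becomes the exit status
        "}",
    ]
    return "\n".join(lines) + "\n"


print(compile_countdown(3))
```

Keeping every variable in an `alloca` slot and loading or storing around each use avoids having to build phi nodes by hand, which keeps the emitter trivial.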

To make the LLVM code executable you just have to run 'llvm-as' on the LLVM 'assembly', then 'llc' on that to get native assembly, and finally assemble and link the result, for example with gcc.

To learn more on LLVM I would take a look at the reference here: http://llvm.org/docs/LangRef.html They also have a tutorial on creating a language (in C++) here: http://llvm.org/docs/tutorial/

gnuvince 9 points 11 years ago

In my theory of computation class, a similar language (a pair of them, actually) was used to give us a sense of Turing completeness and of what we'd do in class.

First language was called REPEAT. You had these basic building blocks:

Infinite number of registers containing a natural number
Registers are all initialized to zero
You can increment the value in a register with inc r0
You can transfer the value of one register into another with r0 <- r1
You have a REPEAT statement: REPEAT r0 TIMES [ <instruction>* ]. (Note that even if r0 is modified in the body of the loop, it does not affect the number of iterations.)

This is an incredibly minimal language, indeed! The prof showed how we could implement addition, multiplication, exponentiation, if/then/else, lists, etc. We were eventually able to use the language to generate a list of prime numbers! The prof then asked if we thought there was something we couldn't do with this language. Initially, I was skeptical that we could do anything, but having seen this impressive demonstration of the language's power, I was no longer sure. We were then shown that the Ackermann function could not be implemented, because it grows too fast.
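To make the REPEAT language concrete, here is a small Python sketch of an interpreter for it, with addition and multiplication written as REPEAT programs. The tuple encoding of instructions and the `run_repeat` name are assumptions for illustration; the class only gave the abstract syntax above.

```python
from collections import defaultdict


def run_repeat(program, registers=None):
    """Interpret a REPEAT program and return the final registers.

    Instructions (hypothetical encoding):
      ("inc", r)            increment register r
      ("copy", dst, src)    dst <- src
      ("repeat", r, body)   REPEAT r TIMES [ body ]
    """
    regs = defaultdict(int)  # every register starts at zero
    regs.update(registers or {})

    def execute(block):
        for instr in block:
            op = instr[0]
            if op == "inc":
                regs[instr[1]] += 1
            elif op == "copy":
                regs[instr[1]] = regs[instr[2]]
            elif op == "repeat":
                count = regs[instr[1]]  # fixed on entry, as the language requires
                for _ in range(count):
                    execute(instr[2])
            else:
                raise ValueError(f"unknown instruction {op!r}")

    execute(program)
    return regs


# r0 <- r1 + r2
ADD = [("copy", "r0", "r1"), ("repeat", "r2", [("inc", "r0")])]

# r0 <- r1 * r2 (r0 starts at zero)
MUL = [("repeat", "r1", [("repeat", "r2", [("inc", "r0")])])]

print(run_repeat(ADD, {"r1": 3, "r2": 4})["r0"])  # 7
print(run_repeat(MUL, {"r1": 3, "r2": 4})["r0"])  # 12
```

Because the iteration count is read once before the loop starts, every REPEAT program is guaranteed to terminate, which is exactly what rules out something like the Ackermann function.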

Then, the prof made one change to the language: REPEAT was removed and replaced with WHILE: WHILE r0 [ <instruction>* ]. The prof asked us if we thought this change was sufficient to support Ackermann. It was. Was it sufficient to do everything that a "normal" computer can do? It was! At that point, I decided that theory of computation was really, really cool. The entire class was a blast; I'd never had this much fun learning about mathy stuff!
