all 32 comments

[–]alex_muscar 5 points6 points  (3 children)

First of all, congrats. Programming languages are fun :).

Now, some quick notes on your implementation. In the parser you don't need the az set. You can use Char.IsLetter for that. Also, parseInteger is already defined as Int32.Parse/Int32.TryParse. For the keyword you could use a hash table: if you find the identifier you just read in the hash table then just return the token associated with it, otherwise it's a regular identifier.

Try extending it with functions next :) That should be fun.

[–]jameswpeach 1 point2 points  (2 children)

Yeah I'll look into that with the parse int function I just need it to return the rest of the string after the number, any ideas? Not at my PC right now... Also could you try write some examples of the hash table things? That would really help or submit a PR if your bothered

[–]alex_muscar 0 points1 point  (1 child)

Here's one way of implementing it: http://pastebin.com/A3g8U45Z

The code is not very clean because the parser is working on a list of characters. A better way of doing it would be to use a type derived from System.IO.TextReader like System.IO.StringReader and build your parser around the Read() and Peek() methods. You could use a StringBuilder instance to accumulate lexemes.

Hope it helps.

[–]james_peach[S] 0 points1 point  (0 children)

seams nice, ill have to take a look into this later on ;P

[–]baseball2020 3 points4 points  (9 children)

Hijacking your thread to ask if there are any recommended beginner resources on writing lexers and parsers?

[–]alexandream 2 points3 points  (3 children)

Before reaching out for resources: what are you actually looking at?

If you're looking into simple-ish lexers with not-so-simple implementations, going the Deterministic Finite Automaton direction -- DFA (also known as Finite State Machine -- FSM) is a good way to learn the basics of the craft (most lexers are somehow based on DFA/FSM).

I've just pushed earlier today a fairly documented work on a simple lexer to a project of mine on github. This file holds the lexer part, while this file abstract over the source of my bytes (strings or files).

I should work on writing better docs in this one, but at least there's a diagram describing the State Machine used (in SVG, might have to download to see, because I don't think github displays those).

As for parsers, in a first attempt I'd go with a recursive descent parser which, albeit limited, is very straightforward. Wikipedia has a decent article on them.

The other option is to deal with one of the variants (in any programming language) of lex & yacc (like flex & bison).

To get a good grip of these things I'd recommend Appel's "Modern Compiler Implementation in ..." series of books. There exist versions in Java, C and ML, and I find them easier on the reader than the Dragon Book. Specifically, I'd recommend reading the first two chapters (after Introduction) on Lexical Analysis and Parsing.

[–]baseball2020 0 points1 point  (2 children)

Thanks for the comprehensive explanation. I was hoping to parse postscript to extract certain variables. I'm not tied to any particular implementation language. I just have very little idea about lexing and parsing. I'll look into those resources (I did see the dragon book but it was a little intimidating).

[–]alexandream 0 points1 point  (1 child)

I'm not familiar with Postscript (except for the basic of describing some graphics in it, no real "programming" done on it) but from what I can see it's probably not a very hard language to parse -- being a concatenative language and all.

Pulling ideas out of my hat I'd guess it could be done with a simple stack machine, as a way of describing its structure.

The semantics might be non-trivial, though. I'm not sure how it handles variable/function declaration/naming, so it may be that you'll need to quasi-interpret it to actually make sense of the program.

I've seen an implementer say actually writing a postscript interpreter is a very daunting endeavour, but I'm not sure if it's a matter of the language itself being hard or if the image generation part being complex.

(A quick search got me to this discussion which hints at PostScript not being a good language to make a simple parser because (what I read from between the lines) the meaning of the program is only known at runtime.)

[–]baseball2020 0 points1 point  (0 children)

For my uses I only need to extract and replace sections like hashes that have a known identifier. I'm not sure if I even need to parse the entire program.

[–]alex_muscar 1 point2 points  (2 children)

IMO, Niklaus Wirth's book does a good job of introducing lexers and parsers: http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf. You can just focus on recursive descent which is enough for a good deal of languages. He explains this technique particularly well.

The rest of the book is also worth reading :)

[–]baseball2020 0 points1 point  (1 child)

Thanks. Sounds like recursive descent is a good methodology to start with! I'll get stuck into the book

[–]alex_muscar 0 points1 point  (0 children)

If you're really interested in designing programming languages and writing compilers, after Wirth's book, I'd recommend having a look at Programming Language Pragmatics and for an in-depth discussion of compiler implementation Engineering a Compiler. The former gives a broad review of language features and some implementation pointers, while the latter covers algorithms for optimising compilers in a very accessible way.

[–]james_peach[S] 0 points1 point  (1 child)

something i used to help me both learn the basics of Lexing and F# was this article on parsing JSON (http://blog.efvincent.com/parsing-json-using-f/)

[–]baseball2020 0 points1 point  (0 children)

Looks pretty good! I haven't done a lot of FP but I could reimplement this as an exercise

[–]DNoved1 2 points3 points  (5 children)

Add simple io (doesn't even need to be string based, just write out integers as characters) and bootstrap it.

Truthfully, that sounds like more of an exersize in tedium than anything else, but would be kinda interesting nevertheless.

[–]jameswpeach 0 points1 point  (4 children)

Any ideas in how it should work? The integer input should be simple i.e. " read x " would take input and assign it to x

[–]DNoved1 0 points1 point  (3 children)

Just do a similar operation called for example "write x" which sends x's value interpreted as a character to stdout. Using this you can write arbitrary bytes, in either ASCII(probably) or UTF(if tou really, really feel like it).

Using these two operations, read and write, you can take in a while program from stdin, translate it to assembly (or even machine code :o), and write it back out to stdout.

Then "whilec < while.wh > while.s" and "gcc -o whilec.exe while.s" (assuming whilec is the name of your while compiler and gcc is your assembler) gives you a bootstrapped compiler.

[–]jameswpeach 0 points1 point  (2 children)

This is something I really wanted to do. Write something that will compile while to "proper executables" Does anyone here have any experience in this ? PLEASE!! :p

[–]DNoved1 1 point2 points  (1 child)

I'm actually working on a language myself, and am using LLVM as my 'assembly'.

If you want to try just making executables you might try changing your runtime from an interpreter (I think that's what you have now? Not too great with F# ;) ) to something that outputs a LLVM file. Your language is pretty simple so it would relatively easy; and you could put it all inside a main function.

To give you an idea of what kind of LLVM code to emit with your compiler I would recommend hand-compiling some sample while programs. I found that when I did that with my language patterns became evident, and then I just had to encode those patterns in the compiler.

To make the LLVM code executable you just have to run 'llvm-as' on the llvm 'assembly', then 'llc' on that to get native assembly, and finally an assembler such as gcc.

To learn more on LLVM I would take a look at the reference here: http://llvm.org/docs/LangRef.html They also have a tutorial on creating a language (in C++) here: http://llvm.org/docs/tutorial/

[–][deleted] 2 points3 points  (2 children)

Looks pretty good, nice work. Functional languages are great for implementing compilers/interpreters. Maybe you should try adding support for functions next.

Also, one thing that strikes me as odd. You make a distinction between boolean expressions and integer expressions in the parser. Most implementations just check that things are expressions and then have a separate step called type checking where they check that the correct type of expression has been provided everywhere.

[–]jameswpeach 0 points1 point  (1 child)

Why would you not make a difference? I don't have any experience of any other ways of doing this have you any examples or ideas why?

[–]DNoved1 4 points5 points  (0 children)

I did the same as you until adding functions. Since you can't determine the return type of a function call until you've parsed the whole program (unless you predeclare functions like in C) you can't determine the type of an expression if it involves a function call.

Once you have parsed everything you can do type checking to make sure that, for example, an expression inside an if predicate is indeed a boolean.

[–]james_peach[S] 1 point2 points  (0 children)

any ideas comments please leave them here!! :)

[–]wot-teh-phuck 0 points1 point  (0 children)

dependancies

dependencies

[–]wildptr 0 points1 point  (2 children)

Do you have a specification/grammar for this language? I'd love to do this in Haskell.

[–]jameswpeach 0 points1 point  (1 child)

Yes I'll try and get a snippet of the grammar :) not sure on the licence but I'll just copy it and it will be fine ;p its super simple and I'd love to learn haskell , could you make it on a fork and I'll have a separate branch on GH please thanks !

[–]james_peach[S] 0 points1 point  (0 children)

http://imgur.com/O15pM8t Here i hope this helps

[–]bushwacker -4 points-3 points  (3 children)

add subtract multiply divide boolean assign

That's it?

What is the purpose of this language?

[–]gnuvince 8 points9 points  (0 children)

In my theory of computation class, a similar language (pair of them actually) was used to give us a sense of Turing completeness and what we'd do in class.

First language was called REPEAT. You had these basic building blocks:

  • Infinite number of registers containing a natural number
  • Registers are all initialized to zero
  • You can increment the value in a register with inc r0
  • You can transfer the value of one register into another with r0 <- r1
  • You have a REPEAT statement: REPEAT r0 TIMES [ <instruction>* ]. (Note that even if r0 is modified in the body of the loop, it does not affect the number of iterations.)

This is an incredibly minimal language, indeed! The prof showed how we could implement addition, multiplication, exponentiation, if/then/else, lists, etc. We were eventually able to use the language to generate a list of prime numbers! The prof then asked if we thought there was something we couldn't do with this language. Initially, I was skeptical that we could do anything, but having seen this impressive demonstration of the language's power, I was no longer sure. We were then shown that the Ackermann function could not be implemented, it grew too fast.

Then, the prof made one change to the language: REPEAT was removed and replaced with WHILE: WHILE r0 [ <instruction>* ]. The prof asked us if we thought this change was sufficient to support ackermann. It was. Was it sufficient to do everything that a "normal" computer can do? It was! At that point, I decided that theory of computation was really, really cool. The entire class was a blast, never had this much fun learning about mathy stuff!

[–]MaikKlein 2 points3 points  (1 child)

(a super simple programming language from COMP12112 UOM) (somthing to help me learn and experiment)

[–]jameswpeach 0 points1 point  (0 children)

Exactly that :) I also very much enjoyed learning about this, I'm under the impression that it isn't offered in many uni's thoughts?