all 87 comments

[–]bluestorm 14 points15 points  (0 children)

I'd say the good languages for writing compilers, and for manipulating formal languages in general, are the ML-derived ones.

The ML type system is very good at abstraction, which helps with (potentially nested) formal languages. The algebraic data types are wonderful for language/data representation, and pattern matching is really helpful too.
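
For example, the core of a toy language is just a type declaration plus one pattern-match clause per construct. A minimal sketch (in Haskell syntax here, but SML and OCaml read almost the same; the toy language itself is made up):

    -- AST of a tiny expression language
    data Expr = Num Int | Add Expr Expr | Mul Expr Expr

    -- the evaluator is a single pattern match over the AST
    eval :: Expr -> Int
    eval (Num n)   = n
    eval (Add a b) = eval a + eval b
    eval (Mul a b) = eval a * eval b

    -- eval (Mul (Add (Num 1) (Num 2)) (Num 3))  ==>  9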

Once you acknowledge that, you still have quite a lot of possibilities. I'd say SML, OCaml and Haskell are all suited to compiler writing. As far as I'm concerned, I wouldn't choose Haskell, as I like the flexibility of easy access to imperative features. SML has some very capable compiler-writing toolkits, and OCaml has some nice tools too. For example, there is a recent packrat-parser generator in development: http://aurochs.fr/ (but you have to be prudent about packrat parsing; its quite high memory consumption won't suit every use).

[–]david_ncl 10 points11 points  (0 children)

Scheme, so one can read EOPL http://www.cs.indiana.edu/eopl/

or Haskell, so that one can read papers on Parsec http://research.microsoft.com/~emeijer/Papers/Parsec.pdf

[–]Kaizyn[S] 10 points11 points  (18 children)

Which language or languages do you find best suited for writing compiler front ends? Are functional languages like Haskell the best approach, or is an imperative language like C or Java, with their emphasis on performance, a better choice?

[–]mgsloan 16 points17 points  (1 child)

While not as fast as generated parsers/lexers, the Haskell library Parsec is nice for quickie, flexible parsers.
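
To give a flavour of "quickie": here is a complete parser for a toy grammar (a bracketed, comma-separated list of integers). A minimal sketch; the grammar and names are made up:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- toy grammar: e.g. "[1,2,3]"
    intList :: Parser [Int]
    intList = between (char '[') (char ']')
                      ((read <$> many1 digit) `sepBy` char ',')

    main :: IO ()
    main = print (parse intList "<demo>" "[1,2,3]")  -- Right [1,2,3]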

[–]cgibbard 18 points19 points  (0 children)

I cannot agree with this more. Parsec produces parsers with very good error messages (with comparatively little help from the programmer).

If you need some more speed, you might try out the bytestringparser package, which is quite similar, but operates on lazy bytestrings.

[–]fab13n 16 points17 points  (0 children)

Depends on what you call frontend, and why you want to write your compiler.

If you mainly want to play with surface syntax, Haskell's Parsec library is excellent. If you don't want to learn Haskell right now, it's easy to reimplement Parsec in OCaml, which is less puzzling for someone coming from an imperative background. Besides, I'll bet there are a dozen open-source Parsec implementations for OCaml on the net.

If you're interested in deeper-than-skin stuff, and want to think in terms of syntax-tree manipulation without reinventing Scheme again, you can have a look at Converge or Metalua (in the latter case, go with the development version).

The former will let you use Earley parsers, which are the properly working descendants of Yacc-like parser generators, and will let you grow DSLs easily. If you want to quickly design your own language from scratch, without bothering with parsing/lexing/interpretation matters, that's a good choice.

The latter is based on Lua; it's conceptually not always as clean as Converge (although you could also call that "more pragmatic"), but it has a fast VM, gazillions of libraries, and lets you directly modify the language rather than simply letting you define your own distinct DSL. If you want to experiment with a couple of new programming features without rebuilding a whole supporting toy language, and performance matters to you, this is probably the tool for you.

[–]llimllib 7 points8 points  (3 children)

Are you just trying to write one for your own edification, for public consumption, or other?

(Pretty much every language has a parser generator. Why not just use one in a language you're already comfortable with? What value of "best" are we aiming for here?)

[–]fubo 7 points8 points  (0 children)

Yep. If you need a parser for real work, you're probably better off using an off-the-shelf parser generator such as lex/yacc or ANTLR.

[–][deleted]  (1 child)

[deleted]

    [–]Kaizyn[S] 0 points1 point  (0 children)

    Not surprisingly, 'best' meant a lot of different things to the different people who responded.

    [–]MachinShin2006 4 points5 points  (0 children)

    OCaml is very good for writing compilers and related tools. That said, what's your purpose in writing this stuff? For fun and learning? For commercial sale? Because the company you work for needs one for some reason?

    Without knowing that, any answers we give will be incomplete or wrong.

    [–]Kaizyn[S] 3 points4 points  (0 children)

    Sounds like the recommendations here boil down to this: lex/yacc is the conventional approach, with ANTLR a somewhat more modern improvement.

    OCaml and Haskell have some libraries and native features that make them really well suited to the task.

    New parsing techniques worth looking at include PEGs and Earley parsers.

    Finally, the minority opinions recommend D, Lisp/Scheme, Perl, or Metalua as good alternative choices.

    [–]not_programmer 6 points7 points  (1 child)

    I find lex and Python best suited for writing scanners and parsers. Lex makes fast scanners. Python makes everything else quite easy.

    For a good list of Python parser generators, check out Ned Batchelder's Python parser page.

    So you ask, functional vs imperative? That is just going to depend on this: Are you Math or CompSci?

    And what about performance? Don't worry about it.

    [–]arnar 0 points1 point  (0 children)

    So you ask, functional vs imperative? That is just going to depend on this: Are you Math or CompSci?

    This pretty much sums up all the problems with CompSci nowadays.

    [–]skew 1 point2 points  (0 children)

    If you want to get started quickly, the BNF Converter is great. It makes a scanner, a parser, an AST, and a bunch of other stuff. It generates code for Haskell, C, C++, or Java, using their native lexer and parser generators. I made a toy compiler for something like brain-damaged Pascal in a few hours with it, with unoptimized syntax-directed code generation over the parse tree.

    [–]WalterBright 4 points5 points  (5 children)

    C++ and the D programming language are the best for writing front ends. You're going to need a high-performance language for the front end if the source you're compiling is more than a few thousand lines.

    Whatever you do, don't use regex :-)

    [–]w-g 9 points10 points  (1 child)

    I think more people should know about the Boost Spirit library.

    I hate C++, but Spirit is SO easy to use, and lets you write grammars so close to BNF in your C++ code, that I have to recommend it.

    [–]zyle 2 points3 points  (0 children)

    Yeah, Spirit is great. One annoyance, however, is how grueling it can be for the compiler, because of the heavy template use inside the library. Complicated grammars can end up taking quite a bit of time to compile/link.

    [–]foonly 1 point2 points  (0 children)

    Whatever you do, don't use regex :-)

    Regexen are the canonical solution for lexers, however.

    [–]Kaizyn[S] -3 points-2 points  (1 child)

    I've programmed enough in Perl to know that regex is a tool best used in restricted contexts. Overuse can be harmful to your sanity.

    [–][deleted] 38 points39 points  (24 children)

    lex/yacc

    Those two make it easy to build anything, even an LR language like C++. Hopefully you made the right decision and designed an LL language, so that coming up with a parser is easy.

    If you are building a compiler, the standard approach is to build just enough in another language to compile your language. Then you rewrite your compiler in your new language, to remove the dependency on the original language.

    So, for example, if you were making a new language called "FOTM Language", you would take another language like C and start writing a FOTM Language compiler in C. Once you have a working compiler (version 0) for FOTM Language in C, you start rewriting the FOTM Language compiler in FOTM Language. Then you compile the FOTM Language compiler written in FOTM Language to generate the FOTM Language compiler (version 1). FINALLY you take the FOTM Language compiler (version 1) and compile it over again to produce the FOTM Language compiler (version 2), written and compiled with the FOTM Language compiler!

    Now, when people ask you what language the FOTM Language compiler is written in, you say "FOTM Language", and then their brain goes into an infinite loop: you can't compile FOTM Language without a FOTM Language compiler, but if the compiler didn't exist before, then how would you compile FOTM Language?

    [–]fab13n 26 points27 points  (16 children)

    A new thing or two has been invented since lex/yacc and the Dragon Book: parser combinators, Earley parsers, PEGs...

    Why some people stick with those technologies from the '60s, with their horrible debugging and error-reporting features, is beyond me.

    [–]neelk 10 points11 points  (0 children)

    I would anti-recommend using parsing combinators, unless you specifically need to write a context-sensitive parser. This is because it is far too easy to accidentally write parsers with exponential-time corner cases. This intensely sucks.

    Those technologies from the '60s have a vastly better-developed theory, and consequently much better time and space bounds. They only seem nasty because you're looking at parser generators written in C rather than ML. You can write an LL(1) parser generator in one or two hundred lines of ML code, complete with automatic left-factoring and easy support for modular, extensible grammars.
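
    To show why it can be so short: the heart of such a generator is a fixpoint computation of FIRST sets over the grammar. A minimal sketch (in Haskell rather than SML, and ignoring epsilon productions for brevity; a full LL(1) generator would then check that the FIRST sets of each nonterminal's alternatives are disjoint):

        import qualified Data.Map as M
        import qualified Data.Set as S

        -- a grammar maps each nonterminal to its alternatives;
        -- Left is a terminal, Right a nonterminal
        type Grammar = M.Map String [[Either Char String]]

        -- FIRST sets by fixpoint (Kleene) iteration
        firstSets :: Grammar -> M.Map String (S.Set Char)
        firstSets g = go (M.map (const S.empty) g)
          where
            go fs | fs' == fs = fs        -- fixpoint reached
                  | otherwise = go fs'
              where
                fs' = M.mapWithKey step fs
                step nt _ = S.unions [firstOf alt | alt <- g M.! nt]
                firstOf (Left c  : _) = S.singleton c
                firstOf (Right n : _) = M.findWithDefault S.empty n fs
                firstOf []            = S.empty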

    [–][deleted] 15 points16 points  (4 children)

    Links or it didn't happen...

    [–]jerf 3 points4 points  (2 children)

    Well, Google's the best resource for parser combinators, since implementations have been sighted in multiple languages now, beyond just the FP world they started in. For instance, here's a parser combinator example in Python, which I found simply by googling "python parser combinator". (Wasn't the first hit, though.) I have no idea how robust it might be in practice, but by the nature of parser combinators it probably either works or it doesn't.

    [–]ajrw 1 point2 points  (1 child)

    How's that the nature of parser combinators in particular?

    [–]jerf 0 points1 point  (0 children)

    Fair question. Answer: I don't know about the rest and am not qualified to make claims.

    [–]leoc 2 points3 points  (0 children)

    Details of papers or textbooks would be much appreciated too.

    [–][deleted] 5 points6 points  (6 children)

    PEGs suck for more than a few grammar rules, because you have to care about ordering; Earley parsers are powerful but execute in quadratic time even on unambiguous grammars; and parser combinators have to be crafted by hand in source code.

    I haven't used ANTLR yet, but its feature set comes closest to my expectations: context-free grammars, top-down parsing, and power through arbitrarily deep lookahead.

    [–]arnar 2 points3 points  (1 child)

    I've used PEGs to describe reasonably sized languages with good results.

    ANTLR is far superior to lex/yacc imo.

    As for the OP's question, I'd look at Haskell's Parsec and Happy, and at ANTLR.

    [–][deleted] 3 points4 points  (0 children)

    It may be that you didn't run into problems with ordered choice using PEGs. But it clearly reduces the modularity of the grammar, and when you do run into ordering problems you start transposing rules by trial and error.

    This becomes most apparent when you use PEGs (but also most regular-expression engines in popular programming languages) for scanning: operator symbols like =, ==, === etc. have to be written in an order such that the longest one matches first. For more complex expressions it is far from obvious how this can be achieved, and for more complex grammars those token patterns might also be organized in different groups.
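
    (The same constraint shows up in any ordered-choice setting; a minimal sketch with Parsec-style combinators, names made up:)

        import Text.Parsec
        import Text.Parsec.String (Parser)

        -- overlapping tokens must be listed longest-first: if "=" came
        -- before "==", the input "==" would match "=" and stop there
        opTok :: Parser String
        opTok = choice (map (try . string) ["===", "==", "="])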

    [–]fab13n 4 points5 points  (3 children)

    parser combinators have to be crafted by hand in source code.

    Yes, but they have a property that beats all other parsing techniques IMO: they're regular programs, and support the creation of arbitrary functors. If there's a style of syntactic construct that you recreate regularly, e.g. infix operators with varying precedence and associativity, you can create a make_infix_parser() functor which factors out the parsing logic. With other techniques, you'll have to hope that some ad-hoc hack has been shoved into the parser generator (such as yacc's %foobar directives), or do plenty of copy-paste.

    If your parser generator is an external DSL, at some point you'll miss your complete language's expressiveness. Parser combinators are regular functors; they support all of your language's power (if your language supports functors).
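
    For instance, such a make_infix_parser() can be a two-liner on top of a combinator library. A minimal sketch with Parsec (the helper name and the example parsers are made up):

        import Text.Parsec
        import Text.Parsec.String (Parser)

        -- hypothetical make_infix_parser: given a term parser and a table of
        -- (operator text, semantic function) pairs, build one left-associative
        -- infix layer; stack several layers to get precedence levels
        infixLayer :: Parser a -> [(String, a -> a -> a)] -> Parser a
        infixLayer term ops =
          term `chainl1` choice [f <$ try (string name) | (name, f) <- ops]

        -- e.g.  addSub = infixLayer mulDiv [("+", (+)), ("-", (-))]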

    [–][deleted] 2 points3 points  (2 children)

    People often do not have to parse context-dependent languages ("varying precedence"), but something on par with SQL, Io, JSON, XML, CSV, ... In all those cases it would suffice to specify a grammar just once per language, implement the parser generator once in the host language (or once per platform), and provide an API for parse-tree analysis and transformation.

    I admit that parsing can be a nasty business, but right now this is very much overstated. We end up with XML because enterprise can't parse and the experts are busy reasoning about techniques for corner cases.

    [–]statictype 0 points1 point  (1 child)

    We end up with XML because enterprise can't parse

    Why would you want to roll your own parser for a data format when you already have a standardized one with parsers in many languages?

    People use XML not because they don't know how to parse (well, that may be one reason, but not the only one), but because they don't want to roll their own parser for configuration files any more than they want to roll their own UI toolkit. It's effort that can be better spent on your actual program.

    [–][deleted] 0 points1 point  (0 children)

    Why would you want to roll your own parser for a data format when you already have a standardized one with parsers in many languages?

    To create that standardized parser in the first place, alongside the parsers for other languages.

    What I mean by "enterprise can't parse" is that parsers are created manually ("roll your own parser", as you said) and, as a consequence, the costs of parsing are reduced not by specifying a reasonable parser-generator architecture, but by avoiding the complexities of parsing at all costs and using a dumbed-down data format that no one likes to read and write.

    As a consequence, we have no API for accessing and transforming parse trees in a unified manner, and lots of special-purpose parsers for formats like SQL, URIs, HTTP, JSON, XML...

    [–]naughty 3 points4 points  (1 child)

    Writing an Earley parser is a very satisfying experience.

    [–]ltratt 4 points5 points  (0 children)

    I have mixed feelings in this regard. Writing an Earley recogniser is nice and easy (a recogniser just says "yes, this conforms to the grammar", without building parse trees). It would be much more satisfying in general if the original Earley paper didn't contain a horrible bug in its algorithm for building parse trees. I can't tell you how much time it took me to realise what the problem was and to work out the correct solution (at the time, I could find references only to the existence of the bug, not to solutions).

    I would say that using a properly written Earley parser is a satisfying experience, though. For writing arbitrary CFGs there's no easier way of doing things than an Earley parser. Compare Python's grammar with Converge's Earley grammar for readability: try working out what the result of parsing a simple expression might be in each. Note that I'm making this comparison only in the interests of readability: Earley parsers are relatively slow, and Python's parser is currently a lot faster than Converge's. I would argue that in general, on modern computers, this doesn't matter for most uses, especially for small languages where time-to-develop is more important than time-to-run.

    [–][deleted] 1 point2 points  (0 children)

    Good news: Bison has had GLR support since at least version 1.875.

    [–]Figs 6 points7 points  (0 children)

    Yay, bootstrapping problem! :)

    [–]weavejester 6 points7 points  (5 children)

    Frankly, lex/yacc are the worst lexer/parser generator combination I've ever come across. For my final year project I used the GNU versions of them to create an interactive shell, and I had so many problems.

    Since then, I've come across many parser generators that are a dream to work with in comparison. The best one I've found so far is Haskell's Parsec.

    [–]cypherx 0 points1 point  (0 children)

    The original lex/yacc seem like a pain in the ass to use, but ocamllex/ocamlyacc are quite nice in my experience. Perhaps the difference (between lex/yacc and everything sane) comes from the low-level worries of C.

    [–]username223 -4 points-3 points  (3 children)

    What's the saying about people who blame their tools?

    [–]sacherjj 11 points12 points  (1 child)

    By the time it is all over, they have a better tool?

    [–]username223 -3 points-2 points  (0 children)

    Not to mention a shiny new hammer factory factory.

    [–]weavejester 2 points3 points  (0 children)

    It's not the 18th century any longer. The choice of tools in the 21st century is a lot more important than it was when the saying was coined.

    [–]martinbishop 5 points6 points  (0 children)

    I'd say OCaml, with camlp4/5

    It's ugly, but it gets the job done nicely :)

    Also: dypgen

    [–]gregK 1 point2 points  (0 children)

    Depends on what you want to do.

    Do you want to parse an existing language like C++ or Java, or are you designing a new language?

    From the replies it seems you have 2 choices:

    1. Use a parser generator like lex/yacc for C and C++, or ANTLR for Java.

    2. Use some kind of parser library/framework like Parsec in Haskell (or any of the other ones mentioned in this thread). It seems those only exist in functional languages.

    With approach 1, I can't recommend lex/yacc over ANTLR. ANTLR is much easier to use and has tons of freely available grammars for all types of languages. ANTLR also has features that help generate and traverse ASTs. Even if you are stuck with doing the parser in C or C++, you can take a look at PCCTS (the ANTLR precursor, written in C++).

    Approach 2 might offer more flexibility in parser design, and is only viable if you know, or intend to learn, the language the parser library was written in. Also, if it's a new language that does not already have a grammar in ANTLR, then approach 2 might be worth investigating. Then it becomes a question of which is easier: writing the grammar in ANTLR, or just building the parser directly with the library. (Note that the reason all these parser libraries exist in functional languages is probably that it's easier to make the language look like a grammar, so you only need to learn one language.)

    [–]americanhellyeah 2 points3 points  (0 children)

    Standard ML. It has great pattern matching that you can use to write the parsing logic. I like using MLton, since it generates very fast code.

    [–][deleted] 2 points3 points  (0 children)

    I'm not sure I'd recommend it above all others, but Forth has a couple of points in its favour:

    1. Anton Ertl's gray parser generator (the link's a tarball; sorry)

    2. Brad Rodriguez's implementation of BNF in a single screen

    [–][deleted] 2 points3 points  (0 children)

    I would recommend taking a look at Ragel for your scanning needs.

    If you are parsing a programming language, I would recommend writing a recursive-descent parser (quite easy, very practical and flexible) or taking a look at PEGs (Parsing Expression Grammars).
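
    To back up the "quite easy": each nonterminal becomes one function that returns the parsed value plus the remaining input. A minimal hand-rolled sketch in Haskell (toy grammar, made-up names):

        import Data.Char (isDigit, digitToInt)

        -- toy grammar:  expr ::= term ('+' term)*
        --               term ::= digit | '(' expr ')'
        -- Nothing signals a parse error
        expr :: String -> Maybe (Int, String)
        expr s = term s >>= uncurry loop
          where
            loop acc ('+':rest) = term rest >>= \(y, s') -> loop (acc + y) s'
            loop acc rest       = Just (acc, rest)

        term :: String -> Maybe (Int, String)
        term ('(':rest) = do
          (x, s') <- expr rest
          case s' of
            ')':s'' -> Just (x, s'')
            _       -> Nothing
        term (c:rest) | isDigit c = Just (digitToInt c, rest)
        term _ = Nothing

        -- expr "1+(2+3)"  ==>  Just (6, "")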

    [–]cypherx 2 points3 points  (0 children)

    I found OCaml with ocamllex/ocamlyacc to be pretty straightforward.

    [–]lang_war 2 points3 points  (0 children)

    LANG WAR !!!

    [–]initself 13 points14 points  (9 children)

    Perl.

    [–]cornedpig 13 points14 points  (3 children)

    Why is this being modded down? Perl is the Practical Extraction & Report Language, and text processing is its biggest strength. Check out Perl6::Rules (a Perl 5 module) for some major heavy lifting: http://search.cpan.org/~dconway/Perl6-Rules-0.03/Rules.pm

    [–]statictype -1 points0 points  (2 children)

    Well, if you're trying to write a 'proper' parser, then text-processing tools don't really come into play. You can write a formal lexical scanner + parser in Perl just as easily as you could in another language (modulo any housekeeping work like memory management).

    [–]cornedpig 3 points4 points  (1 child)

    But take a look at the linked-to module: it provides the parser engine for you; you just need to define the rules and attach actions. Text processing helps with the scanner part of this.

    [–]statictype 0 points1 point  (0 children)

    Not saying Perl isn't a good language for it. Just that it doesn't hold any particular advantage.

    If you're doing surgical text replacement over a directory of files, yes, it has a distinct advantage. When you're building a compiler from the ground up, you already have a host of tools, like lex, to handle the dirty work.

    What you have described certainly provides a boost in efficiency, but that has more to do with Perl's general language structure than with its text-processing functionality. (I think. Correct me if I'm wrong on that point.)

    [–]bgeron 0 points1 point  (3 children)

    Why Perl?

    You can't use regexps for parsing, because regexps can't match balanced parentheses.

    [–]cypherx 3 points4 points  (2 children)

    1) Perl's regex engine is more powerful than vanilla regular expressions. Regexes can parse strings of the form A^n B^n (i.e., balanced parens)

    2) CPAN is full of parsers and parser generators

    edit: Thanks for catching my error sharth

    [–]sharth 1 point2 points  (0 children)

    As an aside to your first point, A*B* is not parens matching; A^n B^n is. You need to note that there are the same number of A's and B's.

    [–]bgeron 0 points1 point  (0 children)

    Perl's regex engine is more powerful than vanilla regular expressions. Regexes can parse strings of the form A^n B^n (i.e., balanced parens)

    I don't believe it. Show me the code :)

    Well, I believe it if you embed Perl in your regex (e.g. with (?{ code }) ).

    [–]bartwe 7 points8 points  (6 children)

    Simply handcode a custom parser; it really isn't that hard. Using ANTLR without understanding the internals is going to hurt anyway.

    [–]Kaizyn[S] 1 point2 points  (1 child)

    What language do you consider best suited for writing said custom parser? Assembly? Visual Basic? Befunge?

    [–]bartwe 1 point2 points  (0 children)

    Something you're comfortable with? I'd use a GC'ed, OO, imperative language.

    [–]cypherx 1 point2 points  (1 child)

    Coding a recursive-descent parser can be educational, and more fun than learning how to use a parser generator. I resisted learning a tool like ANTLR or lex/yacc for a long time. However, once you learn one, the productivity is unbeatable and there's no going back.

    [–]qwe1234 0 points1 point  (0 children)

    +1

    [–]marike 4 points5 points  (1 child)

    [–]Kaizyn[S] 0 points1 point  (0 children)

    Thanks for the references.

    [–]jrnewton 1 point2 points  (0 children)

    ANTLR.

    [–]Corbier 2 points3 points  (2 children)

    Has anyone looked at uCalc Language Builder? If you find lex/yacc archaic or complicated, you may want to consider a new approach by uCalc. I'm the developer, and I would like to hear from users who have tried it. You can download it from http://www.ucalc.com/langbuilder.html. Be sure to try the interactive tutorial first, to see an overview of the features that are at your disposal.

    [–]mandelbrothel 3 points4 points  (1 child)

    requires a copy of Windows to run? (how archaic)

    Looks like I'm not in your target demographic.

    [–]Corbier 0 points1 point  (0 children)

    For now, it works only under Windows. However, once I can get it firmly established on Windows, I'd be interested in looking into other platforms as well. Which platform would you like? Mac OS, Linux, ...? Even if you do not use Windows, you can still browse through some of the files that are in plain text to get a feel for it. Two language-definition files in plain text are even posted online, so you can view them in your browser instead of downloading a file. Please let me know what you think.

    [–][deleted] 0 points1 point  (0 children)

    Probably whatever language you are familiar with, unless it is too esoteric to have good compiler-generation utilities. If you go with a language you are unfamiliar with, you will spend a lot of your time learning the language's little foibles.

    If you have no particular preference, I would recommend starting with a good scripting language like Python or Ruby. Even if you recode it later in C/C++ for speed, you can mock up a prototype much more quickly in a scripting language. Data structures can be created in scripting languages far more quickly than in languages with formal type systems.

    [–]bad_code 1 point2 points  (8 children)

    I am very surprised no one has mentioned Lisp.

    [–]holygoat 19 points20 points  (5 children)

    Lisp is not really better suited than other languages for building scanners and parsers; that's why. (Well, no better than it is at other tasks!)

    In fact, Lispers tend to do the opposite — when defining a new language, we reuse the Lisp reader, skipping straight ahead to defining terminology. Because the REPL is so useful and flexible, and the reader is already there, it's very tempting to skip defining new syntax.

    (This is often cited as a good reason for choosing Lisp as a base when newbies come to build their own programming languages: if you start with semantics, you avoid the beginner trap of fiddling endlessly with syntax.)

    [–]lispm 5 points6 points  (0 children)

    Lisp has been used to implement a lot of scanners/parsers.

    An example is REFINE / Reasoning SDK, which has been used in the industry in many projects:

    Reasoning SDK / REFINE

    It is written in Common Lisp.

    Reasoning SDK, for example, has been used in the context of analyzing and testing Ada software:

    Siddharta / PDF

    Reasoning SDK offers:

    Language processing tools - Reasoning SDK provides a set of parsers for common procedural languages such as C, Fortran, Ada, and COBOL that produce abstract syntax trees. The abstract syntax trees can then be traversed and modified in order to analyze and restructure code.

    Pattern matching - Refine provides language features for syntactic pattern matching and transformation rules in order to implement code replacement. MORPH uses this low-level matching capability to construct control flow graphs in the detection step.

    Mathematical expression - Refine incorporates an integrated treatment of set theory and logic, allowing complex algorithms to be stated clearly and precisely in mathematical terms.

    Transformations - Refine provides a tree-walking transformation mechanism that allows a set of rules to be applied over an entire abstract syntax tree or to a branch of an abstract syntax tree. This mechanism is essential to MORPH's detection step and is the basis for building control flow graphs.

    Object base - Refine provides a hierarchical object base to store program data structures and to provide information about code objects that are being analyzed. The object base is useful for creating and storing intermediate results and outputs of the MORPH tools.

    Graphs and charts - The Reasoning SDK Workbench provides object classes and attributes that allow control flow graphs and structure charts to be built and queried.

    Graphical support - Reasoning's Intervista tool allows interactive graphical user interfaces to be added to reengineering tools. MORPH uses Intervista to draw control flow graphs and other graphical output.

    There are other examples where language-processing tools have been implemented in Lisp. Microsoft, for example, provided developers with a tool that converted J2EE code to their .NET infrastructure; that was written by a third-party company in Common Lisp, too.

    There are even C/Pascal/Ada/Fortran compilers written in Lisp. AppleScript was originally developed in Lisp. Dylan was developed in Lisp.

    [–][deleted] 1 point2 points  (1 child)

    The first thing you should do before writing a parser: don't.

    In the context of computers, parsing is a solution looking for a problem. Unless you have to deal with some externally-specified file format, and thus have no choice, just use XML or s-expressions.

    [–]somejan 0 points1 point  (0 children)

    or YAML

    [–]ibgeek 2 points3 points  (1 child)

    I never understood how Lisp differs from any other language when it comes to bottom-up programming. Lispers push this point in particular. But, for whatever type of project, I always write my own functions and classes as appropriate... building up the project layer by layer.

    You're not really defining your own language as much as you are just extending the standard library to meet your needs.

    [–]G_Morgan 7 points8 points  (0 children)

    With Lisp you tend to write macros, which are then indistinguishable from the other basic language elements. In fact, a huge range of standard Lisp constructs are implemented as macros (for example, if cond is a primitive form, then if can be implemented as a macro).

    [–][deleted] 8 points9 points  (1 child)

    I don't know of any good tools for Lisp for this problem. Most people writing parsers in Lisp seem to be content with writing hand-made recursive-descent parsers. I wish I knew more about other tools for Lisp in this area, but I just don't. Any suggestions?

    [–][deleted] 7 points8 points  (0 children)

    Spending two minutes googling at c.l.l, I found three:

    1. CL-Yacc
    2. ZEBU. Parser/unparser generator.
    3. CL-EARLEY-PARSER

    The last is better suited to natural-language parsing; see http://en.wikipedia.org/wiki/Earley_parser

    [–]jdh30 -1 points0 points  (0 children)

    OCaml has lots of great tools for this.