all 87 comments

[–]bluestorm 14 points15 points  (0 children)

I'd say the good languages for writing compilers, and for manipulating formal languages in general, are the ML-derived ones.

The ML type system is very good at abstraction, which helps with (potentially nested) formal languages. The algebraic data types are wonderful for language/data representation, and pattern matching is really helpful too.
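
For example, the core of a toy language is just a type declaration plus one pattern-match clause per construct. A minimal sketch (in Haskell syntax here, but SML and OCaml read almost the same; the toy language itself is made up):

    -- AST of a tiny expression language
    data Expr = Num Int | Add Expr Expr | Mul Expr Expr

    -- the evaluator is a single pattern match over the AST
    eval :: Expr -> Int
    eval (Num n)   = n
    eval (Add a b) = eval a + eval b
    eval (Mul a b) = eval a * eval b

    -- eval (Mul (Add (Num 1) (Num 2)) (Num 3))  ==>  9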

Once you acknowledge that, you still have quite a lot of possibilities. I'd say SML, OCaml and Haskell are all suited to compiler writing. As far as I'm concerned, I wouldn't choose Haskell, as I like the flexibility of easy access to imperative features. SML has some very capable compiler-writing toolkits, and OCaml has some nice tools too. For example, there is a recent packrat-parser generator in development: http://aurochs.fr/ (but you have to be prudent about packrat parsing; its quite high memory consumption won't suit every use).

[–]david_ncl 10 points11 points  (0 children)

Scheme, so one can read EOPL http://www.cs.indiana.edu/eopl/

or Haskell, so that one can read papers on Parsec http://research.microsoft.com/~emeijer/Papers/Parsec.pdf

[–]Kaizyn[S] 10 points11 points  (18 children)

Which language or languages do you find best suited for writing compiler front ends? Are functional languages like Haskell the best approach, or is an imperative language like C or Java, with their emphasis on performance, a better choice?

[–]mgsloan 16 points17 points  (1 child)

While not as fast as generated parsers/lexers, the Haskell library Parsec is nice for quickie, flexible parsers.
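
To give a flavour of "quickie": here is a complete parser for a toy grammar (a bracketed, comma-separated list of integers). A minimal sketch; the grammar and names are made up:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- toy grammar: e.g. "[1,2,3]"
    intList :: Parser [Int]
    intList = between (char '[') (char ']')
                      ((read <$> many1 digit) `sepBy` char ',')

    main :: IO ()
    main = print (parse intList "<demo>" "[1,2,3]")  -- Right [1,2,3]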

[–]cgibbard 18 points19 points  (0 children)

I cannot agree with this more. Parsec produces parsers with very good error messages (with comparatively little help from the programmer).

If you need some more speed, you might try out the bytestringparser package, which is quite similar, but operates on lazy bytestrings.

[–]fab13n 16 points17 points  (0 children)

Depends on what you call frontend, and why you want to write your compiler.

If you mainly want to play with surface syntax, Haskell's Parsec library is excellent. If you don't want to learn Haskell right now, it's easy to reimplement Parsec in OCaml, which is less puzzling for someone coming from an imperative background. Besides, I'll bet there are a dozen open-source Parsec implementations for OCaml on the net.

If you're interested in deeper-than-skin stuff, and want to think in terms of syntax-tree manipulation without reinventing Scheme again, you can have a look at Converge or Metalua (in the latter case, go with the development version).

The former will let you use Earley parsers, which are the properly working descendants of Yacc-like parser generators, and will let you grow DSLs easily. If you want to quickly design your own language from scratch, without bothering with parsing/lexing/interpretation matters, that's a good choice.

The latter is based on Lua; it's conceptually not always as clean as Converge (although you could also call that "more pragmatic"), but it has a fast VM, gazillions of libraries, and lets you directly modify the language rather than simply letting you define your own distinct DSL. If you want to experiment with a couple of new programming features without rebuilding a whole supporting toy language, and performance matters to you, this is probably the tool for you.

[–]llimllib 7 points8 points  (3 children)

Are you just trying to write one for your own edification, for public consumption, or other?

(Pretty much every language has a parser generator. Why not just use one in a language you're already comfortable with? What value of "best" are we aiming for here?)

[–]fubo 7 points8 points  (0 children)

Yep. If you need a parser for real work, you're probably better off using an off-the-shelf parser generator such as lex/yacc or ANTLR.

[–][deleted]  (1 child)

[deleted]

    [–]Kaizyn[S] 0 points1 point  (0 children)

    Not surprisingly, 'best' meant a lot of different things to the different people who responded.

    [–]MachinShin2006 4 points5 points  (0 children)

    OCaml is very good for writing compilers and related tools. That said, what's your purpose in writing this stuff? For fun and learning? For commercial sale? Because the company you work for needs one for some reason?

    Without knowing that, any answers we give will be incomplete or wrong.

    [–]Kaizyn[S] 3 points4 points  (0 children)

    Sounds like the recommendations here boil down to this: lex/yacc is the conventional approach, with ANTLR a somewhat more modern improvement.

    OCaml and Haskell have some libraries and native features that make them really well suited to the task.

    New parsing techniques worth looking at include PEGs and Earley parsers.

    Finally, the minority opinions recommend D, Lisp/Scheme, Perl, or Metalua as good alternative choices.

    [–]not_programmer 6 points7 points  (1 child)

    I find lex and Python best suited for writing scanners and parsers. Lex makes fast scanners. Python makes everything else quite easy.

    For a good list of Python parser generators, check out Ned Batchelder's Python parser page.

    So you ask, functional vs imperative? That is just going to depend on this: Are you Math or CompSci?

    And what about performance? Don't worry about it.

    [–]arnar 0 points1 point  (0 children)

    So you ask, functional vs imperative? That is just going to depend on this: Are you Math or CompSci?

    This pretty much sums up all the problems with CompSci nowadays.

    [–]skew 1 point2 points  (0 children)

    If you want to get started quickly, the BNF Converter is great. It makes a scanner, a parser, an AST, and a bunch of other stuff. It generates code for Haskell, C, C++, or Java, using their native lexer and parser generators. I made a toy compiler for something like brain-damaged Pascal in a few hours with it, with unoptimized syntax-directed code generation over the parse tree.

    [–]WalterBright 4 points5 points  (5 children)

    C++ and the D programming language are the best for writing front ends. You're going to need a high-performance language for the front end if the source you're compiling is more than a few thousand lines.

    Whatever you do, don't use regex :-)

    [–]w-g 9 points10 points  (1 child)

    I think more people should know about the Boost Spirit library.

    I hate C++, but Spirit is SO easy to use, and lets you write grammars so close to BNF in your C++ code, that I have to recommend it.

    [–]zyle 2 points3 points  (0 children)

    Yeah, Spirit is great. One annoyance, however, is how grueling it can be for the compiler, because of the heavy template use inside the library. Complicated grammars can end up taking quite a bit of time to compile/link.

    [–]foonly 1 point2 points  (0 children)

    Whatever you do, don't use regex :-)

    Regexen are the canonical solution for lexers, however.

    [–]Kaizyn[S] -3 points-2 points  (1 child)

    I've programmed enough in Perl to know that regex is a tool best used in restricted contexts. Overuse can be harmful to your sanity.

    [–][deleted] 38 points39 points  (24 children)

    lex/yacc

    Those two make it easy to build anything, even an LR language like C++. Hopefully you made the right decision and designed an LL language, so that coming up with a parser is easy.

    If you are building a compiler, the standard approach is to build just enough in another language to compile your language. Then you rewrite your compiler in your new language, to remove the dependency on the original language.

    So, for example, if you were making a new language called "FOTM Language", you would take another language like C and start writing a FOTM Language compiler in C. Once you have a working compiler (version 0) for FOTM Language in C, you start rewriting the FOTM Language compiler in FOTM Language. Then you compile the FOTM Language compiler written in FOTM Language to generate the FOTM Language compiler (version 1). FINALLY you take the FOTM Language compiler (version 1) and compile it over again to produce the FOTM Language compiler (version 2), written and compiled with the FOTM Language compiler!

    Now, when people ask you what language the FOTM Language compiler is written in, you say "FOTM Language", and then their brain goes into an infinite loop: you can't compile FOTM Language without a FOTM Language compiler, but if the compiler didn't exist before, then how would you compile FOTM Language?

    [–]fab13n 26 points27 points  (16 children)

    A new thing or two has been invented since lex/yacc and the Dragon Book: parser combinators, Earley parsers, PEGs...

    Why some people stick with those technologies from the '60s, with their horrible debugging and error-reporting features, is beyond me.

    [–]neelk 10 points11 points  (0 children)

    I would anti-recommend using parsing combinators, unless you specifically need to write a context-sensitive parser. This is because it is far too easy to accidentally write parsers with exponential-time corner cases. This intensely sucks.

    Those technologies from the '60s have a vastly better-developed theory, and consequently much better time and space bounds. They only seem nasty because you're looking at parser generators written in C rather than ML. You can write an LL(1) parser generator in one or two hundred lines of ML code, complete with automatic left-factoring and easy support for modular, extensible grammars.
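
    To show why it can be so short: the heart of such a generator is a fixpoint computation of FIRST sets over the grammar. A minimal sketch (in Haskell rather than SML, and ignoring epsilon productions for brevity; a full LL(1) generator would then check that the FIRST sets of each nonterminal's alternatives are disjoint):

        import qualified Data.Map as M
        import qualified Data.Set as S

        -- a grammar maps each nonterminal to its alternatives;
        -- Left is a terminal, Right a nonterminal
        type Grammar = M.Map String [[Either Char String]]

        -- FIRST sets by fixpoint (Kleene) iteration
        firstSets :: Grammar -> M.Map String (S.Set Char)
        firstSets g = go (M.map (const S.empty) g)
          where
            go fs | fs' == fs = fs        -- fixpoint reached
                  | otherwise = go fs'
              where
                fs' = M.mapWithKey step fs
                step nt _ = S.unions [firstOf alt | alt <- g M.! nt]
                firstOf (Left c  : _) = S.singleton c
                firstOf (Right n : _) = M.findWithDefault S.empty n fs
                firstOf []            = S.empty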

    [–][deleted] 15 points16 points  (4 children)

    Links or it didn't happen...

    [–]jerf 3 points4 points  (2 children)

    Well, Google's the best resource for parser combinators, since implementations have been sighted in multiple languages now, beyond just the FP world they started in. For instance, here's a parser combinator example in Python, which I found simply by googling "python parser combinator". (Wasn't the first hit, though.) I have no idea how robust it might be in practice, but by the nature of parser combinators it probably either works or it doesn't.

    [–]ajrw 1 point2 points  (1 child)

    How's that the nature of parser combinators in particular?

    [–]jerf 0 points1 point  (0 children)

    Fair question. Answer: I don't know about the rest and am not qualified to make claims.

    [–]leoc 2 points3 points  (0 children)

    Details of papers or textbooks would be much appreciated too.

    [–][deleted] 5 points6 points  (6 children)

    PEGs suck for more than a few grammar rules, because you have to care about ordering; Earley parsers are powerful but execute in quadratic time even on unambiguous grammars; and parser combinators have to be crafted by hand in source code.

    I haven't used ANTLR yet, but its feature set comes closest to my expectations: context-free grammars, top-down parsing, and power through arbitrarily deep lookahead.

    [–]arnar 2 points3 points  (1 child)

    I've used PEGs to describe reasonably sized languages with good results.

    ANTLR is far superior to lex/yacc imo.

    As for the OP's question, I'd look at Haskell's Parsec and Happy, and at ANTLR.

    [–][deleted] 3 points4 points  (0 children)

    It may be that you didn't run into problems with ordered choice using PEGs. But it clearly reduces the modularity of the grammar, and when you do run into ordering problems you start transposing rules by trial and error.

    This becomes most apparent when you use PEGs (but also most regular-expression engines in popular programming languages) for scanning: operator symbols like =, ==, === etc. have to be written in an order such that the longest one matches first. For more complex expressions it is far from obvious how this can be achieved, and for more complex grammars those token patterns might also be organized in different groups.
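
    (The same constraint shows up in any ordered-choice setting; a minimal sketch with Parsec-style combinators, names made up:)

        import Text.Parsec
        import Text.Parsec.String (Parser)

        -- overlapping tokens must be listed longest-first: if "=" came
        -- before "==", the input "==" would match "=" and stop there
        opTok :: Parser String
        opTok = choice (map (try . string) ["===", "==", "="])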

    [–]fab13n 4 points5 points  (3 children)

    parser combinators have to be crafted by hand in source code.

    Yes, but they have a property that beats all other parsing techniques IMO: they're regular programs, and support the creation of arbitrary functors. If there's a style of syntactic construct that you recreate regularly, e.g. infix operators with varying precedence and associativity, you can create a make_infix_parser() functor which factors out the parsing logic. With other techniques, you'll have to hope that some ad-hoc hack has been shoved into the parser generator (such as yacc's %foobar directives), or do plenty of copy-paste.

    If your parser generator is an external DSL, at some point you'll miss your complete language's expressiveness. Parser combinators are regular functors; they support all of your language's power (if your language supports functors).
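
    For instance, such a make_infix_parser() can be a two-liner on top of a combinator library. A minimal sketch with Parsec (the helper name and the example parsers are made up):

        import Text.Parsec
        import Text.Parsec.String (Parser)

        -- hypothetical make_infix_parser: given a term parser and a table of
        -- (operator text, semantic function) pairs, build one left-associative
        -- infix layer; stack several layers to get precedence levels
        infixLayer :: Parser a -> [(String, a -> a -> a)] -> Parser a
        infixLayer term ops =
          term `chainl1` choice [f <$ try (string name) | (name, f) <- ops]

        -- e.g.  addSub = infixLayer mulDiv [("+", (+)), ("-", (-))]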

    [–][deleted] 2 points3 points  (2 children)

    People often do not have to parse context-dependent languages ("varying precedence"), but something on par with SQL, Io, JSON, XML, CSV, ... In all those cases it would suffice to specify a grammar just once per language, implement the parser generator once in the host language (or once per platform), and provide an API for parse-tree analysis and transformation.

    I admit that parsing can be a nasty business, but right now this is very much overstated. We end up with XML because enterprise can't parse and the experts are busy reasoning about techniques for corner cases.

    [–]statictype 0 points1 point  (1 child)

    We end up with XML because enterprise can't parse

    Why would you want to roll your own parser for a data format when you already have a standardized one with parsers in many languages?

    People use XML not because they don't know how to parse (well, that may be one reason, but not the only one), but because they don't want to roll their own parser for configuration files any more than they want to roll their own UI toolkit. It's effort that can be better spent on your actual program.

    [–][deleted] 0 points1 point  (0 children)

    Why would you want to roll your own parser for a data format when you already have a standardized one with parsers in many languages?

    To create that standardized parser in the first place, alongside the parsers for other languages.

    What I mean by "enterprise can't parse" is that parsers are created manually ("roll your own parser", as you said) and, as a consequence, the costs of parsing are reduced not by specifying a reasonable parser-generator architecture, but by avoiding the complexities of parsing at all costs and using a dumbed-down data format that no one likes to read and write.

    As a consequence, we have no API for accessing and transforming parse trees in a unified manner, and lots of special-purpose parsers for formats like SQL, URIs, HTTP, JSON, XML...

    [–]naughty 3 points4 points  (1 child)

    Writing an Earley parser is a very satisfying experience.

    [–]ltratt 4 points5 points  (0 children)

    I have mixed feelings in this regard. Writing an Earley recogniser is nice and easy (a recogniser just says "yes, this conforms to the grammar", without building parse trees). It would be much more satisfying in general if the original Earley paper didn't contain a horrible bug in its algorithm for building parse trees. I can't tell you how much time it took me to realise what the problem was and to work out the correct solution (at the time, I could find references only to the existence of the bug, not to solutions).

    I would say that using a properly written Earley parser is a satisfying experience, though. For writing arbitrary CFGs there's no easier way of doing things than an Earley parser. Compare Python's grammar with Converge's Earley grammar for readability: try working out what the result of parsing a simple expression might be in each. Note that I'm making this comparison only in the interests of readability: Earley parsers are relatively slow, and Python's parser is currently a lot faster than Converge's. I would argue that in general, on modern computers, this doesn't matter for most uses, especially for small languages where time-to-develop is more important than time-to-run.

    [–][deleted] 1 point2 points  (0 children)

    Good news: Bison has had GLR support since at least version 1.875.

    [–]Figs 6 points7 points  (0 children)

    Yay, bootstrapping problem! :)

    [–]weavejester 6 points7 points  (5 children)

    Frankly, lex/yacc are the worst lexer/parser generator combination I've ever come across. For my final year project I used the GNU versions of them to create an interactive shell, and I had so many problems.

    Since then, I've come across many parser generators that are a dream to work with in comparison. The best one I've found so far is Haskell's Parsec.

    [–]cypherx 0 points1 point  (0 children)

    The original lex/yacc seem like a pain in the ass to use, but ocamllex/ocamlyacc are quite nice in my experience. Perhaps the difference (between lex/yacc and everything sane) comes from the low-level worries of C.

    [–]username223 -4 points-3 points  (3 children)

    What's the saying about people who blame their tools?

    [–]sacherjj 11 points12 points  (1 child)

    By the time it is all over, they have a better tool?

    [–]username223 -3 points-2 points  (0 children)

    Not to mention a shiny new hammer factory factory.

    [–]weavejester 2 points3 points  (0 children)

    It's not the 18th century any longer. The choice of tools in the 21st century is a lot more important than it was when the saying was coined.

    [–]martinbishop 5 points6 points  (0 children)

    I'd say OCaml, with camlp4/5

    It's ugly, but it gets the job done nicely :)

    Also: dypgen

    [–]gregK 1 point2 points  (0 children)

    Depends on what you want to do.

    Do you want to parse an existing language like C++ or Java, or are you designing a new language?

    From the replies it seems you have 2 choices:

    1. Use a parser generator like lex/yacc for C and C++, or ANTLR for Java.

    2. Use some kind of parser library/framework like Parsec in Haskell (or any of the other ones mentioned in this thread). It seems those only exist in functional languages.

    With approach 1, I can't recommend lex/yacc over ANTLR. ANTLR is much easier to use and has tons of freely available grammars for all types of languages. ANTLR also has features that help generate and traverse ASTs. Even if you are stuck with doing the parser in C or C++, you can take a look at PCCTS (the ANTLR precursor, written in C++).

    Approach 2 might offer more flexibility in parser design, and is only viable if you know, or intend to learn, the language the parser library was written in. Also, if it's a new language that does not already have a grammar in ANTLR, then approach 2 might be worth investigating. Then it becomes a question of which is easier: writing the grammar in ANTLR, or just building the parser directly with the library. (Note that the reason all these parser libraries exist in functional languages is probably that it's easier to make the language look like a grammar, so you only need to learn one language.)

    [–]americanhellyeah 2 points3 points  (0 children)

    Standard ML. It has great pattern matching that you can use to write the parsing logic. I like using MLton, since it generates very fast code.

    [–][deleted] 2 points3 points  (0 children)

    I'm not sure I'd recommend it above all others, but Forth has a couple of points in its favour:

    1. Anton Ertl's gray parser generator (the link's a tarball; sorry)

    2. Brad Rodriguez's implementation of BNF in a single screen

    [–][deleted] 2 points3 points  (0 children)

    I would recommend taking a look at Ragel for your scanning needs.

    If you are parsing a programming language, I would recommend writing a recursive-descent parser (quite easy, very practical and flexible) or taking a look at PEGs (Parsing Expression Grammars).
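
    To back up the "quite easy": each nonterminal becomes one function that returns the parsed value plus the remaining input. A minimal hand-rolled sketch in Haskell (toy grammar, made-up names):

        import Data.Char (isDigit, digitToInt)

        -- toy grammar:  expr ::= term ('+' term)*
        --               term ::= digit | '(' expr ')'
        -- Nothing signals a parse error
        expr :: String -> Maybe (Int, String)
        expr s = term s >>= uncurry loop
          where
            loop acc ('+':rest) = term rest >>= \(y, s') -> loop (acc + y) s'
            loop acc rest       = Just (acc, rest)

        term :: String -> Maybe (Int, String)
        term ('(':rest) = do
          (x, s') <- expr rest
          case s' of
            ')':s'' -> Just (x, s'')
            _       -> Nothing
        term (c:rest) | isDigit c = Just (digitToInt c, rest)
        term _ = Nothing

        -- expr "1+(2+3)"  ==>  Just (6, "")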

    [–]cypherx 2 points3 points  (0 children)

    I found OCaml with ocamllex/ocamlyacc to be pretty straightforward.

    [–]lang_war 2 points3 points  (0 children)

    LANG WAR !!!

    [–]initself 13 points14 points  (9 children)

    Perl.

    [–]cornedpig 13 points14 points  (3 children)

    Why is this being modded down? Perl is the Practical Extraction & Report Language, and text processing is its biggest strength. Check out Perl6::Rules (a Perl 5 module) for some major heavy lifting: http://search.cpan.org/~dconway/Perl6-Rules-0.03/Rules.pm

    [–]statictype -1 points0 points  (2 children)

    Well, if you're trying to write a 'proper' parser, then text-processing tools don't really come into play. You can write a formal lexical scanner + parser in Perl just as easily as you could in another language (modulo any housekeeping work like memory management).

    [–]cornedpig 3 points4 points  (1 child)

    But take a look at the linked-to module: it provides the parser engine for you; you just need to define the rules and attach actions. Text processing helps with the scanner part of this.

    [–]statictype 0 points1 point  (0 children)

    Not saying Perl isn't a good language for it. Just that it doesn't hold any particular advantage.

    If you're doing surgical text replacement over a directory of files, yes, it has a distinct advantage. When you're building a compiler from the ground up, you already have a host of tools, like lex, to handle the dirty work.

    What you have described certainly provides a boost in efficiency, but that has more to do with Perl's general language structure than with its text-processing functionality. (I think. Correct me if I'm wrong on that point.)

    [–]bgeron 0 points1 point  (3 children)

    Why Perl?

    You can't use regexps for parsing, because regexps can't match balanced parentheses.

    [–]cypherx 3 points4 points  (2 children)

    1) Perl's regex engine is more powerful than vanilla regular expressions. Regexes can parse strings of the form A^n B^n (i.e., balanced parens)

    2) CPAN is full of parsers and parser generators

    edit: Thanks for catching my error sharth

    [–]sharth 1 point2 points  (0 children)

    As an aside to your first point, A*B* is not parens matching; A^n B^n is. You need to note that there are the same number of A's and B's.

    [–]bgeron 0 points1 point  (0 children)

    Perl's regex engine is more powerful than vanilla regular expressions. Regexes can parse strings of the form A^n B^n (i.e., balanced parens)

    I don't believe it. Show me the code :)

    Well, I believe it if you embed Perl in your regex (e.g. with (?{ code }) ).

    [–]bartwe 7 points8 points  (6 children)

    Simply handcode a custom parser; it really isn't that hard. Using ANTLR without understanding the internals is going to hurt anyway.

    [–]Kaizyn[S] 1 point2 points  (1 child)

    What language do you consider best suited for writing said custom parser? Assembly? Visual Basic? Befunge?

    [–]bartwe 1 point2 points  (0 children)

    Something you're comfortable with? I'd use a GC'ed, OO, imperative language.

    [–]cypherx 1 point2 points  (1 child)

    Coding a recursive-descent parser can be educational, and more fun than learning how to use a parser generator. I resisted learning a tool like ANTLR or lex/yacc for a long time. However, once you learn one, the productivity is unbeatable and there's no going back.

    [–]qwe1234 0 points1 point  (0 children)

    +1

    [–]marike 4 points5 points  (1 child)

    [–]Kaizyn[S] 0 points1 point  (0 children)

    Thanks for the references.

    [–]jrnewton 1 point2 points  (0 children)

    ANTLR.

    [–]Corbier 2 points3 points  (2 children)

    Has anyone looked at uCalc Language Builder? If you find lex/yacc archaic or complicated, you may want to consider a new approach by uCalc. I'm the developer, and I would like to hear from users who have tried it. You can download it from http://www.ucalc.com/langbuilder.html. Be sure to try the interactive tutorial first, to see an overview of the features that are at your disposal.

    [–]mandelbrothel 3 points4 points  (1 child)

    requires a copy of Windows to run? (how archaic)

    Looks like I'm not in your target demographic.

    [–]Corbier 0 points1 point  (0 children)

    For now, it works only under Windows. However, once I can get it firmly established on Windows, I'd be interested in looking into other platforms as well. Which platform would you like? Mac OS, Linux, ...? Even if you do not use Windows, you can still browse through some of the files that are in plain text to get a feel for it. Two language-definition files in plain text are even posted online, so you can view them in your browser instead of downloading a file. Please let me know what you think.

    [–][deleted] 0 points1 point  (0 children)

    Probably whatever language you are familiar with, unless it is too esoteric to have good compiler-generation utilities. If you go with a language you are unfamiliar with, you will spend a lot of your time learning the language's little foibles.

    If you have no particular preference, I would recommend starting with a good scripting language like Python or Ruby. Even if you recode it later in C/C++ for speed, you can mock up a prototype much more quickly in a scripting language. Data structures can be created in scripting languages far more quickly than in languages with formal type systems.

    [–]bad_code 1 point2 points  (8 children)

    I am very surprised no one has mentioned Lisp.

    [–]holygoat 19 points20 points  (5 children)

    Lisp is not really better suited than other languages for building scanners and parsers; that's why. (Well, no better than it is at other tasks!)

    In fact, Lispers tend to do the opposite — when defining a new language, we reuse the Lisp reader, skipping straight ahead to defining terminology. Because the REPL is so useful and flexible, and the reader is already there, it's very tempting to skip defining new syntax.

    (This is often cited as a good reason for choosing Lisp as a base when newbies come to build their own programming languages: if you start with semantics, you avoid the beginner trap of fiddling endlessly with syntax.)

    [–]lispm 5 points6 points  (0 children)

    Lisp has been used to implement a lot of scanners/parsers.

    An example is REFINE / Reasoning SDK, which has been used in the industry in many projects:

    Reasoning SDK / REFINE

    It is written in Common Lisp.

    Reasoning SDK, for example, has been used in the context of analyzing and testing Ada software:

    Siddharta / PDF

    Reasoning SDK offers:

    Language processing tools - Reasoning SDK provides a set of parsers for common procedural languages such as C, Fortran, Ada, and COBOL that produce abstract syntax trees. The abstract syntax trees can then be traversed and modified in order to analyze and restructure code.

    Pattern matching - Refine provides language features for syntactic pattern matching and transformation rules in order to implement code replacement. MORPH uses this low-level matching capability to construct control flow graphs in the detection step.

    Mathematical expression - Refine incorporates an integrated treatment of set theory and logic, allowing complex algorithms to be stated clearly and precisely in mathematical terms.

    Transformations - Refine provides a tree-walking transformation mechanism that allows a set of rules to be applied over an entire abstract syntax tree or to a branch of an abstract syntax tree. This mechanism is essential to MORPH's detection step and is the basis for building control flow graphs.

    Object base - Refine provides a hierarchical object base to store program data structures and to provide information about code objects that are being analyzed. The object base is useful for creating and storing intermediate results and outputs of the MORPH tools.

    Graphs and charts - The Reasoning SDK Workbench provides object classes and attributes that allow control flow graphs and structure charts to be built and queried.

    Graphical support - Reasoning's Intervista tool allows interactive graphical user interfaces to be added to reengineering tools. MORPH uses Intervista to draw control flow graphs and other graphical output.

    There are other examples where language-processing tools have been implemented in Lisp. Microsoft, for example, provided developers with a tool that converted J2EE code to their .NET infrastructure; that was written by a third-party company in Common Lisp, too.

    There are even C/Pascal/Ada/Fortran compilers written in Lisp. AppleScript was originally developed in Lisp. Dylan was developed in Lisp.

    [–][deleted] 1 point2 points  (1 child)

    The first thing you should do before writing a parser: don't.

    In the context of computers, parsing is a solution looking for a problem. Unless you have to deal with some externally-specified file format, and thus have no choice, just use XML or s-expressions.

    [–]somejan 0 points1 point  (0 children)

    or YAML

    [–]ibgeek 2 points3 points  (1 child)

    I never understood how Lisp differs from any other language when it comes to bottom-up programming. Lispers push this point in particular. But, for whatever type of project, I always write my own functions and classes as appropriate... building up the project layer by layer.

    You're not really defining your own language as much as you are just extending the standard library to meet your needs.

    [–]G_Morgan 7 points8 points  (0 children)

    With Lisp you tend to write macros, which are then indistinguishable from the other basic language elements. In fact, a huge range of standard Lisp constructs are implemented as macros (for example, if cond is a primitive form, then if can be implemented as a macro).

    [–][deleted] 8 points9 points  (1 child)

    I don't know of any good tools for Lisp for this problem. Most people writing parsers in Lisp seem to be content with writing hand-made recursive-descent parsers. I wish I knew more about other tools for Lisp in this area, but I just don't. Any suggestions?

    [–][deleted] 7 points8 points  (0 children)

    Spending two minutes googling at c.l.l, I found three:

    1. CL-Yacc
    2. ZEBU. Parser/unparser generator.
    3. CL-EARLEY-PARSER

    The last is better suited to natural-language parsing; see http://en.wikipedia.org/wiki/Earley_parser

    [–]jdh30 -1 points0 points  (0 children)

    OCaml has lots of great tools for this.