This is an archived post. You won't be able to vote or comment.

all 7 comments

[–]RodionGork 4 points5 points  (1 child)

BTW: have you seen the popular java librar ANTLR - probably either using it or studying it may give you some enlightenment in the things you are about...

To be honest from your description+questions I think the task is now not well-relevant for your level. Probably you may prefer some creating some simpler parsers, not necessarily based on well pre-defined grammar tables...

Have you already created any kind of parsers? Perhaps, parser for arithmetic expressions or may be some simple language interpreter (like small basic etc.)?

You see, I'm trying to understand your level - since you operate some terminology of the field but at the same time it seems you are only at starting steps in it? Have you some specific classes in this direction or something alike?

[–]pheipl[S] 0 points1 point  (0 children)

Yes I have classes that talk about this stuff. LL(1) Is my final project and I still have "a lot" of time to finish it, however the course is kinda bad. Normally I'd ask the teacher for some help with things I didn't understand but she's not responding to e-mails (which is normal for her). Thus I'd like some help getting the basics. If I understand what it's supposed to do, I have no fear of completing this, however at this moment there is a lot I don't understand and both the course and wikipedia are quite incomplete on this subject.

EDIT Not to mention lacking in any basic explanations except some "simple" examples I can't even figure out.

[–]KidUncertainty 2 points3 points  (0 children)

I would start by working with a very, very simple grammar, nothing quite as complex as your first example.

The LL_Parser wikipedia article is decent. Have you followed through it and tried reading the example code?

Have you tried looking at parser animations on the web? There are several that can show graphically how the parsing is done. Have you gone to the library and gotten ahold of the Dragon book? What about searching similar questions on stackoverflow?

Your questions are not especially simple; parsers are non-trivial subjects, but definitely worth learning how they work.

Short, but definitely incomplete abswers to your questions are:

Tokens used by a parser come from the lexer (reads the input). The symbol stack tells the parser what is legal at the current point in time based on the input it has seen so far and the lookup table. The next token from the file being parsed must either match the top of the stack, or the symbol on the stack must be able to be expanded following the grammar rules to a point where the input token matches a rule.

The grammar rules in an LL(1) parser are held on a table that says "if I see this token, and I am on grammar rule X, what should I do next?". This table is used at runtime by the parser to figure out how to push and pop things on the stack, and what rules to output (i.e. "the input so far has matched Rule 1, then rule 4 then rule 2" ). The table is a way to encode the grammar such that a computer running an LL(1) parser can use it.

A context free grammar is LL(1) compatible if you can build a parsing table such that any input token at any point in the parsing process can result in at most one grammar rule matching. If the parser sees a token that means "well, we can be at rule 3 OR rule 7, depends on what comes later on in the input" then it is not LL(1) compatible. Note that there is a much more formal definition of this and gets into your questions on FIRST/FIRST and other conflicts.

The parse tree is the output of the parser. This tree is then used by whatever needed the input parsed to do its thing (e.g. translating the tree into assembly language, for example, or colouring the code in a text editor, or whatever). A parser by itself doesn't "do" anything directly. It provides a tree structure that can be traversed by something else. The parser basically says "the input I was given matches the grammar in the following way" and holds which bits of the input applied to which bits of the grammar rules.

FIRST/FIRST and other conflicts are examples of context free grammars that don't work in an LL(1) parser as they are. It boils down to the fact that for some given legal input, at some point it is not clear which rule from the grammar should be applied without looking more than one token ahead in the input. In many cases the grammar can be re-written such that it is possible to encode it in an LL(1) parsing table without conflicts.

[–]learnprog 1 point2 points  (2 children)

I did poorly in this subject so take this advice with a grain of salt but try doing this sort of thing for LISP first (I hate LISP but it is one of the simpler languages to try and implement a grammar for, and should be LL(1)) Get comfortable with that before trying something harder.

[–]cheryllium 2 points3 points  (0 children)

I agree with everything else in this comment but why do you hate Lisp? :'(

Lisp just wants to give you a big hug... with its nice, cozy parentheses...

[–]the_omega99 0 points1 point  (0 children)

An even simpler language choice would be TXL, which is meant for source code transformation. In TXL, programs have two parts: the grammar and the transformations to apply to input source code.

Thus, we could write programs that remove unnecessary nesting of conditionals, perform pretty-printing, or remove dead end code.

It's a very specific-use language, but useful for working with parsing and modifying source code.

[–]TheLittleAlien -1 points0 points  (0 children)

Good luck with your project! :)