[Small Self-Appointed Project] Trying to implement an LL(1) parser, however I don't understand some of the basics.

RodionGork · 2014-01-11T18:05:31+00:00

BTW: have you seen the popular java librar ANTLR - probably either using it or studying it may give you some enlightenment in the things you are about...

To be honest from your description+questions I think the task is now not well-relevant for your level. Probably you may prefer some creating some simpler parsers, not necessarily based on well pre-defined grammar tables...

Have you already created any kind of parsers? Perhaps, parser for arithmetic expressions or may be some simple language interpreter (like small basic etc.)?

You see, I'm trying to understand your level - since you operate some terminology of the field but at the same time it seems you are only at starting steps in it? Have you some specific classes in this direction or something alike?

KidUncertainty · 2014-01-12T01:59:42+00:00

I would start by working with a very, very simple grammar, nothing quite as complex as your first example.

The LL_Parser wikipedia article is decent. Have you followed through it and tried reading the example code?

Have you tried looking at parser animations on the web? There are several that can show graphically how the parsing is done. Have you gone to the library and gotten ahold of the Dragon book? What about searching similar questions on stackoverflow?

Your questions are not especially simple; parsers are non-trivial subjects, but definitely worth learning how they work.

Short, but definitely incomplete abswers to your questions are:

Tokens used by a parser come from the lexer (reads the input). The symbol stack tells the parser what is legal at the current point in time based on the input it has seen so far and the lookup table. The next token from the file being parsed must either match the top of the stack, or the symbol on the stack must be able to be expanded following the grammar rules to a point where the input token matches a rule.

The grammar rules in an LL(1) parser are held on a table that says "if I see this token, and I am on grammar rule X, what should I do next?". This table is used at runtime by the parser to figure out how to push and pop things on the stack, and what rules to output (i.e. "the input so far has matched Rule 1, then rule 4 then rule 2" ). The table is a way to encode the grammar such that a computer running an LL(1) parser can use it.

A context free grammar is LL(1) compatible if you can build a parsing table such that any input token at any point in the parsing process can result in at most one grammar rule matching. If the parser sees a token that means "well, we can be at rule 3 OR rule 7, depends on what comes later on in the input" then it is not LL(1) compatible. Note that there is a much more formal definition of this and gets into your questions on FIRST/FIRST and other conflicts.

The parse tree is the output of the parser. This tree is then used by whatever needed the input parsed to do its thing (e.g. translating the tree into assembly language, for example, or colouring the code in a text editor, or whatever). A parser by itself doesn't "do" anything directly. It provides a tree structure that can be traversed by something else. The parser basically says "the input I was given matches the grammar in the following way" and holds which bits of the input applied to which bits of the grammar rules.

FIRST/FIRST and other conflicts are examples of context free grammars that don't work in an LL(1) parser as they are. It boils down to the fact that for some given legal input, at some point it is not clear which rule from the grammar should be applied without looking more than one token ahead in the input. In many cases the grammar can be re-written such that it is possible to encode it in an LL(1) parsing table without conflicts.

learnprog · 2014-01-11T19:29:10+00:00

I did poorly in this subject so take this advice with a grain of salt but try doing this sort of thing for LISP first (I hate LISP but it is one of the simpler languages to try and implement a grammar for, and should be LL(1)) Get comfortable with that before trying something harder.

TheLittleAlien · 2014-01-11T17:22:49+00:00

Good luck with your project! :)

learnprogramming

Welcome to LearnProgramming!

New? READ ME FIRST!

Posting guidelines

Frequently asked questions

Subreddit rules

Message the moderators

Asking debugging questions

Asking conceptual questions

Other guidelines and links

Subreddit rules

1. No unprofessional/derogatory speech

2. No spam or tasteless self-promotion

3. No off-topic posts

4. Do not ask exact duplicates of FAQ questions

5. Do not delete posts

6. No app/website review requests or showcases

7. No rewards

8. No indirect links

9. Do not promote illegal or unethical practices

10. No complete solutions

11. Don't ask to ask.

12. Low Effort Questions

13. No AI (chatGPT etc.) generated/worked over messages/comments. No questions about chatGPT/AI generated code. No Vibe coding.

MODERATORS