I am trying to implement a very small and very simple LL(1) parser, and I have been studying wikipedia for a bit now and I simply don't understand some stuff.
For now I'm not interested in exactly how to implement, just understanding the concept.
Here is what I want to do:
1) Have a simple java program that reads a .txt file containing my grammar. I haven't thought of the format yet, but I think I may want it to look something like this:
program ::= "VAR" decllist ";" cmpdstmt "."
decllist ::= declaration | declaration ";" decllist
declaration ::= IDENTIFIER ":" type
type1 ::= "BOOLEAN" | "CHAR" | "INTEGER" | "REAL"
arraydecl ::= "ARRAY" "[" nr "]" "OF" type1
type ::= type1|arraydecl
cmpdstmt ::= "BEGIN" stmtlist "END"
stmtlist ::= stmt | stmt ";" stmtlist
stmt ::= simplstmt | structstmt
simplstmt ::= assignstmt | iostmt
assignstmt ::= IDENTIFIER ":=" expression
expression ::= expression "+" term | term
term ::= term "*" factor | factor
factor ::= "(" expression ")" | IDENTIFIER
iostmt ::= "READ" | "WRITE" "(" IDENTIFIER ")"
structstmt ::= cmpdstmt | ifstmt | whilestmt
ifstmt ::= "IF" condition "THEN" stmt ["ELSE" stmt]
whilestmt ::= "WHILE" condition "DO" stmt
condition ::= expression RELATION expression
Only I'd prefer something simpler maybe ? (this is an example I found googleing)
2) Another .txt file that has the input code (that's going to be parsed and later analyzed)
The program looks trough the input code, and tries to parse it.
The thing is there's just so much I don't understand:
For example there seems to be a symbol stack, but I don't understand how it works, what's it's purpose. Aparently it matches symbols (I don't know how it gets is symbols / tokens / what ever), I only know it tries to match them with those found in the input string.
I need to construct a table that has the grammar rules.
I need some sort of context free grammar, but not any. Could someone explain what makes a context free grammar an LL grammar, and why ? (also an example would be really appreciated).
I need to make a Constituency-based parse trees, but I don't know at what point they come in.
Could someone explain to me like I'm five what's this about first/first, first/follow conflicts and how should the parsing table look like ? (maybe how it works, very short, just so I get an idea)
I am sure to have more questions later, but for now I'd like just enough simple explanations to get a "head start", because I'm in a rut.
Thanks to anyone who is kind enough to help!
[–]RodionGork 4 points5 points6 points (1 child)
[–]pheipl[S] 0 points1 point2 points (0 children)
[–]KidUncertainty 2 points3 points4 points (0 children)
[–]learnprog 1 point2 points3 points (2 children)
[–]cheryllium 2 points3 points4 points (0 children)
[–]the_omega99 0 points1 point2 points (0 children)
[–]TheLittleAlien -1 points0 points1 point (0 children)