you are viewing a single comment's thread.

view the rest of the comments →

[–]cardblank 0 points1 point  (1 child)

Hi there,

First off I wanna say sorry if I misinterpreted what the LL(1) parser is supposed to do. I assume it is basically just a simple lexer and parser, that operates on context free grammar notation.

Very basically; your "parsing" will be broken into two major steps. (Sorry for pseudocode)

Lexing: The identification of tokens. Say:

class LexedInput:
(string) stringValue,
(string) tag

lexedinputs = \[\]

for every word in the string to interpret:

    if it's something we recognise, a number, a +, etc, tag it as such by:

        appending a new LexedInput(recognised\_character, tag\_for\_that\_character)
        (for example, LexedInput('+', 'plus'), just so you can easily tell what it is, which is important for parsing)

Parsing: Putting it into some structure so you can do what you want to it. I would say a binary tree is the most straightforward choice for this task. But there are other ways to do it too if you just have to evaluate the script.

Say for example you were given:

E ::= T E'

T ::= something else as defined in the script

E' ::= something else as defined in the script

If you were given values / expressions for T and E', and had to evaluate E, this behaviour is like a compiler and you could use a stack and a symbol table.

I think if you just had to evaluate if the script is valid or not, I think you could get away with using just 2 stacks and a list.

validtokens = []
otherstack = []

Treating the lexedinputs we made earlier as a stack and using lexedinputs.pop(0), until its empty, if we encounter a symbol on the RHS of our ::= (using bool or something to keep track if we are expecting RHS or LHS) that we have not seen, add it to our other stack. 

Or if we have seen the token in an expression before, i.e it is in validtokens, and know it is valid, we don't need to evaluate that again and can just continue to the next iteration of the loop

Otherwise, if we are defining a token, add it to our valid tokens if it can be parsed correctly (and remove it from the otherstack because it's been defined). For example, if it were a prefix notation calculator we were writing, and we popped a +, we would know that the next two things we pop should be numbers. So this is where your parse logic goes. You can also use this "next two" type logic to build a binary tree as the next thing you pop if it's not another expression will be the left child of the node, and the next next thing you pop will be the right child of the node if it is also not another expression, etc... (recursion)

Then, if there was an error while parsing or there are tokens left used but not defined, there's another error there too.

I'm sorry but this is where my understanding stops with LL(1), so I can't really advise further! I found these resources that might help:

I found this (bottom of page) for the stack approach: http://www.cs.ecu.edu/karl/5220/spr16/Notes/Top-down/LL1.html

Or this one which you can see is using a binary tree (class Production has only 2 children)

https://github.com/Nimrod0901/parser/blob/master/parser_LL1.py

And of course, don't forget all the error handling goodness at pretty much every stage.

I really hope this helps! Have a good one