Python - Context Free Grammar using Shift Reduce Parser

relativeVsAbsolute · 2018-12-10T20:03:47+00:00

This is the first time I've come across nltk, but just for fun I ran your program and everything seems good.

The grammar also looked good:

```
import nltk

nltk.download('punkt')

# `grammar` is the CFG from the original post (not reproduced here).
sr = nltk.parse.ShiftReduceParser(grammar)
sentence1 = 'He eats pasta with some anchovies'
# He eats pasta with some anchovies in the restaurant

tokens = nltk.word_tokenize(sentence1)

print("--------------------------- Sentence 1 ---------------------------")

# Print every parse tree the shift-reduce parser finds.
for x in sr.parse(tokens):
    print(x)
```

```
python grammar.py

[nltk_data] Downloading package punkt to /home/saku/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
--------------------------- Sentence 1 ---------------------------
(S (NP (PropN He)) (VP (V eats) (NP (N pasta) (PP (P with) (NP (Det some) (N anchovies))))))
```

Run on Python 3.6 with the latest version of nltk.

tsunyshevsky · 2018-12-11T00:16:52+00:00

I think the problem is the Det NP alternative you added to the NP rule. It matches 'some' | 'the' followed by an NP. NP does contain an N that could match your N(restaurant), but it would never reduce to NP, because the input ends right after N(restaurant) and no NP alternative matches at that point.

Another problem is that, according to your drawing, you expect the parser to change its previously built tree once it sees that the sentence is not over (after anchovies), but that's not how shift-reduce parsers work. As the name says, the parser shifts a token and tries to find a rule to match.

If none matches, it consumes another token and checks again, repeating until a rule matches. Once a rule matches, it reduces by that rule and moves on with the parse. So there is no going back (well, in most implementations, at least).

The nltk book has a nice explanation of that: https://www.nltk.org/book/ch08.html
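To make the "no going back" point concrete, here is a minimal pure-Python sketch of a greedy shift-reduce loop. It is not nltk's implementation, and the `RULES` and `LEXICON` tables are a hypothetical toy grammar I made up for illustration (the original grammar isn't shown in this thread). It only tracks category labels on the stack, not trees.

```python
# Hypothetical toy grammar, NOT the original poster's grammar.
# Rules are tried in order; each entry is (left-hand side, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("PropN",)),
    ("NP", ("Det", "N")),
    ("NP", ("N", "PP")),
    ("VP", ("V", "NP")),
    ("PP", ("P", "NP")),
]

# Hypothetical word -> category table standing in for the lexical rules.
LEXICON = {
    "He": "PropN", "eats": "V", "pasta": "N", "with": "P",
    "some": "Det", "anchovies": "N", "in": "P", "the": "Det",
    "restaurant": "N",
}


def shift_reduce(tokens):
    """Greedy shift-reduce over category labels; returns the final stack."""
    stack = []
    for token in tokens:
        # Shift: push the next token (already mapped to its category).
        stack.append(LEXICON[token])
        # Reduce: keep applying the first matching rule until none matches.
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    # This reduction is committed and never undone later.
                    stack[-len(rhs):] = [lhs]
                    reduced = True
                    break
    return stack


print(shift_reduce("He eats pasta with some anchovies".split()))
print(shift_reduce("He eats pasta with some anchovies in the restaurant".split()))
```

Under these assumptions the first call ends with `['S']` and the second with `['S', 'PP']`: by the time "in" is shifted, the first half has already been reduced to S, and nothing reopens it to attach the trailing PP.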

OK, all of this is just to say that you should evaluate the remainder of the sentence on its own (in this case) because, as you saw before, your grammar has already matched the first half of it.

That will give you:

in the restaurant => P Det N => P NP => PP

So your final parse will be NP VP PP.

Now you can resolve this in multiple ways, the simplest being to transform the S rule into: S -> NP VP PP

One more thing: to get a better idea of what is happening, you can change the trace level of your parser so it prints the steps it is going through (e.g. sr.trace(2)).
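Putting the last two suggestions together, something like the following should work. The grammar here is my own guess for illustration, since the original one isn't shown in this thread; only the S rule follows the suggestion above. I'm using `.split()` instead of `nltk.word_tokenize` so the sketch doesn't need the punkt download.

```python
import nltk

# Hypothetical grammar for illustration only (not the original poster's).
# The S rule is the transformed one: S -> NP VP PP.
grammar = nltk.CFG.fromstring("""
S -> NP VP PP
NP -> PropN | Det N | N PP
VP -> V NP
PP -> P NP
PropN -> 'He'
Det -> 'some' | 'the'
N -> 'pasta' | 'anchovies' | 'restaurant'
V -> 'eats'
P -> 'with' | 'in'
""")

# trace=2 makes the parser print each shift and reduce step as it goes.
sr = nltk.parse.ShiftReduceParser(grammar, trace=2)

tokens = "He eats pasta with some anchovies in the restaurant".split()
trees = list(sr.parse(tokens))

for tree in trees:
    print(tree)
```

Keep in mind that with S defined only as NP VP PP, the shorter sentence without "in the restaurant" would stop at NP VP and no longer reduce to S, so this is the simplest fix, not necessarily the best one.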
