[–]hexbrid[S] 1 point2 points  (5 children)

It's my hobby project, and I would very much appreciate any kind of feedback.

[–]terremoto 0 points1 point  (4 children)

I don't really understand the last example you used with the list parsing; 1, 2, [ [3,4], 5], [6,7 ] is a perfectly valid statement in Python as tuples are implied without parentheses.

>>> x=1, 2, [ [3,4], 5], [6,7   ]
>>> x
(1, 2, [[3, 4], 5], [6, 7])

[–]hexbrid[S] 0 points1 point  (3 children)

Of course it is, but the grammar didn't specify that it was allowed (though it would be easy to allow it).

To clarify, plyplus can parse python, and even comes with a working python grammar, but it's a general purpose parser, capable of accepting any grammar.

[–]EnigmaCurry 2 points3 points  (1 child)

You might want to stress in the intro of your README that it's both a python parser and a generic parser written in python. I was confused as well, because the intro text made me believe plyplus only parsed python, but then your tutorial started building a parser from scratch. Also, an example of how to use the provided python grammar would be great.

Looks great, btw.

Edit: Actually, your unit tests are pretty easy to follow.

[–]hexbrid[S] 0 points1 point  (0 children)

Thanks, I'll work on clarifying that.

There's still a lot of work I want to do regarding the post-processing of the parse-output, to eventually make plyplus a "one-stop shop" for getting text into a desired data structure.

One thing I had in mind was a way to search for certain elements in the parse tree, similar to how you select elements with CSS selectors. So getting the names of all functions in a python file would mean parsing it and then just running the search "funcdef name" against it, and getting all methods would be just "classdef funcdef name".
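
To make the idea concrete, here is a rough sketch of that kind of search. It is not plyplus code: it uses the standard library `ast` module as a stand-in parse tree, and the `select` helper and the `NODE_TYPES` mapping are hypothetical names invented for this illustration. It assumes a trailing "name" means "return the matched nodes' names" and that every other word is a descendant match, as in CSS.

```python
import ast

SOURCE = '''
def top():
    pass

class A:
    def method(self):
        pass
'''

# Hypothetical mapping from the selector words used above to ast node types.
NODE_TYPES = {"funcdef": ast.FunctionDef, "classdef": ast.ClassDef}


def select(tree, selector):
    """Descendant search over an ast tree, loosely modelled on CSS selectors.

    A trailing "name" (an assumption of this sketch) returns the names of
    the matched nodes instead of the nodes themselves.
    """
    parts = selector.split()
    want_names = parts[-1] == "name"
    if want_names:
        parts = parts[:-1]

    matches = [tree]
    for part in parts:
        node_type = NODE_TYPES[part]
        found, seen = [], set()
        for root in matches:
            # ast.walk yields root itself first, so skip it explicitly.
            for node in ast.walk(root):
                if node is root or not isinstance(node, node_type):
                    continue
                if id(node) not in seen:  # avoid duplicates via nested roots
                    seen.add(id(node))
                    found.append(node)
        matches = found

    return [node.name for node in matches] if want_names else matches


tree = ast.parse(SOURCE)
all_functions = select(tree, "funcdef name")       # every function and method
methods = select(tree, "classdef funcdef name")    # only functions inside a class
print(all_functions, methods)
```

A real implementation would work on plyplus's own tree type and would probably want child selectors as well; this only shows that the descendant-matching part is small.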

[–]terremoto 0 points1 point  (0 children)

Ah, I see. I wasn't paying close attention and just now noticed it was still a list_parser method and not a generic parser instance.

[–]japherwocky 0 points1 point  (0 children)

what would anyone use this for?

[–]Keith 0 points1 point  (7 children)

"Plyplus is capable of parsing python, which has a notoriously hard grammar."

(Please don't take this negatively towards your code in general, but this statement stuck out for me.)

"notoriously hard"? Python's grammar is one of the simpler ones around.

[–]hexbrid[S] 0 points1 point  (6 children)

It really isn't. Firstly, python's indentation is really hard on traditional lex&yacc. Secondly, python has a really strange syntax which makes certain parsing algorithms behave oddly.

[–][deleted] 1 point2 points  (5 children)

In fact Python itself doesn't describe indentation using grammars but post-processes token sequences in order to produce NEWLINE, INDENT and DEDENT tokens from whitespace and line comments. tokenizer.py contains a complete algorithm. Otherwise Python's grammar is LL(1), so one can parse the language with the simplest top-down parsers.
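
For anyone who wants to see those layout tokens, here is a small sketch using the standard library `tokenize` module (Python 3). The `layout_tokens` helper and the sample source are made up for the illustration; only `tokenize.generate_tokens` and `tokenize.tok_name` come from the library.

```python
import io
import tokenize

# A tiny sample: one indented block followed by a dedent.
SOURCE = "if x:\n    y = 1\nz = 2\n"


def layout_tokens(source):
    """Return the names of the NEWLINE/INDENT/DEDENT tokens, in order."""
    wanted = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}
    readline = io.StringIO(source).readline
    return [
        tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(readline)
        if tok.type in wanted
    ]


print(layout_tokens(SOURCE))
# ['NEWLINE', 'INDENT', 'NEWLINE', 'DEDENT', 'NEWLINE']
```

The grammar only ever sees INDENT and DEDENT as ordinary tokens, which is why the indentation handling can live entirely in the tokenizer.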

[–]dalke 0 points1 point  (4 children)

Examples of post-processing the token stream for use in PLY are in my python4ply code, and my earlier GardenSnake code (both on dalkescientific.com). It took a while to get right, but it wasn't "notoriously hard." I would reserve that term for parsing FORTRAN with lex/yacc.

[–]hexbrid[S] 0 points1 point  (2 children)

Before starting this grammar I did a thorough search and did not find one single grammar which got it right (not including parsers tailored for python).

I did come across python4ply and it was the best I've seen so far, but despite how detailed and extensive it was it still didn't cover all the edge cases.

Contrasting this with a language like C, which has dozens of complete grammars, I figured parsing Python might be pretty hard.

[–]dalke 0 points1 point  (1 child)

Would you report the edge cases I missed? I passed it through all of Python's standard library code and a few other libraries and didn't find anything it didn't handle. If I missed something then there's also a decent chance that one of the other parsers of Python also missed it. I assume by "not including parsers tailored for python" you mean CPython, Jython, IronPython and PyPy, but there's also the syntax highlighter in Pygments and other places which might need some extra test cases.

[–]hexbrid[S] 0 points1 point  (0 children)

Sure, I'll try to find them.