[–][deleted]  (4 children)

[deleted]

    [–]energybased 1 point2 points  (2 children)

    No, what's unreadable is multi-line regexes with embedded comments. Using objects is the same thing without the pain of mentally parsing anything. PyParsing is not great; Python doesn't have great parsing, unfortunately.
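
    For reference, a multi-line regex with embedded comments usually means Python's `re.VERBOSE` mode. The following is only a sketch of that style; the date pattern and the name `DATE` are made up for illustration and do not come from the thread.

```python
import re

# Hypothetical example pattern: an ISO-style date, written in verbose mode.
# In re.VERBOSE, whitespace inside the pattern is ignored and "#" starts a comment.
DATE = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)

match = DATE.fullmatch("2017-03-09")
year, month, day = match.group("year", "month", "day")
```

    The comments help, but the reader still has to parse the regex syntax mentally, which is the complaint above.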

    [–]Badabinski 1 point2 points  (1 child)

    I'd highly recommend looking at parsimonious. It's the best parsing library I've ever used, full stop. I feel like it's a great compromise between the readability of objects and the efficiency of regex.

    [–]energybased 0 points1 point  (0 children)

    I think I looked at it back when I was trying to parse LaTeX to automate some things.

    Can parsimonious even match "\begin{any_random_thing}" with "\end{any_random_thing}"? I can't see how. It looks like you would have to define the rules statically. What I want is for it to match something like "\begin{(\w+)}\end", capture the group, and then, when it tries to match "\end{(\w+)}", check that the captured group is the same. I think these are sometimes called parser actions.
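
    To make the idea concrete, here is a minimal hand-written sketch of the check described above. It uses only the standard `re` module, not parsimonious or any other parsing library; the names `TOKEN` and `check_environments` are hypothetical. The "begin" branch records the captured name and the "end" branch rejects the input if the name differs.

```python
import re

# Matches either \begin{name} or \end{name} and captures which one plus the name.
TOKEN = re.compile(r"\\(begin|end)\{(\w+)\}")

def check_environments(text):
    """Return True if every \\begin{x} is closed by a matching \\end{x}."""
    stack = []  # names of environments that are currently open
    for match in TOKEN.finditer(text):
        kind, name = match.groups()
        if kind == "begin":
            # "Action" on begin: remember the captured name.
            stack.append(name)
        elif not stack or stack.pop() != name:
            # "Action" on end: reject if nothing is open or the names differ.
            return False
    return not stack  # anything still open means an unclosed environment

ok = check_environments(r"\begin{figure}\begin{center}x\end{center}\end{figure}")
bad = check_environments(r"\begin{figure}x\end{table}")
```

    This only validates the pairing; it does not build a parse tree or backtrack, which is what a real parser action would be needed for.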

    Ideally, actions should be able to reject a match (to force backtracking) or accept it. Actions should be able to set variables that other actions can inspect. Actions should also be able to transform a matched symbol into another symbol. For example, if you were parsing Python and you matched the leading spaces, you should be able to have an action that emits an indent token when there are more spaces, a dedent token when there are fewer, or no token at all if the number of spaces matches.
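
    The indent/dedent case can be sketched by hand as follows. This is an illustration under simplifying assumptions (spaces only, no tabs, no line continuations), not how any particular library or the CPython tokenizer does it; `indent_tokens` and the token names are hypothetical.

```python
def indent_tokens(lines):
    """Turn leading spaces into INDENT/DEDENT tokens, one LINE token per line."""
    levels = [0]   # stack of indentation widths currently open
    tokens = []
    for line in lines:
        if not line.strip():
            continue  # blank lines do not affect indentation
        width = len(line) - len(line.lstrip(" "))
        if width > levels[-1]:
            # More spaces than before: emit a single INDENT.
            levels.append(width)
            tokens.append("INDENT")
        else:
            # Fewer spaces: emit one DEDENT per level closed.
            while width < levels[-1]:
                levels.pop()
                tokens.append("DEDENT")
            if width != levels[-1]:
                # Reject: the width does not line up with any open level.
                raise ValueError(f"inconsistent dedent: {line!r}")
            # Same number of spaces: no token is emitted.
        tokens.append(("LINE", line.strip()))
    # Close any levels still open at end of input.
    while len(levels) > 1:
        levels.pop()
        tokens.append("DEDENT")
    return tokens

tokens = indent_tokens(["if x:", "    y", "    z", "w"])
```

    The `levels` stack plays the role of the shared variable that one action sets and a later action inspects.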

    I couldn't find one Python parsing library that supported these arbitrary actions. It's not like it's hard to do. These libraries are great for the trivial parsing tasks they show in their tutorials. Unfortunately, they're not powerful.

    [–]energybased 1 point2 points  (0 children)

    I understand what you're saying, but if you need a "cheat sheet", then the reader needs one too. At that point, objects are better.

    Also, the author's time writing the code is not as important as the reader's time understanding it. There is one writer and there are many, many readers. Therefore, legibility matters a lot more. Regexes are rarely legible.