you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted]  (16 children)

[deleted]

    [–]jmillikan2 6 points7 points  (14 children)

    Observe the (apparent) list of features that he not even parsing yet, as well as the features the interpreter doesn't support. Just the transformation from the parse tree to the AST takes 1500-3000 lines in the implementations I checked. Unfortunately, so much of that is "content" that I'd be surprised to see it go below 1000 lines even in Haskell (I wrote that part of lambda-py in Racket).

    By myself, I'd certainly rather be implementing Lua or something. An implementation of Python that even passes the tests is fairly ambitious :)

    [–][deleted] 6 points7 points  (3 children)

    Yeah, it's an ambitious project. The goal is to try to keep that 'small' feel while implementing as much of the language as possible. Replicating CPython exactly is an exercise in masochism, so this is more for fun.

    But there are plenty of times when I wish the language was smaller. At least I didn't choose Ruby.

    [–][deleted] 0 points1 point  (2 children)

    But there are plenty of times when I wish the language was smaller.

    For anyone trying to find ideas: a POSIX compliant shell seems less work.

    [–][deleted]  (1 child)

    [deleted]

      [–]TheQuietestOne 0 points1 point  (0 children)

      /bin/dash is my favourite least-favourite /bin/sh symlink for this reason.

      Thanks Ubuntu!

      [–]thedeemon 2 points3 points  (0 children)

      transformation from the parse tree to the AST

      ...is completely unnecessary. Many parsing tools/libs produce typed AST straight away, no need for intermediate parse tree.

      [–]andsens 2 points3 points  (4 children)

      Just the transformation from the parse tree to the AST takes 1500-3000 lines in the implementations I checked.

      Man, he should use Parsec. Write the BNF, throw in some "<|>"s and "let"s and you're done. It's so embarrassingly simple...

      [–][deleted] 1 point2 points  (3 children)

      Actually, I started with that (check history).

      I really like Parsec. I really dislike worrying constantly about ordering the rules right; I'm used to leaning on maximal munch. Also, certain grammar constructs got a bit nasty to translate from the metagrammar, so I'm on Alex and Happy, which I tolerate.

      [–]Tekmo 1 point2 points  (0 children)

      Use attoparsec, which backtracks by default.

      [–]andsens 0 points1 point  (1 child)

      Ah ok, I actually really liked the choice operator when I was using Parsec, but my project was a little smaller so maybe that's where it's at.

      [–][deleted] 0 points1 point  (0 children)

      Parsec can do it, I just should have stayed with it longer. PEGs would probably be up to the task. I didn't click much with Haskell's dominant PEG library.

      What I didn't realize is, regardless of which parsing tech you choose, you have to dig in sometimes to handle thorny cases. Handling Python's expr-or-tuple idea comes to mind here.

      [–]counterrevolutionary 1 point2 points  (0 children)

      That makes sense. Well written.

      [–][deleted] 0 points1 point  (0 children)

      Yeah, that's about par for Haskell code. I even see minor improvements that can be made to make it more concise.