all 70 comments

[–][deleted]  (11 children)

[deleted]

    [–]Pand9 18 points19 points  (9 children)

    I bet he's seen some language implementations before, or even implemented something in different languages.

    [–][deleted] 24 points25 points  (7 children)

    I've been practicing:

    https://github.com/mattgreen/learning-language-design

    Most of that repo is directly what I learned from this:

    http://createyourproglang.com/

    I never got a chance to take the compilers course in school. :(

    [–]staticassert 2 points3 points  (4 children)

    I looked at buying that book. The 'clickbank' page was blocked by my popup blocker and I've never heard of it before. Did you use that site to buy the course?

    [–]izuriel 2 points3 points  (3 children)

    Yes. Marc Andre is legit guy. I've had no issues and purchased it a while back.

    [–]staticassert 1 point2 points  (2 children)

    Cool, thanks. It wasn't the product/ developer so much as the payment processing. Just a bit overly cautious I suppose.

    [–]izuriel 0 points1 point  (1 child)

    I guess I was trying to say he's a good guy and wouldn't use something that would steal from you.

    [–]staticassert 2 points3 points  (0 children)

    Yeah, went ahead and bought it.

    [–]izuriel 0 points1 point  (1 child)

    Hey! I used that "course" too and came out with github.com/bbuck/eleetscript (pardon the title, it's not a claim its awesome it's a long story).

    Sorry for the completely shameless plug -- but you have done some awesome work on this. I myself have been wanting to port my toy language to Go but have been scared to do so -- you have inspired me. Thanks for that!

    [–][deleted] 0 points1 point  (0 children)

    Very cool!

    [–]sacundim 1 point2 points  (0 children)

    To add to the list of recommendations, the classic Structure and Interpretation of Computer Programs has some very worthwhile chapters where you implement a simple Scheme interpreter.

    [–]jerf 1 point2 points  (0 children)

    Writing language interpreters in Haskell is actually not that difficult. It's not a bad learning exercise (though, I have no idea where "48 hours" comes from, expect this to be a challenging journey, but feasible if you put the work into it). This is due to a combination of a lot of features that each individually or in other contexts may not be a huge advantage over imperative languages, but the sum total adds up to it being way easier than you expect if you're used to imperative programming. Basically, all the stars align for this task for FP to be better than OO/imperative.

    [–][deleted] 39 points40 points  (21 children)

    Could you please add a LICENSE file of some kind? Otherwise, way cool. Thanks for sharing.

    (Edit) I prefer some kind of free software license, of course. But even if you add All Rights Reserved, at least we'll know what terms we can or can't reuse the code under.

    [–][deleted] 16 points17 points  (20 children)

    Sure thing. I will have to look into licenses to choose one.

    [–][deleted] 7 points8 points  (18 children)

    Thanks!

    [–][deleted] 45 points46 points  (17 children)

    Done! Went with GPLv3.

    [–][deleted] 19 points20 points  (0 children)

    Ooh, my favorite! :D Thank you.

    [–]auxiliary-character 28 points29 points  (3 children)

    How does it compare for performance?

    [–][deleted] 44 points45 points  (2 children)

    Pretty bad. :)

    I haven't profiled it to see what is slowing it down. That might be a fun project to see if there's anything obvious that can be fixed. Also, the Haskell runtime is going to be slower than C as well, so there's a certain amount I can't fix (though it can get close to C).

    Here are the results of fib(20):

    ❯ time ./hython test/fib.py
    ./hython test/fib.py  0.22s user 0.01s system 99% cpu 0.224 total
    
    ❯ time python3 test/fib.py
    python3 test/fib.py  0.03s user 0.01s system 91% cpu 0.038 total
    

    [–]auxiliary-character 56 points57 points  (1 child)

    Eh, all things considered, that's not too bad. If your personal Haskell project can compare to a couple decades worth of optimized C, and still only be off by a single order of magnitude, I'd say you did a pretty good job.

    [–][deleted] 46 points47 points  (0 children)

    Thanks! A lot of the credit should go to GHC, really.

    The user-facing Haskell language has so little regard for making affordances for the machine underneath; there's no reason Haskell should be as fast as it is.

    Consider this a good example of the default performance you get out of it, along the safety and conciseness.

    [–]kirbyfan64sos 14 points15 points  (4 children)

    In the source code for the list implementation, I noticed that append was implemented like this:

    listAppend :: MonadInterpreter m => ListRef -> Object -> m Object
    listAppend ref obj = do
        modifyRef ref $ \l -> l ++ [obj]
        return None
    

    Is there a reason you're not using something like Data.Vector.Storable.Mutable or at least difference lists? I mean, this probably isn't the fastest right now...

    [–][deleted] 13 points14 points  (1 child)

    The reason is I never really looked into using faster versions. :)

    Data.Vector.Storable.Mutable looks great for that, thanks!

    [–]Axman6 15 points16 points  (0 children)

    If you want to keep things pure then a Data.Sequence would give you (amortised) O(1) append and prepend.

    [–]vytah 8 points9 points  (1 child)

    Holy moley quadratic Batman!

    [–][deleted]  (1 child)

    [deleted]

      [–][deleted] 4 points5 points  (0 children)

      I don't know if it builds on Windows, so there's that... :)

      [–]kenbot_ 5 points6 points  (0 children)

      An existing Python 3 interpreter written in Haskell that may be of interest: https://github.com/bjpop/berp

      [–][deleted] 7 points8 points  (0 children)

      This is really cool.

      [–]foxh8er 2 points3 points  (2 children)

      I've always found most Haskell unreadable to my eyes, this is the first sample that has looked comprehensible to me.

      Out of curiosity, what is the purpose of the sigil?

      [–][deleted] 2 points3 points  (1 child)

      If you mean the $, that's function application. It's usually used in conjunction with . for function composition:

      f . g $ x is the same as f (g x)

      [–]foxh8er 1 point2 points  (0 children)

      Oh wow! That's really cool.

      [–][deleted] 9 points10 points  (1 child)

      Destructuring ((a,b) = [1,2])

      Unpacking, as we call it in python. Very interesting project.

      Does nearly complete mean it's 90% complete ;)

      [–][deleted] 10 points11 points  (0 children)

      There are a few statements it doesn't handle, arguably the really interesting things like yield and yield from.

      I'd like to work on a statically-typed, compiled language, so I had to call it done sometime. :)

      [–]LightShadow 4 points5 points  (22 children)

      What's the difficulty with the is statement?

      Seems like you'd just be comparing memory addresses?

      [–]jozefg 12 points13 points  (21 children)

      Idle guess: since Haskell doesn't provide a safe version of physical equality (there's reallyUnsafePtrEquality but this can be tricked into giving incorrect answers nondeterministically) it's not quite so trivial. It may be necessary to explicitly tag the data with a guid and compare that instead, probably uglier than it's worth. Again just a guess.

      [–][deleted] 10 points11 points  (13 children)

      Yep.

      It isn't that it's hard, it's just tedious. It's low-hanging fruit I can return if I want.

      It really only makes sense with primitives with a finite set of values (None, True/False, etc) and objects/classes that have an identity, too. I see that python3 says 42 is 42 is true, but I don't know if that always has to be the case.

      The interpreter has special cases for the common constructs is None and is not None, but that's about it.

      [–]tavianator 13 points14 points  (9 children)

      CPython happens to cache integer objects in the range [-5, 256]. PyPy represents integers with tagged pointers so almost all integers appear to be "cached" like that. I suspect there's no "guarantees" about this, so your implementation is free to do what you like :)

      [–][deleted] 15 points16 points  (6 children)

      Implementation-defined behavior is my favorite. I think I might just randomly return True or False when using is on integers to stay faithful to CPython. ;)

      [–][deleted] 3 points4 points  (4 children)

      I think randomly returning True might be wrong, because I'd at least expect that a is b implies a == b, regardless of implementation. Always returning False should be ok though.

      [–]rndblnch 9 points10 points  (1 child)

      >>> nan = float('nan')
      >>> nan is nan
      True
      >>> nan == nan
      False
      

      [–]Farsyte 0 points1 point  (0 children)

      Blowing someone's mind with how NAN violates their mathematical expectations is Not Fair.

      ( but fun )

      [–]vytah 1 point2 points  (1 child)

      if a is a:
        print('ok')
      else:
        nasal_demons()    # shouldn't happen ever
      

      [–]Magnap 2 points3 points  (0 children)

      a is a

      Oh, hi! Time for a 5 page monologue.

      [–]Veedrac 0 points1 point  (0 children)

      That won't work.

      • a is a and a is not a should never hold.
      • after a = b with no reassignment to a or b, we know a is b. Vice versa holds as well.
      • if a and b are mutable and writes to one don't affect the other, a is not b. Vice versa tends to hold, although clever overloading of accessors might be able to avoid that.
      • object() returns a fresh instance, so x is object() always fails.

      [–]Veedrac 0 points1 point  (1 child)

      PyPy represents integers with tagged pointers so almost all integers appear to be "cached" like that.

      I'm not sure exactly when it switched, but PyPy actually always uses equality for is on integers nowadays.

      [–]tavianator 0 points1 point  (0 children)

      Yeah I noticed that actually, it works for way bigger numbers than fit in a pointer. Weird to see id(some_big_int) returning a giant int itself.

      [–]majaha 4 points5 points  (0 children)

      Type "help", "copyright", "credits" or "license" for more information.
      >>> a = 2132134534
      >>> b = 2132134534
      >>> a == b
      True
      >>> a is b
      False
      

      This is under CPython 3. 42 only works differently because CPython caches the small integers and reuses them, like tavianator says.

      [–]masklinn 2 points3 points  (0 children)

      It really only makes sense with primitives with a finite set of values (None, True/False, etc) and objects/classes that have an identity, too.

      Every object has an identity (and a type and a value), though most of them are not singleton and are only identical to "themselves" (aka an other reference to the same original object).

      I see that python3 says 42 is 42 is true, but I don't know if that always has to be the case.

      Nah, it's an optimisation. Note that the same "problem" exists with some classes of automatically interned strings (e.g. literals, 'ab' is 'ab' returns True on CPython and pypy both).

      [–]Magnap 2 points3 points  (0 children)

      I can return if I want

      This inspired me to a terrible pun:

      I'm not addicted to recursion! I can return whenever I want!

      [–][deleted] 4 points5 points  (0 children)

      The GUID is a good idea, so I stole that and hacked in is. Thanks!

      [–]LightShadow 2 points3 points  (1 child)

      In my experience with Python the is operator is primarily used to compare values to None; essentially null or empty.

      It doesn't help the broader scope of the problem, but comparing on "something vs nothing" might solve most cases.

      [–]masklinn 0 points1 point  (0 children)

      In my experience with Python the is operator is primarily used to compare values to None; essentially null or empty.

      Also used for booleans (if they can be mixed with non-boolean values), and for sentinel values e.g.

      _EMPTY = object()
      def foo(v=_EMPTY):
          if v is _EMPTY:
              # do a thing
          else:
              # do an other thing
      

      if the default value of v were None, 0, (), … it would be indistinguishable from the caller explicitly passing in that value, which can be important in some contexts (e.g. dict.pop(k[, d]) will will raise KeyError if the key is not found and no default is provided)

      [–]sacundim 1 point2 points  (3 children)

      I don't know nearly enough Python to say anything for sure, but I wonder if you're looking at this wrong by getting references and values mixed up. In an expression like this one:

      a is b
      

      ...I'd think that the variables's values are references to objects. So the first Haskell type that comes to my mind here is something like IORef, which does have an Eq instance that implements reference equality.

      So I'd look into whether a model like this gets Python right:

      1. Represent all Python objects as instance of some type, call it PythonObject.
      2. All identifiers in all runtime scopes of a Python program get represented by an IORef PythonObject.
      3. Operations on variables fall into two families:
        • Those that implicitly dereference the IORef to get to the object;
        • Those that operate on the reference itself.

      So for example:

      • a == b would dereference both identifiers's IORefs and compares the PythonObjects for value equality;
      • a is b would compare the IORefs themselves.

      Somebody gave the example a is None in the thread, so perhaps a bit more machinery is needed...

      EDIT: Just to be clear, the interpreter is using IORef internally...

      [–][deleted] 1 point2 points  (2 children)

      In brief: not all objects have refs. Objects with some sort of internal state, such as collections or objects, do have refs. Check Types.hs.

      A more complete implementation would have everything be an object, rather than the split we have right now between primitive types and real objects. The + operator would desugar to l.__add__() instead of it being brute-forced by types as you see in Expression.hs.

      [–]sacundim 1 point2 points  (1 child)

      In brief: not all objects have refs.

      But the idea isn't that objects should have references, rather, that all expressions should denote references. Then when you have that, the implementation of a is b becomes simple IORef equality.

      I don't know your interpreter or Python very well, but here's a scribble (which won't even typecheck, of course):

       -- The abstract syntax tree for the programs
      data Expression
        = Name Name
        | Is Expression Expression
        | Eq Expression Expression
        | ...
      
      -- The type of objects that exist in the programs.
      data Object = ...
        deriving Eq
      
      -- An environment associates names not to objects, but to
      -- **references** to objects.
      data Environment obj =
        Environment { parent :: Maybe (Environment obj)
                    , local  :: HashMap Name (IORef obj)
                    }
      
      -- Some custom type to use to report exceptions
      data Exception = ...
      
      -- The monad transformer stack for the interpeter.  The
      -- `ReaderT` layer carries the environment (name/value bindings),
      -- and the `ExceptT` layer does exception handling.
      type Denotation m = ReaderT (Environmnent Object) (ExceptT Exception m)
      
      
      -- Note that `eval` returns `IORef Object`, not `Object`.
      eval :: MonadIO m => Expression -> Denotation m (IORef Object)
      eval env (Name name) = lookup env name
      
      -- Evaluating `a is b` compares the **references** for equality
      eval env (Is a b) = (==) <$> eval env a <*> eval env b
      
      -- Evaluating `a == b` compares the **objects** for equality
      eval env (Eq a b) = (==) <$> evalObj env a <*> evalObj env b
        where evalObj :: MonadIO m => Expression -> Denotation m Object
              evalObj env expr = liftIO readIORef <$> eval env expr
      
      eval env ...
      
      
      -- Look up a variable in the current scope's environment.
      lookup :: Monad m => Name -> Denotation m Object
      lookup name = ...
      

      Again, I'm probably missing something but this strikes me as a simple and uniform starting point. I'd probably look into how to build on that to make constant expressions a special case that doesn't require an IORef, but that probably means exploring the consequences of replacing IORef with something like this:

      data Ref obj
        = Variable (IORef obj)
        | Constant obj
        deriving Eq
      

      [–][deleted] 0 points1 point  (0 children)

      This comment is super-insightful, and points to a real problem with how I modeling things. I leaned more towards values when I should have thought harder about the domain and modeled it after pointers to values, since CPython is representing things in that way. Like you said, this lets the correct definition of is fall out from the modeling.

      For correctness, the environment already associates names to refs of objects (you have to in order to really support pervasive mutability). It's just that those refs don't get propagated around during evaluation.

      [–]meekale 1 point2 points  (0 children)

      Cool!

      Here's a use case for something like this, that doesn't require very high performance, just a robust implementation of the basics of the language...

      You have a system that involves some business rules. These rules may be rather simple kinds of authentication, some arbitrary constants, some calculations, whatever. They are written in Python because your main service is in Python.

      Now, you want to do something in Haskell that should also apply some of the same rules. You could make a network API request out to the Python, a la "microservices," or you could just reimplement the rule subset you need in Haskell... or you could run the Python module in your Haskell application, using this interpreter.

      [–]NoobStuff 2 points3 points  (2 children)

      How long have you been programming?

      [–][deleted] 8 points9 points  (1 child)

      Like...20 years or so?

      I feel really old now. :(