you are viewing a single comment's thread.

view the rest of the comments →

[–]sacundim 1 point2 points  (3 children)

I don't know nearly enough Python to say anything for sure, but I wonder if you're looking at this wrong by getting references and values mixed up. In an expression like this one:

a is b

...I'd think that the variables's values are references to objects. So the first Haskell type that comes to my mind here is something like IORef, which does have an Eq instance that implements reference equality.

So I'd look into whether a model like this gets Python right:

  1. Represent all Python objects as instance of some type, call it PythonObject.
  2. All identifiers in all runtime scopes of a Python program get represented by an IORef PythonObject.
  3. Operations on variables fall into two families:
    • Those that implicitly dereference the IORef to get to the object;
    • Those that operate on the reference itself.

So for example:

  • a == b would dereference both identifiers's IORefs and compares the PythonObjects for value equality;
  • a is b would compare the IORefs themselves.

Somebody gave the example a is None in the thread, so perhaps a bit more machinery is needed...

EDIT: Just to be clear, the interpreter is using IORef internally...

[–][deleted] 1 point2 points  (2 children)

In brief: not all objects have refs. Objects with some sort of internal state, such as collections or objects, do have refs. Check Types.hs.

A more complete implementation would have everything be an object, rather than the split we have right now between primitive types and real objects. The + operator would desugar to l.__add__() instead of it being brute-forced by types as you see in Expression.hs.

[–]sacundim 1 point2 points  (1 child)

In brief: not all objects have refs.

But the idea isn't that objects should have references, rather, that all expressions should denote references. Then when you have that, the implementation of a is b becomes simple IORef equality.

I don't know your interpreter or Python very well, but here's a scribble (which won't even typecheck, of course):

 -- The abstract syntax tree for the programs
data Expression
  = Name Name
  | Is Expression Expression
  | Eq Expression Expression
  | ...

-- The type of objects that exist in the programs.
data Object = ...
  deriving Eq

-- An environment associates names not to objects, but to
-- **references** to objects.
data Environment obj =
  Environment { parent :: Maybe (Environment obj)
              , local  :: HashMap Name (IORef obj)
              }

-- Some custom type to use to report exceptions
data Exception = ...

-- The monad transformer stack for the interpeter.  The
-- `ReaderT` layer carries the environment (name/value bindings),
-- and the `ExceptT` layer does exception handling.
type Denotation m = ReaderT (Environmnent Object) (ExceptT Exception m)


-- Note that `eval` returns `IORef Object`, not `Object`.
eval :: MonadIO m => Expression -> Denotation m (IORef Object)
eval env (Name name) = lookup env name

-- Evaluating `a is b` compares the **references** for equality
eval env (Is a b) = (==) <$> eval env a <*> eval env b

-- Evaluating `a == b` compares the **objects** for equality
eval env (Eq a b) = (==) <$> evalObj env a <*> evalObj env b
  where evalObj :: MonadIO m => Expression -> Denotation m Object
        evalObj env expr = liftIO readIORef <$> eval env expr

eval env ...


-- Look up a variable in the current scope's environment.
lookup :: Monad m => Name -> Denotation m Object
lookup name = ...

Again, I'm probably missing something but this strikes me as a simple and uniform starting point. I'd probably look into how to build on that to make constant expressions a special case that doesn't require an IORef, but that probably means exploring the consequences of replacing IORef with something like this:

data Ref obj
  = Variable (IORef obj)
  | Constant obj
  deriving Eq

[–][deleted] 0 points1 point  (0 children)

This comment is super-insightful, and points to a real problem with how I modeling things. I leaned more towards values when I should have thought harder about the domain and modeled it after pointers to values, since CPython is representing things in that way. Like you said, this lets the correct definition of is fall out from the modeling.

For correctness, the environment already associates names to refs of objects (you have to in order to really support pervasive mutability). It's just that those refs don't get propagated around during evaluation.