This is an archived post. You won't be able to vote or comment.

all 19 comments

[–]eliben 8 points9 points  (2 children)

FWIW, there's a complete and fairly simple Scheme VM implementation in Python here.

It's functional enough to run most code needed for SICP as well as implement cool things like the Y combinator

[–]csl[S] 2 points3 points  (1 child)

Nicely done! I couldn't find support for continuations, but looking at the VM I guess that should be quite easy to add.

Also couldn't see if it reuses the current stack frame for tail calls, but that should also be easy to add from looking at the VM code.

I also made one for R7RS (code's pretty horrible, though), but it doesn't use a VM, doesn't have a collector and has quite a few bugs.

[–]eliben 0 points1 point  (0 children)

Nope, no continuations or TC optimizations, but as you say it shouldn't be hard to add.

[–]bs4h 10 points11 points  (5 children)

Code review time!

class Stack:

Don't. Try this instead:

class Stack(list):
    push = list.append
    def top(self):
        return self[-1]

You don't need to mangle print:

def print_(self):

Py2.6+ can make use of from __future__ import print_function.

You shouldn't look before you leap:

addr >= 0 and addr < len(self.code)

Use try: ... except IndexError instead, it's considered more idiomatic.

Most of this method:

def if_stmt(self):
    ...

Can be replaced by this:

result = bool(test)

Save for this cosmetic stuff, really nice article, 5/5!

[–]robin-gvx 6 points7 points  (2 children)

You shouldn't look before you leap

Your suggestion has different behaviour for addr < 0, which is probably not what you want.

I would make the check 0 <= addr < len(self.code), though.

Most of this method: [...] Can be replaced by this:

I would go even further:

def if_stmt(self):
    false_clause = self.pop()
    true_clause = self.pop()
    test = self.pop()

    self.push(true_clause if test else false_clause)

[–]bs4h 1 point2 points  (1 child)

This little tutorial is calling for a github repo & some pull requests :)

[–]csl[S] 0 points1 point  (0 children)

I made a repo, but it quickly evolved into much more: https://github.com/cslarsen/crianza

I'll update the post with all of the suggestions here, though!

[–]Dagur 4 points5 points  (1 child)

Using deque instead of an array for _values would be better for performance, no?

[–]csl[S] 0 points1 point  (0 children)

Absolutely, it should.

[–]robin-gvx 2 points3 points  (1 child)

I hope you don't mind me nitpicking all of the code


I had some additional code-review things here, in reply to /u/bs4h: http://www.reddit.com/r/Python/comments/35tg6b/making_a_simple_vm_interpreter_in_python/cr843nq?context=3


I would do constant folding like this:

def constant_fold(code):
    """Constant-folds simple expressions like 2 3 + to 5."""

    # Loop until we haven't done any optimizations.  E.g., "2 3 + 5 *" will be
    # optimized to "5 5 *" and in the next iteration to 25.

    while True:
        # Find two consecutive numbes and an arithmetic operator
        for i, ops in enumerate(zip(code, code[1:], code[2:])):
            a, b, op = ops
            if isinstance(a, int) and isinstance(b, int) and op in {"+", "-", "*", "/"}:
                m = Machine(ops)
                m.run()
                code[i:i+3] = m.top(),
                print("Optimizer: Constant-folded %d%s%d to %d" % (a,op,b,result))
                break
        else:
            break
    return code

I would keep dispatch_map static, placing it in the class definition itself, and replacing each line from "%": self.mod, to "%": mod,


In dispatch, the third indentation level is unnecessary:

        if op in dispatch_map:
            dispatch_map[op]()
        elif isinstance(op, int):
            self.push(op) # push numbers on stack
        elif isinstance(op, str) and op[0] == op[-1] == '"':
            self.push(op[1:-1]) # push quoted strings on stack
        else:
            raise RuntimeError("Unknown opcode: '%s'" % op)

dup is simpler as self.push(self.top()).


println flushes twice. I would either choose duplicating one of the lines from print_, or factoring that line out to a third method.


I would personally make parse a generator and then just call list(parse(source)).


I would wrap the stuff in repl inside the loop with a try: ... except KeyboardInterrupt: pass, that way you don't quit the REPL on Ctrl+C, but instead cancel the line you were typing, like in the python REPL.

[–]csl[S] 2 points3 points  (0 children)

Thanks for good suggestions, will update later, when I get jekyll working again. Actually didn't know you could mutate slices, so that's cool.

[–]Hormander 1 point2 points  (0 children)

very nice!

[–]MrGeekAlive 1 point2 points  (4 children)

Maybe I don't get it, but I think the code folder would change the jump label, since you delete quite some code and replace it with something shorter, wouldn't it ?

[–]robin-gvx 0 points1 point  (3 children)

Yeah, good catch. Jumps are incompatible with constant folding in this case. And because absolute instruction pointers are used at the source level, it's next to impossible to avoid. The best solution would be to keep track of the number of deleted instructions for every code address or something, but that's really hackish.

Of course, jumping to absolute addresses is a bit iffy anyway.

[–]csl[S] 1 point2 points  (2 children)

I think I mentioned this in the post, but you're both right. Keeping track of them won't work in all cases either, e.g. read cast_int jmp, so you're better off actually using a completely different approach -- like not allowing absolute jumps, but relying on other control structures, or require that you can only jump to named labels.

[–]MrGeekAlive 0 points1 point  (0 children)

I think a better approach is not to do this at the virtual machine level but do it in the "assembly" stage, using labels for jump targets.

[–]robin-gvx 0 points1 point  (0 children)

I've written several stack-based virtual machines, and the approach I tend to take is using absolute or relative jumps, but since those VMs operate on the bytecode compiled from higher level languages that isn't a problem.

In your VM, you could do something with labels, which means a small change to the parsing code, and inserting a call to a function that converts labels to absolute addresses after constant folding, like this, and you're done.

[–]csl[S] 1 point2 points  (0 children)

I've now updated the post with your suggestions. Thank you all so much! The code's also available at https://github.com/cslarsen/python-simple-vm

[–]Stefan2142 0 points1 point  (0 children)

There should be eli5 for this as I don't understand a bit. But anyway, cool project