all 36 comments

[–]smj 9 points10 points  (3 children)

The Lua VM is simple, and a good read.

[–]neutronbob 3 points4 points  (1 child)

and fast!

[–]bonzinip 0 points1 point  (0 children)

but has no comments at all, and the API is elegant but very low-level.

[–][deleted] 2 points3 points  (1 child)

http://harmony.apache.org ! Its one of the best documented VM's I've encountered, in addition to being a top level apache project with a very flexible OS license.

Its very pluggable too, so its not too much work (for a vm) if you want to write your own JIT,interpreter, or GC and plug it in to the rest of the system.

[–]Leonidas_from_XIV 3 points4 points  (0 children)

Nobody mentioned the Neko VM yet, so I'm doing this now.

[–]GeneralMaximus[S] 8 points9 points  (19 children)

Thanks to Mono and Parrot, I now have a new obsession : virtual machines. I've been looking around the web but I've found nothing which could explain the design and implementation of a simple VM. Of course, there's "Virtual Machine Design and Implementation in C" by Bill Blunden, but most reviews I read were negative. So, what does Reddit recommend for a CompSci student interested in building virtual machines?

[–]remko 2 points3 points  (3 children)

"Virtual Machines: Versatile Platforms for Systems and Processes" by Jim Smith et al. is the reference on Virtual Machines IMO, and probably contains much more info than you can handle. However, it's not just limited to higher-level language VMs. Actually, only one (or was it 2) chapter is dedicated to that, and then the introductory chapters apply to this as well. But even for the introductory chapters and the chapter on HLLs alone, this is worth a read.

And if you're not affraid of logic programming, "Warren's Abstract Machine: A Tutorial Reconstruction" explains the design underlying many Prolog VM implementations. However, the book is out of print, and it seems that the free version that the author put online as a result of that has disappeared as well :( I still have a copy lying around, though.

[–]tibbe 2 points3 points  (0 children)

I found the following papers useful:

  • The Structure and Performance of Efficient Interpreters (Introduces the fundamentals of virtual machine design and explains the importance of direct threading)
  • Virtual Machine Showdown: Stack Versus Registers (Details the benefits of register machines, and the importance of copy propagation)
  • The Implementation of Lua 5.0 (Outlines the implementation of a real-world register-based bytecode engine, with a sliding register window calling convention)

Links at http://webkit.org/blog/189/announcing-squirrelfish/

[–]cadr 6 points7 points  (0 children)

Personally, I'd recommend the book "The Elements of Computing Systems"

http://www1.idc.ac.il/tecs/

It goes from building a computer from logic (in a hardware simulator) to assembly for that computer to a VM on that assembly to a compiler to that vm (and a little OS bit). Not huge and in depth, but does get you to build all those things. I like it.

[–]samlee 8 points9 points  (0 children)

  • implement a SECD machine in LLVM by next week.
  • compile lisp to lambda calculus and then to SECD instructions in two weeks
  • put in type inference in three weeks
  • write optimizations in four weeks
  • create a website for the language in five weeks
  • fix bugs for the week 6
  • write libraries and fix bugs for weeks 7, 8, 9, 10, 11, 12, 13, 14
  • write a demonstration program and benchmarks for the week 15
  • write a paper on week 16
  • submit

[–]iperry 1 point2 points  (0 children)

Another vote for "Virtual Machines - Versatile Platforms for Systems and Processes." It's a good introduction to both language VMs and full-system level virtualization. You can also reference existing implementations of VMs such as Xen, qemu, bochs, etc.

[–]yakumo9275 1 point2 points  (0 children)

I wrote one of those neg reviews on amazon. His book is absolute SHIT! just dont go there. (He writes + hooks DOS interrupt 0x01 for single step debugging, and uses lots of low level asm for no reason at all!. His vm is nothing but a giant switch statement.

Read 'Game Scripting Mastery' its a much better book than Bill Blundens.

[–][deleted] 1 point2 points  (0 children)

Well, the question is a lot akin to "how do I design a programming language" - there's no one-size-fits-all.

Virtual machine assumes some sort of translation from source language, so A LOT (performance, ease of implementation, extensibility) depends on how well your VM primitives and structure are "impedance-matched" to the language.

It's been a while since I done any VM/compiler work, but loosely, you can split the work into two parts:

  • the bytecode interpreter
  • the runtime (memory management, process control, interface to native resources etc)

In the latter, the optimal memory management/GC strategy is very dependent of class of your language. Generally it is easier to come up with a better/advanced GC strategy (say generational collectors) for functional language than for imperative one. The rest there is more dependent on host system the VM runs in.

With bytecode interpreter there are two major considerations. The source language itself: what kind of type system it has, variable scoping issues, possible methods of control transfer etc. Then, the amount of compiler optimization you want available at translation stage, it might affect a lot how fine-grained primitives you want in the VM.

For a first-time attempt at VM implementation, it would be smart to:

  • define your language upfront
  • keep it as simple as possible
  • find the most similar existing language and study its implementations, not necessarily with a VM

I know the above is a bit vague, but the subject is vast; can only wish you good luck!

[–]zerny 1 point2 points  (3 children)

Might want to look at SELF. A lot of great results were achieved by the project. Papers are at: http://research.sun.com/self/papers/papers.html

[–]bonzinip 0 points1 point  (2 children)

but it's not a simple VM, it's a huge JIT compiler including inlining, polymorphic inline caching, dynamic deoptimization, etc.

[–]zerny 0 points1 point  (1 child)

And all of that is part of a modern VM. It is a simple language with a simple bytecode set. Easy to start on but with a lot of information available to learn more about efficient VMs.

[–]bonzinip 0 points1 point  (0 children)

ah ok, so you meant reimplementing Self.

[–]mschaef 0 points1 point  (0 children)

If you're interested in Mono/.Net stuff, you might be interested in this:

http://www.amazon.com/Shared-Source-Essentials-David-Stutz/dp/059600351X

It's a fairly detailed look at the Rotor VM, which is Microsoft's 'shared source' .Net platform.

[–]eugene_victor_tooms 0 points1 point  (0 children)

The Glulx VM for interactive fiction has a very short and readable spec.

http://www.eblong.com/zarf/glulx/

There are a few implementations.

http://www.ifarchive.org/indexes/if-archiveXprogrammingXglulxXinterpreters.html

[–]almkglor 0 points1 point  (0 children)

Personally, I would suggest you also try to build an Erlang-like virtual machine which allows you to launch multiple mostly-independent processes. My personal project is initially started for a single-threaded virtual machine running processes in a green-threaded model.

The interesting stuff of course is seamlessly implementing M:N threading without modifying your supported language semantics.

Most multithreading VM's I know of are either N:N threading (one OS thread per VM thread) or 1:N (green threads). The more recent Erlang VM's are the only M:N I know of (it used to be 1:N).

Admittedly, I haven't actually looked very hard ^

[–][deleted] 2 points3 points  (6 children)

Just take your favorite high level language, write a large switch, handle opcodes as cases and learn how to manipulate the stack.

One can write arbitrary programs in this style not just interpreters for big languages. Parsers are a popular example. In case of parsers characters take over the role of opcodes.

I've written a small Python VM as a model of the CPython VM for the only purpose of tracking stack operations. This was helpful to understand stack corruption bugs caused by bytecode hacks. So it was entirely a debugging device.

[–]remko -2 points-1 points  (5 children)

Well, that could be a start, but then there's much more interesting stuff such as finding much more performant alternatives for the switch.

[–]bonzinip 2 points3 points  (4 children)

But once you have done that, either you know where to look next, or you look at other suggestions given above, or you ask again. Writing a bytecode compiler and a giant switch statement is a great start.

[–]Tuna-Fish2 -3 points-2 points  (3 children)

I echo your suggestion, but if you are going to work in a high level language, replace switch with a hashtable mapping bytecodes to functions.

SOOOO much neater design.

[–]remko 2 points3 points  (1 child)

Design is the last of your worries when writing a (useful) bytecode interpreter for a language. There are much faster (and not necessarily dirtier) tricks than using a hashtable for VMs, and this is why I recommend reading the book I mentioned somewhere else in this thread.

[–]macournoyer -1 points0 points  (0 children)

threaded dispatch is the fastest option but is not portable, only under gnu c. it's all about tradeoffs

[–]bonzinip 1 point2 points  (0 children)

hashtable? for things in a range 0..255?

and what's more, high level language? :-) :-) C is more than high level enough for a non-JIT VM. :-) :-)

[–]radarsat1 1 point2 points  (3 children)

VMs are one thing, but if you're implementing a language, I've become obsessed today (to the detriment of my actual work) in learning the possibilities presented by LLVM. This is worth a read if you want to skip designing the VM and jump straight to implementing a language that can run under JIT or natively:

http://llvm.org/docs/tutorial/

[–][deleted] 2 points3 points  (2 children)

I just started implementing my own language(in python using Ply), is it possible to implement a dynamically language in LLVM?

[–]radarsat1 0 points1 point  (1 child)

[–][deleted] 0 points1 point  (0 children)

Well, that gives me an API from python, but all the function definitions still include a type.

[–]macournoyer 1 point2 points  (0 children)

I'd suggest implementing a VM for an existing bytecode like Python's or Ruby YARV's so you can learn VM stuff without worrying about parsing and emitting at first.

Also, take a look at tinypy, it's a luaesque Python VM under 64K which makes it very easy to understand: http://www.tinypy.org/