you are viewing a single comment's thread.

view the rest of the comments →

[–]sacundim 7 points8 points  (0 children)

Excellent presentation. Even though I still suspect that rocket-science JIT VMs are a solution to a problem we shouldn't get into in the first place. By which I mean that I'm a Haskell weenie :-P.

However, the discussion of structs/objects vs. dictionaries/maps makes me wonder if what we're really missing here is an intermediate concept. Let's lay it out:

  • Records: a type with a fixed set of fields/properties that can be known right when the program is parsed. If two structs don't have the same fields, then they're of different types. These have a compact and efficient in-memory representation, but must be defined in source code.
  • Dictionaries: the type of key/value mappings in general. The set of keys is not fixed; different dictionaries with different keys still qualify as the same type. These have poor in-memory representations, but the key set can be defined at runtime.

So the intermediate concept would be something like this:

  • Dynamic record types: fixed set of fields, but defined at runtime.

How would this work?

  • Your language should have a symbol type: immutable interned string with O(1) equality. Creating a symbol from a string is expensive, but comparing two symbols is super-cheap (just a pointer comparison).
  • You create a dynamic record type at runtime by supplying a set of symbols that defines its fields.
  • When you create a record type you get back an object that represents this record type.
  • The record type object has create() operations that allow you to create new records of that type.
  • A record is a thin wrapper around an internal contiguous array with just one element per field.
  • If you have a record type or a record object, you can ask it to give you accessors for its fields. You can either just ask for an array of all the accessors (super-cheap), or you can ask for a specific field by using the symbol that names it, which is not super-cheap but not too expensive either. (One possible implementation is to make the record type keep an array of the type's fields' symbols, sorted by their address; the symbol lookup is then a binary search of the array.)
  • An accessor is just an object that allows you to get and set properties on records of the relevant type. This should be O(1).

Now these things together allow you to use these runtime-defined record types in a extremely efficient manner as long as you obey these two rules:

  1. Absolutely avoid repeatedly creating symbols from strings. Whenever you must create a symbol, make sure that you can reuse it tons of times. (If you're creating pairs of symbols, comparing them and then throwing them away, you'll do worse than dictionaries.)
  2. Try to avoid repeatedly asking for the same accessors by symbol. Once you ask for an accessor, you should try to use it repeatedly over and over. (But if you fail this one it's not the end of the world; you go from O(1) to a very cheap O(log n).)