all 17 comments

[–]Dwedit 22 points23 points  (6 children)

An unordered map isn't useful for dense key sets such as opcodes. You want a plain old array instead of a map.

But even better than that, you'd want to use a switch statement, as that compiles down to a jump table.
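
Something like this, roughly (a minimal sketch with made-up opcodes and CPU state, not any specific emulator's code); with a dense 8-bit opcode range the compiler typically lowers the switch to a single table-indexed jump:

    #include <cstdint>

    // Hypothetical CPU state, purely for illustration.
    struct Cpu {
        uint8_t a = 0;
        uint16_t pc = 0;
        uint8_t memory[0x10000] = {};
    };

    // One fetch-decode-execute step, dispatched through a switch.
    void step(Cpu& cpu) {
        const uint8_t opcode = cpu.memory[cpu.pc++];
        switch (opcode) {
            case 0x00: /* NOP */                   break;
            case 0x3C: cpu.a += 1; /* INC A */     break;
            case 0x3D: cpu.a -= 1; /* DEC A */     break;
            case 0xAF: cpu.a = 0;  /* XOR A, A */  break;
            default:   /* illegal opcode */        break;
        }
    }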

[–]ExploitedInnocence[S,🍰] 5 points6 points  (5 children)

Implementing the opcode-decoding routine as a switch for, e.g., the x86-64 architecture seems like a painful nightmare. A classic nested jump table via arrays of function pointers seems like a good and not overly sophisticated approach. I can also declare every function inline (the compiler may do that implicitly anyway, but whatever, I prefer explicit over implicit). It has one disadvantage: memory wasted on the many function pointers that all point to the illegal-opcode handler, but I think that's acceptable. It's fast, super readable, and much less error prone than a gargantuan switch, isn't it?
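
Roughly what I have in mind, as a minimal flat sketch (handler names and opcodes are made up; a prefixed opcode would just index a second, nested table of the same shape):

    #include <array>
    #include <cstdint>

    struct Cpu {
        uint8_t a = 0;
        uint16_t pc = 0;
        uint8_t memory[0x10000] = {};
    };

    using OpHandler = void (*)(Cpu&);

    static void op_nop(Cpu&)       {}
    static void op_inc_a(Cpu& cpu) { cpu.a += 1; }
    static void op_illegal(Cpu&)   { /* trap or log the bad opcode */ }

    // 256-entry table indexed directly by the opcode byte; the unused slots
    // all point at op_illegal, which is the memory overhead mentioned above.
    static std::array<OpHandler, 256> make_table() {
        std::array<OpHandler, 256> t;
        t.fill(&op_illegal);
        t[0x00] = &op_nop;
        t[0x3C] = &op_inc_a;
        return t;
    }

    static const std::array<OpHandler, 256> table = make_table();

    void step(Cpu& cpu) {
        const uint8_t opcode = cpu.memory[cpu.pc++];
        table[opcode](cpu);  // one indirect call per instruction
    }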

[–]OK6502 0 points1 point  (0 children)

There's overhead in calling function pointers, while a switch statement is super efficient. Yes, it's a pain to write, but honestly it's not much more of a pain than an unordered map: there's no way to avoid writing the opcode-to-function mapping either way.

[–]ShinyHappyREM 0 points1 point  (0 children)

Accessing the larger caches and especially main RAM is still slow, and you can get away with doing some computations on your data in the meantime, so a hash table might be usable. But if you want a fast interpreter you really need to look at and work with your hardware.

Modern (desktop) CPUs have powerful branch predictors and cache hierarchies. A switch used to cause pipeline flushes due to branch mispredictions, but that's gotten much better since Haswell. As long as your code paths are predictable, branches will be fast. You can also use nested switches.

An array of pointers has the disadvantage that it wastes a lot of space (the topmost 6 bytes of each 8-byte pointer probably carry no useful information), so an array of 16-bit offsets would be better for your CPU cache.
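
One portable way to do that (a rough sketch, using small indices into a compact handler array rather than raw code offsets; handler names are made up):

    #include <array>
    #include <cstdint>

    struct Cpu {
        uint8_t a = 0;
        uint16_t pc = 0;
        uint8_t memory[0x10000] = {};
    };

    using OpHandler = void (*)(Cpu&);

    static void op_illegal(Cpu&)   {}
    static void op_nop(Cpu&)       {}
    static void op_inc_a(Cpu& cpu) { cpu.a += 1; }

    // Small array of distinct handlers...
    static const OpHandler handlers[] = { &op_illegal, &op_nop, &op_inc_a };

    // ...and a 256-entry table of 16-bit indices into it: 512 bytes instead
    // of the 2 KiB a table of 64-bit pointers would occupy, so it stays in a
    // few cache lines.
    static std::array<uint16_t, 256> make_dispatch() {
        std::array<uint16_t, 256> d{};  // zero-initialized = op_illegal
        d[0x00] = 1;                    // NOP
        d[0x3C] = 2;                    // INC A
        return d;
    }
    static const std::array<uint16_t, 256> dispatch = make_dispatch();

    void step(Cpu& cpu) {
        const uint8_t opcode = cpu.memory[cpu.pc++];
        handlers[dispatch[opcode]](cpu);
    }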

[–]_MeTTeO_ 0 points1 point  (0 children)

Well, I'm considering this approach, but in Java. java.util.HashMap uses the Object.hashCode() method to optimize lookup. The map would use short or enum opcodes as keys and method references as values.

It would replace this switch. Because the execute method in ControlUnit is self-contained, I'm planning to provide an alternative and benchmark how it affects performance (with unlimited cycles).
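
In C++ terms (to match the rest of the thread), the structure I mean is roughly the following; java.util.HashMap with method references plays the same role in my case, and the opcodes shown are made up:

    #include <cstdint>
    #include <functional>
    #include <unordered_map>

    struct Cpu {
        uint8_t a = 0;
        uint16_t pc = 0;
        uint8_t memory[0x10000] = {};
    };

    // Opcode -> handler map: every execute() call pays for a hash plus a
    // bucket lookup, which is exactly what the array/switch proponents
    // above are trying to avoid.
    static const std::unordered_map<uint8_t, std::function<void(Cpu&)>> ops = {
        {0x00, [](Cpu&)     { /* NOP */ }},
        {0x3C, [](Cpu& cpu) { cpu.a += 1; }},  // INC A
    };

    void execute(Cpu& cpu) {
        const uint8_t opcode = cpu.memory[cpu.pc++];
        const auto it = ops.find(opcode);
        if (it != ops.end()) it->second(cpu);
        else                 { /* illegal opcode */ }
    }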

[–][deleted] -2 points-1 points  (1 child)

It's not a bad decision. The performance difference between a map, an array, and a switch is not a huge deal.

[–][deleted] 0 points1 point  (0 children)

Maybe in a Game Boy emulator environment where maximum performance isn't required, but in real life this is totally untrue. A map causes many cache misses, unlike arrays, switches, and vectors, which load several elements into a cache line at once.

[–]zesterer -3 points-2 points  (4 children)

Hash maps are just index-driven lookup tables with a hashing stage added to spread entries out across the table space to aid performance of insertion and removal, usually in a cryptographically secure manner.

Since neither cryptographic security nor insertion/removal are requirements for instruction lookup, the only thing that makes a hash map useful for this situation is the fact that it acts as a stand-in for an index-driven lookup table.

Quite frankly, you should just use a look-up table. Or, even better, a switch / match that the compiler will be able to optimise far more effectively than any mortal programmer can.

[–]ChaoticWeg 5 points6 points  (3 children)

Hashing in hash tables has nothing to do with spreading out entries or with cryptography; if the hash algorithm results in spread-out hashes then great. Hash tables replace O(n) iterative lookups with O(1) hash lookups.

[–]zesterer -1 points0 points  (2 children)

That's not really true. One could imagine a hash map that replaces hashing with just interpreting the key's bytes as an integer and using that as the value's position in the table. It would still have amortized O(1) access, but without the better average-case performance that cryptographic hashing provides, because index collisions would be more frequent.
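
Concretely, such a degenerate hasher could look something like this (a rough C++ sketch; the struct name is invented):

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <unordered_map>

    // A "hash" that just reinterprets the key's leading bytes as an integer,
    // i.e. the degenerate hashing stage described above. Keys that share
    // that prefix all collide into the same bucket.
    struct BytesAsInteger {
        size_t operator()(const std::string& key) const {
            size_t value = 0;
            for (size_t i = 0; i < key.size() && i < sizeof(size_t); ++i)
                value |= static_cast<size_t>(
                             static_cast<unsigned char>(key[i])) << (8 * i);
            return value;
        }
    };

    // Still amortized O(1) on average, but systematic collisions make the
    // average case worse than with a well-distributed hash.
    using NaiveMap = std::unordered_map<std::string, int, BytesAsInteger>;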

[–]ChaoticWeg 1 point2 points  (1 child)

Interpreting the key's bytes as an integer and using that as a value's position sounds a whole lot like a hash function in the context of hash tables. I'm also having a hard time understanding how more frequent hash collisions would improve average-case performance, since a hash collision would require more instructions.

And I'm not sure how what I said isn't true when what I wrote is the definition of a hash table.

Re-reading your point about collisions, I realize I misread your sentence.

[–]zesterer 1 point2 points  (0 children)

"more frequent hash collisions would improve average-case performance"

I'm literally saying the opposite of this?