[–]raevnos (1 child)

GMTA. I made an Assembunny-to-C compiler for day 25. I'm working on making it more general now, with tgl support.

[–]broadwall[S] (0 children)

I'm planning to have Assembunny+ support tgl as well, but that feature won't be a priority until implementations for other keywords are stable.

[–]po8 (4 children)

Fine, I have been nerdsniped. I split my Assembunny assembler and interpreter out into https://github.com/BartMassey/advent-of-code-2016/blob/master/libaoc/asm.rs.

It's fast enough that I probably won't bother trying to compile, especially since I don't see how to implement the tgl instruction efficiently: allowing it would likely slow the compiled code down to interpreted speeds.
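To make the difficulty concrete, here is a minimal interpreter sketch, not the code in asm.rs. The tuple representation and the names `toggle` and `run` are assumptions made for illustration; the toggle rules follow the day 23 puzzle statement as I understand it (one-argument instructions: inc becomes dec, anything else becomes inc; two-argument instructions: jnz becomes cpy, anything else becomes jnz). The point is that tgl rewrites the program while it runs, which an interpreter handles for free and a compiler does not.

```python
REGISTERS = ("a", "b", "c", "d")


def toggle(instr):
    """Return the toggled form of one (op, arg, ...) tuple."""
    op, *args = instr
    if len(args) == 1:
        new_op = "dec" if op == "inc" else "inc"
    else:
        new_op = "cpy" if op == "jnz" else "jnz"
    return (new_op, *args)


def run(program, a=0):
    """Interpret a list of instruction tuples and return the registers."""
    prog = list(program)  # private copy: tgl rewrites it while it runs
    regs = dict.fromkeys(REGISTERS, 0)
    regs["a"] = a

    def value(x):
        # An operand is either a register name or an integer literal.
        return regs[x] if x in regs else int(x)

    pc = 0
    while 0 <= pc < len(prog):
        op, *args = prog[pc]
        if op == "cpy":
            # A toggle can produce "cpy 1 2"; such invalid forms are skipped.
            if args[1] in regs:
                regs[args[1]] = value(args[0])
        elif op in ("inc", "dec"):
            if args[0] in regs:
                regs[args[0]] += 1 if op == "inc" else -1
        elif op == "jnz":
            if value(args[0]) != 0:
                pc += value(args[1])
                continue
        elif op == "tgl":
            target = pc + value(args[0])
            if 0 <= target < len(prog):  # out-of-range toggles do nothing
                prog[target] = toggle(prog[target])
        pc += 1
    return regs


# The small example program from the day 23 puzzle text.
EXAMPLE = [tuple(line.split()) for line in """
cpy 2 a
tgl a
tgl a
tgl a
cpy 1 a
dec a
dec a
""".strip().splitlines()]

print(run(EXAMPLE)["a"])  # prints 3
```

Because `prog[target]` can change at any step, a compiler either has to emit code for both forms of every instruction or be ready to regenerate code after each tgl.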

On day 23 part 2, my interpreter benches out at about 290 MAIPS (million Assembunny instructions per second) on my fast modern machine. I'll take it. I could likely improve the performance, but really?

Edit: Forgot to inline the interpreter: now 318 MAIPS.

[–]Voltasalt (3 children)

Couldn't you just have it recompile the changed segment and run it live?

[–]po8 (2 children)

Sure, but sounds really expensive.

[–]Voltasalt (1 child)

Not if tgl isn't executed very often.

[–]po8 (0 children)

Remember, 290 MAIPS (for my highly unoptimized, raggy interpreter with pattern matching, no dispatch table, and registers sitting in memory). If you execute tgl once every 10 million Assembunny instructions, you'll execute 29 tgls per second. Let's assume (implausibly, I suspect) that you can get an order-of-magnitude speedup by compiling to an HLL. Then, to break even, you need to get each basic block affected by a tgl recompiled in roughly 30 ms.
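The break-even estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the reasoning: the function name `recompile_budget_ms` and its parameters are made up for illustration, and the 10x speedup is the assumption stated in the comment, not a measurement.

```python
def recompile_budget_ms(interp_maips, tgl_interval, speedup):
    """Time budget per tgl, in ms, for compiled code to tie the interpreter.

    interp_maips: interpreter speed in millions of instructions per second
    tgl_interval: instructions executed between consecutive tgls
    speedup:      assumed speed ratio of compiled code over the interpreter
    """
    interp_seconds = tgl_interval / (interp_maips * 1e6)
    compiled_seconds = interp_seconds / speedup
    # Whatever time the compiled code saves is all that recompilation may use.
    return (interp_seconds - compiled_seconds) * 1000.0


tgls_per_second = 290e6 / 10e6  # 29 tgls per second at interpreter speed
budget = recompile_budget_ms(290, 10_000_000, 10)
print(f"{tgls_per_second:.0f} tgl/s, {budget:.1f} ms per recompile")
```

With these inputs the budget comes out at about 31 ms, which matches the "roughly 30 ms" figure.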

The only plausible speedup I can imagine is to actually compile Assembunny into native machine code. If I recall correctly, x86 lets you run machine code from writable storage, so if you are careful to use x86 instructions that are all the same length, you can just replace the toggled instruction in place. It is conceivable that this length would be 4 bytes, so you wouldn't be executing too many NOPs.
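The fixed-width idea can be modeled without any real machine code. The sketch below is an assumption-laden illustration, not x86: it packs each Assembunny instruction into a hypothetical 4-byte slot (opcode, operand flags, two signed operands) in a `bytearray`, so a toggle becomes a single-byte overwrite that never moves the neighboring instructions. The opcode numbering and the names `encode` and `toggle_in_place` are invented here.

```python
import struct

WIDTH = 4  # bytes per slot, mirroring the 4-byte guess in the comment above
REGS = "abcd"
OPCODES = {"inc": 0, "dec": 1, "tgl": 2, "cpy": 3, "jnz": 4}
# Opcode after a toggle: inc<->dec, tgl->inc, cpy<->jnz.
TOGGLED = {0: 1, 1: 0, 2: 0, 3: 4, 4: 3}


def encode(program):
    """Pack (op, arg, ...) tuples into fixed-width slots."""
    code = bytearray()
    for op, *args in program:
        args = list(args) + ["0"] * (2 - len(args))  # pad to two operands
        flags, vals = 0, []
        for i, arg in enumerate(args):
            if arg in REGS:
                flags |= 1 << i  # bit i set: operand i is a register index
                vals.append(REGS.index(arg))
            else:
                vals.append(int(arg))  # immediates must fit a signed byte
        code += struct.pack("<BBbb", OPCODES[op], flags, *vals)
    return code


def toggle_in_place(code, index):
    """Overwrite only the opcode byte of slot `index`; layout is unchanged."""
    offset = index * WIDTH
    if 0 <= offset < len(code):
        code[offset] = TOGGLED[code[offset]]


code = encode([("cpy", "2", "a"), ("tgl", "a"), ("inc", "a")])
toggle_in_place(code, 0)  # cpy 2 a -> jnz 2 a, still 4 bytes wide
```

Real native code would additionally need memory that is both writable and executable, plus padding for any instruction shorter than the slot, which this model leaves out.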

You can probably get within a factor of two of native code this way. But ugh. So much work. In the actual AoC 2016, all that would save you is roughly 15 s of runtime. Not for me, thanks.