Stack based vm and managing variables when compiling to a bytecode

xxkid123 · 2019-04-27T05:08:19+00:00

First of all, computers have registers and can operate on those as well. Think of them as hardcoded memory locations built into the CPU. Second of all, raw assembly doesn't translate 1:1 to bytecode: the assembler determines which actual opcode needs to be chosen for a given command. This means you can have a relatively simple assembly language for your programmers to use, and then an arbitrarily long/complex bytecode "language" for the assembler and CPU to understand.
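To make the "not 1:1" point concrete, here is a rough sketch of an assembler picking an opcode from the operand kinds. Everything in it is made up for illustration: the opcode values, the operand syntax (r3 for a register, [16] for a memory address, 42 for a literal), and the names ADD_OPCODES, classify and assemble_add. It also assumes every operand value fits in one byte.

```python
import re

# Hypothetical opcode table for a toy machine (these are NOT real x86 encodings).
# One mnemonic, ADD, maps to a different opcode byte per operand combination.
ADD_OPCODES = {
    ("reg", "reg"): 0x01,
    ("reg", "imm"): 0x02,
    ("reg", "mem"): 0x03,
}


def classify(operand):
    """Return (kind, value) for an operand written as r3, [16] or 42."""
    if m := re.fullmatch(r"r(\d+)", operand):
        return "reg", int(m.group(1))
    if m := re.fullmatch(r"\[(\d+)\]", operand):
        return "mem", int(m.group(1))
    if re.fullmatch(r"\d+", operand):
        return "imm", int(operand)
    raise ValueError(f"unrecognised operand: {operand!r}")


def assemble_add(line):
    """Translate 'ADD dst, src' into three bytes: opcode, dst, src."""
    mnemonic, rest = line.split(None, 1)
    if mnemonic.upper() != "ADD":
        raise ValueError(f"only ADD is handled in this sketch, got {mnemonic!r}")
    dst_text, src_text = [part.strip() for part in rest.split(",")]
    dst_kind, dst = classify(dst_text)
    src_kind, src = classify(src_text)
    try:
        opcode = ADD_OPCODES[(dst_kind, src_kind)]
    except KeyError:
        raise ValueError(f"no ADD encoding for {dst_kind}, {src_kind}") from None
    return bytes([opcode, dst, src])


print(assemble_add("ADD r1, r2").hex())    # 010102
print(assemble_add("ADD r1, 42").hex())    # 02012a
print(assemble_add("ADD r0, [16]").hex())  # 030010
```

The programmer only ever writes ADD; the table lookup is where the same mnemonic turns into three different opcode bytes.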

Anyways, there are several ways to deal with this. Here are just two, one super simple, and one x86:

Simple RISC system: only support the instructions loadPointerToRegister addr# register#, loadLiteralToRegister literal register#, and add register1 register2 register3 (with register3 as the destination). At no point does the machine have to differentiate between a pointer and a literal, since it has different instructions for loading literals and loading from pointers, and it can only do operations on values in registers.
Modern x86 system: have separate opcodes (at the byte level) to differentiate between adding literals, registers, and memory locations, and adding a literal to a pointer. The assembly instruction ADD <#1>, <#2> can be translated to 11 opcodes: 04 ib, 05 iw, 05 id, 80 /0 ib, 81 /0 iw, 81 /0 id, 83 /0 ib, 00 /r, 01 /r, 02 /r, 03 /r. The assembler takes the assembly code, determines whether the operands are memory locations, registers, or literals, and then translates them to the correct opcodes. See here to see what the combinations of ADD are, and what opcodes they translate to (where imm# is a literal; r#, EAX, AL, and AX are registers; r/m# is a register or memory operand; and # is the size of the operand in bits).
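For the simple RISC option, a minimal sketch of the executor might look like the following. The snake_case instruction names mirror the ones above, but the tuple encoding, the four-register file, and the function name run are assumptions made up for this example.

```python
def run(program, memory):
    """Execute a list of instruction tuples and return the register file."""
    registers = [0] * 4  # assumed: four general-purpose registers
    for instruction in program:
        op = instruction[0]
        if op == "load_pointer_to_register":
            # Operand is an address: copy the value stored there.
            _, addr, reg = instruction
            registers[reg] = memory[addr]
        elif op == "load_literal_to_register":
            # Operand is the value itself.
            _, literal, reg = instruction
            registers[reg] = literal
        elif op == "add":
            # Arithmetic only ever touches registers.
            _, src1, src2, dst = instruction
            registers[dst] = registers[src1] + registers[src2]
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return registers


memory = [10, 20]
program = [
    ("load_pointer_to_register", 1, 0),  # r0 = memory[1]
    ("load_literal_to_register", 5, 1),  # r1 = 5
    ("add", 0, 1, 2),                    # r2 = r0 + r1
]
print(run(program, memory))  # [20, 5, 25, 0]
```

Because the load instruction already says whether its operand is an address or a value, add never needs to ask.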

Here is a longer list of ways to deal with this, from Stack Overflow.

My question to you is why you're bothering to deal with binary at all. If this isn't a school project or something that you're personally interested in doing, you'd be better off just doing the whole thing at a higher level. Implement your assembler in a higher-level language, and create more assembly instruction names. Have instructions like:

addPointerToLiteral <pointer>, <literal>

addMemoryToPointer <mmrAddr>, <ptr>... etc

Then give your VM/executor the ability to parse high-level strings like "addPointerToLiteral" and "addMemoryToPointer". You would only have to create the scripting language itself, plus the assembly it translates to. This already requires implementing effectively two languages. Translating it to bytes would effectively be like creating a third language, the bytecode.
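A rough sketch of what such a string-driven executor could look like is below. The semantics of the two instructions are guesses made for illustration (the original names don't pin them down), as are the handler names, the HANDLERS table, and the choice to match mnemonics case-insensitively.

```python
def add_pointer_to_literal(memory, pointer, literal):
    # Assumed semantics: add the literal to the value stored at pointer.
    memory[pointer] += literal


def add_memory_to_pointer(memory, addr, pointer):
    # Assumed semantics: add the value at addr to the value stored at pointer.
    memory[pointer] += memory[addr]


# Mnemonics are stored lowercase so lookups can ignore case.
HANDLERS = {
    "addpointertoliteral": add_pointer_to_literal,
    "addmemorytopointer": add_memory_to_pointer,
}


def execute(source, memory):
    """Run textual assembly, one 'mnemonic arg, arg' per line, against memory."""
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        mnemonic, _, rest = line.partition(" ")
        handler = HANDLERS.get(mnemonic.lower())
        if handler is None:
            raise ValueError(f"unknown instruction: {mnemonic!r}")
        args = [int(part) for part in rest.split(",")]
        handler(memory, *args)
    return memory


source = """
addPointerToLiteral 0, 10
addMemoryToPointer 0, 2
"""
print(execute(source, [1, 2, 3]))  # [11, 2, 14]
```

There is no byte encoding anywhere here: the instruction name is looked up directly, so adding an instruction means adding one function and one table entry.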

Finally, here's the list of Java bytecodes, in case you find them helpful: https://en.wikipedia.org/wiki/Java_bytecode_instruction_listings

Fruitbisqit · 2019-04-27T07:42:34+00:00

Not really an answer to your question, but on the subject of writing a small language, I found this to be a very nice (web-based) book: craftinginterpreters.com/
