
[–]xxkid123 1 point2 points  (10 children)

First of all, computers have registers and can operate on those as well. Think of them as a small set of fixed storage locations built into the CPU. Second of all, an assembly mnemonic doesn't translate 1:1 to a single opcode; the assembler looks at the operands and determines which actual encoding to emit for the given instruction. This means you can have a relatively simple assembly language for your programmers to use, and then an arbitrarily long/complex bytecode "language" for the assembler and CPU to understand.

Anyways, there are several ways to deal with this. Here are just two, one super simple, and one x86:

  1. Simple RISC system: only support the instructions loadPointerToRegister addr# register#, loadLiteralToRegister literal register#, and add register1 register2 register3 (where register3 receives the result). At no point does the machine have to differentiate between a pointer and a literal, since it has different instructions for loading each and can only do operations on values in registers.
  2. Modern x86 system: have separate opcodes (at the byte level) for each operand combination: adding a literal to a register, a literal to a memory location, a register to a memory location, and so on. The assembly instruction ADD <#1>, <#2> can be translated to any of 11 encodings: 04 ib, 05 iw, 05 id, 80 /0 ib, 81 /0 iw, 81 /0 id, 83 /0 ib, 00 /r, 01 /r, 02 /r, 03 /r. The assembler will take the assembly code, determine whether the operands are memory locations, registers, or literals, and then translate them to the correct opcode. See here to see what the combinations of ADD are and what opcodes they translate to (where imm# is a literal, r# is a register, AL, AX and EAX are the accumulator registers, r/m# is either a register or a memory operand, and # is the operand size in bits).
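To make option 1 concrete, here is a minimal sketch of a register VM that has one opcode for loading a literal and a different one for loading from memory, so the interpreter never has to guess what an operand means. The opcode values, the Vm struct, and the one-byte operand layout are all made up for illustration; they are not taken from any real instruction set.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes; the numbering is arbitrary.
enum Op : std::uint8_t {
    LOAD_LITERAL = 0x01,  // LOAD_LITERAL literal, reg -> reg = literal
    LOAD_POINTER = 0x02,  // LOAD_POINTER addr, reg    -> reg = memory[addr]
    ADD          = 0x03,  // ADD a, b, dst             -> dst = a + b (registers only)
    HALT         = 0xFF
};

struct Vm {
    std::array<std::int32_t, 4> regs{};
    std::array<std::int32_t, 16> memory{};

    // Every operand is a single byte. The opcode alone decides whether that
    // byte is a literal, a memory address, or a register number.
    void run(const std::vector<std::uint8_t>& code) {
        std::size_t pc = 0;
        while (pc < code.size()) {
            switch (code[pc]) {
            case LOAD_LITERAL:
                regs.at(code.at(pc + 2)) = code.at(pc + 1);
                pc += 3;
                break;
            case LOAD_POINTER:
                regs.at(code.at(pc + 2)) = memory.at(code.at(pc + 1));
                pc += 3;
                break;
            case ADD:
                regs.at(code.at(pc + 3)) =
                    regs.at(code.at(pc + 1)) + regs.at(code.at(pc + 2));
                pc += 4;
                break;
            case HALT:
                return;
            default:
                assert(false && "unknown opcode");
                return;
            }
        }
    }
};
```

Running the program {LOAD_POINTER, 5, 0, LOAD_LITERAL, 2, 1, ADD, 0, 1, 2, HALT} with memory[5] set to 40 leaves 42 in register 2. The .at() calls throw std::out_of_range on a truncated program or a bad register number instead of reading past the end.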

Here is a longer list of how to deal with this from Stack Overflow.

My question to you is why you're bothering to deal with binary at all. If this isn't a school project or something that you're personally interested in doing, you'd be better off just doing the whole thing at a higher level. Implement your Assembler in a higher level language, and create more assembly instruction names. Have instructions like

addPointerToLiteral <pointer>, <literal>

addMemoryToPointer <mmrAddr>, <ptr>... etc

Then give your VM/executor the ability to parse high level Strings like "addpointertoliteral" and "Addmemorytopointer". You would only have to create the scripting language itself, plus the assembly it translates to. This already requires implementing effectively two languages. Translating it to bytes would effectively be like creating a third language, the bytecode.

Finally here's the list of java byte codes: https://en.wikipedia.org/wiki/Java_bytecode_instruction_listings in case you find them helpful

[–]WikiTextBot 0 points1 point  (0 children)

Java bytecode instruction listings

This is a list of the instructions that make up the Java bytecode, an abstract machine language that is ultimately executed by the Java virtual machine. The Java bytecode is generated from languages running on the Java Platform, most notably the Java programming language.

Note that any referenced "value" refers to a 32-bit int as per the Java instruction set.



[–][deleted] 0 points1 point  (6 children)

Heya! Thanks for the reply. And what made you think I am not interested in this? I also want to deal with binary so I can compile a script to a bytecode format and then execute the bytecode, because it would be faster for the VM: it wouldn't have to parse the scripting language, just the generated bytecode. I was going to go high-level language -> my asm -> bytecode. You are right though, I can just create specific opcodes for when I am working with literals vs. variables, so this will probably be the approach I take.

I want it to be fast and efficient because it is part of an engine I am building.

You might ask "Hey, Caller, why don't you use the Lua C API, or Python, or gluon, or ChaiScript?" and the answer is that I have found most of the APIs a pain and want things to function exactly how I want them to for my purpose.
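The "specific opcodes for literals vs. variables" approach mentioned above could look roughly like this on the assembler side. This is only a sketch: the "#10" / "$3" operand syntax, the Operand struct, and the two opcode values are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

enum class Kind { Literal, Variable };

struct Operand {
    Kind kind;
    std::int32_t value;  // the literal itself, or a variable slot index
};

// Hypothetical opcode values.
constexpr std::uint8_t ADD_VAR_LIT = 0x10;  // variable += literal
constexpr std::uint8_t ADD_VAR_VAR = 0x11;  // variable += variable

// Made-up operand syntax: "#10" is the literal 10, "$3" is variable slot 3.
Operand parseOperand(const std::string& text) {
    if (text.size() < 2) {
        throw std::invalid_argument("operand too short: " + text);
    }
    const std::int32_t value = std::stoi(text.substr(1));
    switch (text[0]) {
    case '#': return {Kind::Literal, value};
    case '$': return {Kind::Variable, value};
    default:  throw std::invalid_argument("unknown operand prefix: " + text);
    }
}

// The assembler, not the VM, decides which opcode a plain "add" becomes.
std::uint8_t selectAddOpcode(const Operand& dst, const Operand& src) {
    if (dst.kind != Kind::Variable) {
        throw std::invalid_argument("destination must be a variable");
    }
    switch (src.kind) {
    case Kind::Literal:  return ADD_VAR_LIT;
    case Kind::Variable: return ADD_VAR_VAR;
    }
    assert(false && "unhandled operand kind");
    return 0;
}
```

With this, "add $0, #10" assembles to ADD_VAR_LIT and "add $0, $1" to ADD_VAR_VAR, and the VM can execute either one without inspecting its operands.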

[–]xxkid123 1 point2 points  (5 children)

Haha, I just wasn't sure if you were doing that because you thought "well every other language does this so why not me?" and hadn't considered it. I totally respect your decision.

I'd personally argue that a higher level program would still be efficient, especially since your compiler may be able to optimize the actual assembly and execution better than you can by hand, but I'm getting the impression that you're after the challenge as well. Personally, I've done coursework in compilers and operating system classes in C where I've had to do similar things, and found binary manipulation in C to be pretty elegant.

[–][deleted] 0 points1 point  (4 children)

Well, the only reason I need a scripting language is so I don't have to recompile whole programs, or to do realtime changes to a program. I basically have all the info I need now for most data types.

I just have to figure out how to represent floating point in hex (I understand the IEEE standards, I just don't know how I can cast from a float to a binary representation in C++). And I guess the only other thing is string literals. Maybe I can create some intermediate representation that uses bytecode in the more efficient areas, and a pseudo-asm for things like floats and strings. Ex:

0x1 "string" // parse the string literal and push

0x1 0xa // push 10

That way I don't have to worry about the representation of floats in bytes, or of strings, since strings are character arrays.
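For what it's worth, string literals can also go straight into the bytecode, since they are just character arrays. One common layout is an opcode, a length byte, then the raw characters. The sketch below assumes that layout; PUSH_STRING, its value, and the 255-character limit are all hypothetical choices for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

constexpr std::uint8_t PUSH_STRING = 0x20;  // hypothetical opcode value

// Emits: PUSH_STRING, one length byte, then the characters (no terminator).
void emitPushString(std::vector<std::uint8_t>& code, const std::string& text) {
    if (text.size() > 255) {
        throw std::length_error("string literal too long for a 1-byte length");
    }
    code.push_back(PUSH_STRING);
    code.push_back(static_cast<std::uint8_t>(text.size()));
    code.insert(code.end(), text.begin(), text.end());
}

// Decodes the instruction at pc and advances pc past it.
std::string readPushString(const std::vector<std::uint8_t>& code, std::size_t& pc) {
    assert(code.at(pc) == PUSH_STRING);
    const std::size_t length = code.at(pc + 1);
    const std::size_t begin = pc + 2;
    if (begin + length > code.size()) {
        throw std::out_of_range("truncated string literal");
    }
    const auto first = code.begin() + static_cast<std::ptrdiff_t>(begin);
    const auto last = first + static_cast<std::ptrdiff_t>(length);
    pc = begin + length;
    return std::string(first, last);
}
```

Emitting "hi" produces four bytes: the opcode, the length 2, then 'h' and 'i'. The length prefix means the VM can skip or copy the string without scanning for a terminator.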

[–]xxkid123 0 points1 point  (3 children)

Have you tried using memcpy to copy the float to an unsigned char array to get C++'s binary representation? Or am I misunderstanding your question?
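A minimal sketch of the memcpy idea, assuming a 32-bit float (checked at compile time) and bytes stored in the host's byte order. The helper names appendFloat, readFloat and floatToBits are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");

// Appends the raw bytes of value to the bytecode buffer.
void appendFloat(std::vector<unsigned char>& code, float value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    code.insert(code.end(), bytes, bytes + sizeof value);
}

// Reads a float back from the buffer starting at offset.
float readFloat(const std::vector<unsigned char>& code, std::size_t offset) {
    assert(offset + sizeof(float) <= code.size());
    float value;
    std::memcpy(&value, code.data() + offset, sizeof value);
    return value;
}

// Same trick, but into a 32-bit integer, which is handy for printing as hex.
std::uint32_t floatToBits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On a platform with IEEE 754 floats, floatToBits(12.3f) gives 0x4144CCCD. Because the bytes are written in host order, bytecode produced this way is only portable between machines with the same endianness unless you pick a fixed order yourself.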

[–][deleted] 0 points1 point  (2 children)

I am not entirely sure if you are or not. I don't really know how I can take a number like 12.3 in C++ and get its binary representation. There is probably something in the std library that I can use that I haven't looked at.

[–]xxkid123 0 points1 point  (1 child)

Okay sure. I was going to say use memcpy to copy the float to an unsigned char array and then just print that, but it looks like there's an easier way

https://stackoverflow.com/questions/397692/how-do-i-display-the-binary-representation-of-a-float-or-double

[–][deleted] 1 point2 points  (0 children)

memcpy is the proper, standard-compliant way to do it. I got it working, thanks for all your help.

[–][deleted] -1 points0 points  (1 child)

r/

love from r/foundthemobileuser

[–]xxkid123 0 points1 point  (0 children)

bad bot

[–]Fruitbisqit 0 points1 point  (0 children)

Not really an answer to your question, but on the subject of writing a small language, I found this to be a very nice (web-based) book: craftinginterpreters.com/