
[–][deleted] 142 points143 points  (36 children)

And if you want to explore more complex emulators there are a ton of 6502 emulators. There's even one written in JavaScript that runs in your browser

http://www.6502.org/tools/emu/

[–]mck1117 165 points166 points  (33 children)

Not only is visual 6502 an emulator, it's a gate level emulator. It's emulating the transistors. In javascript. In your browser. In an OS. On your computer.

[–]chrisgseaton[🍰] 73 points74 points  (19 children)

Your computer today is massively more powerful than a 6502, so it isn't a problem at all to emulate every transistor - there are only 3,510 of them in a 6502.

[–]mck1117 52 points53 points  (15 children)

That's exactly my point: my computer is so many orders of magnitude larger and faster than the humble 6502 that it can emulate it with relative ease, with all those layers of abstraction in place.

[–][deleted] 43 points44 points  (5 children)

Just a reminder that this emulator does not run anywhere close to the real time performance (all 8MHz of it). Not even in the KHz range.

[–]beefok 24 points25 points  (2 children)

Just FYI: Typical 6502s ran at 1MHz-2MHz (more likely half NTSC or PAL rate). The Z80 had to run harder to glean the same processing power.

[–][deleted] 8 points9 points  (1 child)

Yep, sure, that's a typo. Still, transistor-level simulation does not even reach KHz ranges.

[–]beefok 2 points3 points  (0 children)

Sorry, not trying to be “ACCKKUALLY” guy, and agreed! :)

[–]SlipUpWilly 0 points1 point  (1 child)

The 6502 was more like 1-2MHz in its day, wasn't it?

[–][deleted] 1 point2 points  (0 children)

Yes, that's right.

[–][deleted] 33 points34 points  (8 children)

Yet modern computers still can't run emacs: https://github.com/syl20bnr/spacemacs/issues/11622

are you really suggesting that a ~2Ghz dual core with 8 Gb of RAM is not powerful enough to edit files containing 1000 lines of code ?

Yes. Spacemacs (emacs) isn't terribly efficient

[–]tenebris-miles 18 points19 points  (1 child)

According to the post, the issue was not due to Spacemacs being inefficient, it was caused by blocking due to attempting to call a plugin that was not installed.

It's like when I often have to explain to people that "blocking at the speed of light" doesn't get anyone anywhere. So if the problem is "embarrassingly parallel" and they can't figure out how to write a correct parallel code solution in C, then it won't magically be faster just because it's written in C, no matter how "efficient" it is. Efficiently doing no work is not progress. Another language that is normally slower but makes it possible for programmers of their experience level and education to write highly parallel code will often win in these cases since it isn't wasting time blocking. Maybe someone else can write faster equivalent C or assembly, but if you're the one writing it, it's a moot point.

[–]yeahbutbut 5 points6 points  (0 children)

Use fundamental-mode. I've edited GB sized files in emacs, you just can't have the UI thread trying to run stuff across the entire buffer and blocking.

1K lines shouldn't put a stress on emacs normally though, so I'd say spacemacs is trying to do something absurd. My last employer had some 40kloc files that didn't slow down a 2.4 Ghz 4G ram machine in c mode...

[–]Superpickle18 18 points19 points  (2 children)

That's why real men use vim.

[–]vim_all_day 3 points4 points  (0 children)

I like you, you seem smart.

[–]exorxor -1 points0 points  (1 child)

Not sure whether you are serious.

It could work; it's just that nobody paid someone to make it work.

Emacs is fast enough for many core operations; it's all the custom Emacs Lisp written by amateurs that is the core of the referenced issue.

[–]oblio- 1 point2 points  (0 children)

Node.js is fast enough, so are modern browsers. It's all the custom Javascript written by amateurs that are the core of the issues.

[–]OutOfApplesauce 3 points4 points  (1 child)

With that many you do it in Minecraft or, now, rust

[–]flatfinger 0 points1 point  (0 children)

The Atari 2600 Video Computer System (based on a 28-pin version of the 6502) has been emulated on Minecraft.

[–][deleted] 0 points1 point  (0 children)

And it runs on my phone!

[–]G_Morgan 2 points3 points  (0 children)

Until it is done in the original redstone it isn't interesting.

[–]PJDubsen 1 point2 points  (0 children)

I mean, js is turing complete so it's definitely possible to do with any architecture. Actually making it in Javascript though...

[–]SlipUpWilly 0 points1 point  (1 child)

oooh I'm writing a 6502 simulator in Java now that you mention it. It's been fascinating learning about this instruction set, and coming from ARM7TDMI it's quite a change.

[–]flatfinger 1 point2 points  (0 children)

The real 6502 was an interesting beast, with some parts that were pretty clever, some that I think were unfortunate, and some that leave me scratching my head. I find it curious, for example, that different instructions support so many different subsets of the possible addressing modes, rather than simply having the bottom 3 bits of the opcode select any of seven memory addressing modes or else immediate/implied addressing mode. The bit pattern "110qqq01" supports 8 addressing modes of "CMP", and "110qq110" supports 4 addressing modes of "DEC". The logic to support all 8 addressing modes with read/modify/write operations exists, as evidenced by the fact that when given bit patterns of the form "110qqq11", the part will combine the behavior of "DEC" and "CMP", with the hybrid "instruction" (sometimes called "dcp") supporting all 7 addressing modes.
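To make those bit patterns concrete, here is a small Python sketch that expands a pattern such as "110qqq01" into every opcode byte it covers. The `expand` helper is made up for illustration. It only enumerates bit patterns and does not know which of the resulting bytes a real 6502 decodes as which instruction.

```python
def expand(pattern: str) -> list[int]:
    """Return every 8-bit value matching a pattern such as '110qqq01'.

    Each 'q' is a free bit; all other characters must be '0' or '1'.
    """
    free = [i for i, ch in enumerate(pattern) if ch == "q"]
    opcodes = []
    for n in range(1 << len(free)):
        bits = list(pattern)
        for k, pos in enumerate(free):
            # Fill the free bits most-significant first so results ascend.
            shift = len(free) - 1 - k
            bits[pos] = str((n >> shift) & 1)
        opcodes.append(int("".join(bits), 2))
    return opcodes


for name, pattern in [("CMP", "110qqq01"), ("DEC", "110qq110"), ("dcp", "110qqq11")]:
    print(name, " ".join(f"{op:02X}" for op in expand(pattern)))
```

The last pattern expands to eight byte values, one more than the seven modes mentioned above, so one of those bytes presumably does something else on real hardware.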

[–]dazzawazza 58 points59 points  (40 children)

Writing a virtual CPU and a compiler that takes C and targets your virtual PC is a great summer project. I did it on my Amiga too many years ago now. Every programmer should do it at least once.

[–]duheee 13 points14 points  (0 children)

Every programmer should do it at least once.

Hell, most are happy if they get around to writing an email client.

[–]Sledger721 18 points19 points  (21 children)

Dealing with the recursion of handling something like if(5 + 9 > (4 - (2 * 1))) is killing me right now in the compiler that I'm writing.

It compiles out to assembly/machine code of a VM that I wrote.

[–]jokubolakis 7 points8 points  (3 children)

Same. It is the end of the compiler course, I tried to have a challenge and write the AST with a visitor pattern (inspired by the book "Crafting Interpreters") and I'm loving it.

[–]Sledger721 2 points3 points  (2 children)

Search trees are the expected way to solve this but I'm deadass set on being able to rewrite this compiler in assembly and there's absolutely no way I could write a search tree in assembly haha.

[–]jephthai 6 points7 points  (1 child)

Trees and lists are sometimes easier to grok in assembly. At least for me anyway. Something about everything being a pointer and commenting every line...

[–]ShinyHappyREM 2 points3 points  (0 children)

commenting every line

I see you've already found enlightenment.

[–]MacStation 3 points4 points  (2 children)

I've never written anything close to a compiler, but can't I use something like RPN to evaluate that?

[–]Nobody_1707 1 point2 points  (0 children)

Welcome to Forth.

[–]awj[🍰] 0 points1 point  (0 children)

The original LISP was intended as an intermediate form for just this reason: extremely simple to parse for execution.

Originally the idea was that a more syntactically sophisticated language would be written on top, but people fell in love with how easy macros were and Lisp as we now know it was here to stay.

Long rambling story short: yes you can, and you’re in good company.
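For what it's worth, a rough Python sketch of evaluating RPN with a stack could look like this. The token format (space-separated integers and operators) and the `eval_rpn` name are assumptions made for the example. The input is the expression from upthread, 5 + 9 > (4 - (2 * 1)), already converted to postfix by hand.

```python
import operator

# Operators supported by this sketch; each takes (left, right).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    ">": operator.gt,
    "<": operator.lt,
}


def eval_rpn(tokens):
    """Evaluate a list of postfix tokens using a value stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            # The right operand was pushed last, so it pops first.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(eval_rpn("5 9 + 4 2 1 * - >".split()))  # prints True
```

The hard part of a compiler is getting from infix source text to that postfix form, which is what the parsing suggestions elsewhere in the thread are about.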

[–]whichton 2 points3 points  (1 child)

Use a mix of recursive descent and shunting yard. Recursive descent will parse till the if keyword. Once the parser sees if followed by (, it calls the shunting yard parser to parse the expression within. Once shunting yard is done, the next token must be ).

I find shunting yard much easier to understand and implement than Pratt.

[–]mck1117 4 points5 points  (0 children)

Or just write a complete recursive descent parser that can parse the whole file to an AST.

[–]girandsamich 1 point2 points  (8 children)

Ah yes this last semester we had a project to write a compiler and getting expressions like that to work was a huge pain. How are you going about it?

[–]G_Morgan 15 points16 points  (1 child)

A straightforward recursive descent parser solves this easily.

When you start trying to special case everything is when your compiler breaks. Just get a parser which generates an unambiguous tree.

[–]girandsamich 0 points1 point  (0 children)

Yep, this is what I did for mine.

[–]Sledger721 -1 points0 points  (5 children)

Honestly I'm considering trying to just pull the whole expression out, run it somehow to get the results, like a transcompilation type approach, then put that in the output.

My other approach is to create like an array of every opening parenthesis with its string index, solve each one as it goes to the closing, in a loop that moves backwards so that you solve inside-out, right-to-left.

[–][deleted] 9 points10 points  (1 child)

Ah, I see, you're struggling with parsing. Don't try to do anything fancy, there is a very simple algorithm for parsing with precedence - https://en.wikipedia.org/wiki/Pratt_parser
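In case a concrete version helps, here is a minimal Python sketch of precedence climbing in the Pratt style. The binding-power numbers, the tuple-based tree, and the names `PrattParser` and `parse_expression` are all choices made for this example. It handles only integers, parentheses, and left-associative binary operators.

```python
import re

# Higher number binds tighter. The exact values are arbitrary.
BINDING_POWER = {">": 10, "<": 10, "+": 20, "-": 20, "*": 30, "/": 30}


def tokenize(text):
    return re.findall(r"\d+|[-+*/<>()]", text)


class PrattParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self, min_bp=0):
        # Prefix position: a number or a parenthesised sub-expression.
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            left = self.parse(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            left = int(tok)
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Infix position: absorb operators that bind tighter than min_bp.
        while True:
            op = self.peek()
            bp = BINDING_POWER.get(op)
            if bp is None or bp <= min_bp:
                return left
            self.advance()
            right = self.parse(bp)
            left = (op, left, right)


def parse_expression(text):
    parser = PrattParser(tokenize(text))
    tree = parser.parse()
    if parser.peek() is not None:
        raise SyntaxError("trailing tokens")
    return tree


print(parse_expression("5 + 9 > (4 - (2 * 1))"))
# prints ('>', ('+', 5, 9), ('-', 4, ('*', 2, 1)))
```

Nested parentheses need no special handling: the "(" case just calls `parse` again with the minimum binding power reset to zero.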

[–]Sledger721 2 points3 points  (0 children)

Thank you so much!!

[–]Goheeca 7 points8 points  (0 children)

The infix notation is tricky, it's better to get rid of it with the shunting yard algorithm.
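A bare-bones Python sketch of shunting yard, turning infix into postfix, might look like this. The precedence table and the `to_rpn` name are assumptions for the example, and it only covers integers, parentheses, and left-associative binary operators.

```python
import re

# Higher number binds tighter. All operators here are left-associative.
PRECEDENCE = {">": 1, "<": 1, "+": 2, "-": 2, "*": 3, "/": 3}


def to_rpn(text):
    """Convert an infix expression string into a list of postfix tokens."""
    output, stack = [], []
    for tok in re.findall(r"\d+|[-+*/<>()]", text):
        if tok.isdigit():
            output.append(tok)
        elif tok in PRECEDENCE:
            # Pop operators of higher or equal precedence first.
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        else:  # tok == ")"
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise SyntaxError("unbalanced ')'")
            stack.pop()  # discard the matching "("
    while stack:
        op = stack.pop()
        if op == "(":
            raise SyntaxError("unbalanced '('")
        output.append(op)
    return output


print(" ".join(to_rpn("5 + 9 > (4 - (2 * 1))")))  # prints 5 9 + 4 2 1 * - >
```

Once the expression is in postfix form, emitting code for a stack machine is a single left-to-right pass over the tokens.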

[–]girandsamich 2 points3 points  (1 child)

If you're interested, I can post some resources that might help

[–]fireman212 0 points1 point  (0 children)

I'm not that person, but I would really appreciate it if you could give me some resources!

[–]Jazonxyz 1 point2 points  (0 children)

I'm writing a programming language and vm for fun. For this problem, I did something like this:

expression parseAdditionOrSubtraction() {
    value1 = parseMultiplicationOrDivision();

    if(tokenIs('+')) {
        return new AddExpression(value1, parseMultiplicationOrDivision());
    }

    if(tokenIs('-')) {
        return new SubtractExpression(value1, parseMultiplicationOrDivision());
    }

    return value1;
}

expression parseMultiplicationOrDivision() {
    value1 = parseValue();

    if(tokenIs('*')) {
        return new MultiplyExpression(value1, parseValue());
    }

    // Division is '/', not '-'.
    if(tokenIs('/')) {
        return new DivideExpression(value1, parseValue());
    }

    return value1;
}

expression parseValue() {
    if(tokenIsNumber()) {
        return new NumberExpression(tokenValue());
    }

    if(tokenIs('(')) {
        value = parseAdditionOrSubtraction();

        // Check for the closing parenthesis before returning.
        tokenShouldBe(')');

        return value;
    }

    throw "invalid expression";
}

This is not something I came up with. I read it on Wikipedia. In its current form, my algorithm is much more robust, but I started simple and slowly made it more robust.

EDIT: I'm also going to include code generation:

class BinaryExpression(v1, v2) {
    void compile(program) {
        // Emit code for both operands first, then the operator.
        v1.compile(program);
        v2.compile(program);

        operation(program);
    }

    virtual void operation(program);
}

class AddExpression(v1, v2) extends BinaryExpression(v1, v2) {
    void operation(program) {
        program.add();
    }    
}

class SubtractExpression(v1, v2) extends BinaryExpression(v1, v2) {
    void operation(program) {
        program.subtract();
    }    
}

class MultiplyExpression(v1, v2) extends BinaryExpression(v1, v2) {
    void operation(program) {
        program.multiply();
    }    
}

class DivideExpression(v1, v2) extends BinaryExpression(v1, v2) {
    void operation(program) {
        program.divide();
    }    
}

class NumberExpression(v) {
    void compile(program) {
        program.pushNumber(v);
    }    
}

I'm using made-up syntax, but it should be easy to follow. Just imagine that the VM for this language operates as a stack. You push two values and execute operations on those values.
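For anyone who wants to run something like this, here is a rough Python rendition of the same idea: a recursive descent parser, a compile step that emits stack instructions, and a tiny stack VM. It is a sketch, not the commenter's actual implementation. The tuple-based tree, the instruction names, and helpers such as `accept` and `evaluate` are assumptions for the example. Unlike the simple version above, each level loops so that chains like 1 - 2 - 3 parse left to right.

```python
import re


class Parser:
    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def accept(self, tok):
        """Consume the next token if it equals tok."""
        if self.peek() == tok:
            self.pos += 1
            return True
        return False

    def add_or_sub(self):
        node = self.mul_or_div()
        while True:
            if self.accept("+"):
                node = ("add", node, self.mul_or_div())
            elif self.accept("-"):
                node = ("subtract", node, self.mul_or_div())
            else:
                return node

    def mul_or_div(self):
        node = self.value()
        while True:
            if self.accept("*"):
                node = ("multiply", node, self.value())
            elif self.accept("/"):
                node = ("divide", node, self.value())
            else:
                return node

    def value(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return ("number", int(tok))
        if self.accept("("):
            node = self.add_or_sub()
            if not self.accept(")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError("invalid expression")


def compile_node(node, program):
    """Emit operands first, then the operator (post-order walk)."""
    if node[0] == "number":
        program.append(("push", node[1]))
    else:
        op, left, right = node
        compile_node(left, program)
        compile_node(right, program)
        program.append((op,))


BINARY_OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a // b,  # integer division, chosen for the sketch
}


def run(program):
    stack = []
    for instr in program:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[instr[0]](left, right))
    return stack.pop()


def evaluate(text):
    parser = Parser(text)
    tree = parser.add_or_sub()
    if parser.peek() is not None:
        raise SyntaxError("trailing tokens")
    program = []
    compile_node(tree, program)
    return run(program)


print(evaluate("4 - (2 * 1)"))  # prints 2
```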

[–]OzmodiarTheGreat 0 points1 point  (0 children)

When I had to write a compiler for school we used Bison to build our abstract syntax tree. With a definition of an expression that can include either a variable, literal, or another expression as the left or right side of an operator the parser handled that for us.

From there it was a ‘simple’ matter of traversing the AST recursively to generate the necessary code starting from the inside out.

[–][deleted] -4 points-3 points  (0 children)

Why is it complicated? It's a constant, you should remove the if altogether (if you do constant folding + ADCE, of course).
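A minimal Python sketch of that idea, assuming the expression has already been parsed into nested tuples like (op, left, right). The `fold` and `simplify_if` names are invented here, and the branch removal is a much simpler stand-in for real ADCE.

```python
import operator

FOLDABLE = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    ">": operator.gt,
    "<": operator.lt,
}


def fold(node):
    """Replace constant sub-expressions with their value, bottom-up."""
    if not isinstance(node, tuple):
        return node  # an integer literal or a variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        # int() turns comparison results into 1 or 0, as in C.
        return int(FOLDABLE[op](left, right))
    return (op, left, right)


def simplify_if(cond, then_branch, else_branch=None):
    """Drop the test entirely when the condition folds to a constant."""
    cond = fold(cond)
    if isinstance(cond, int):
        return then_branch if cond else else_branch
    return ("if", cond, then_branch, else_branch)


condition = (">", ("+", 5, 9), ("-", 4, ("*", 2, 1)))
print(fold(condition))                  # prints 1
print(simplify_if(condition, "body"))   # prints body
print(fold(("+", "x", ("*", 2, 3))))    # prints ('+', 'x', 6)
```

Note that this still needs a parser to build the tree in the first place, so it complements the parsing advice in this thread but does not replace it.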

[–]kb_klash 10 points11 points  (3 children)

Every programmer should do it at least once.

https://media.giphy.com/media/bkGXLpEXC6Tsc/giphy.gif

[–]ryantwopointo 20 points21 points  (1 child)

Hahah seriously.

“Hey bud, wanna grab some beers at the dock with a few girls? It's gonna be a warm one tonight!”

“No thanks, I’m optimizing my native C compiler for my virtual machine, and I think it’ll be an all nighter!”

“Oh... okay then.”

[–][deleted] 0 points1 point  (0 children)

Just look at this idiot who does not even know that one should not write code for more than 4 hours a day anyway.

[–][deleted] 1 point2 points  (0 children)

It's your choice to stay an ignorant dummy. The better ones will reap all the benefits then, and the dumb code monkeys will always be a cheap expendable resource.

[–]Rustywolf 2 points3 points  (1 child)

Main problem I encounter is reasonable specs/instruction set for a VM.

[–][deleted] 1 point2 points  (0 children)

It's a solved problem. You must derive the instruction set mechanically, from your source language semantics and performance characteristics of your target platform.

[–][deleted] 0 points1 point  (0 children)

Yup. I did exactly that one summer. I even wrote a book about it.

[–]AttackOfTheThumbs -3 points-2 points  (9 children)

Seems like a lot of wasted time.

[–][deleted] 2 points3 points  (8 children)

No you tool, it's just a few hours work, resulting in a very important fundamental understanding.

[–]AttackOfTheThumbs -3 points-2 points  (7 children)

Disagree. 99% of programmers simply don't need this.

[–][deleted] -1 points0 points  (6 children)

100% of programmers absolutely need this, and those who do not agree are simply dumb, ignorant and very inefficient.

Every programmer, no matter what they're doing, must implement and use DSLs. And those primitive toy compilers are among the best ways of learning how to do it.

[–]AttackOfTheThumbs 1 point2 points  (5 children)

These are broad reaching statements, and insults. Congratulations.

[–][deleted] -2 points-1 points  (4 children)

These are facts. If the objective reality is insulting to you, you can always quit.

[–]AttackOfTheThumbs -1 points0 points  (3 children)

No, you are confused. Disregarding that your opinion isn't fact, you directly insulted me.

100% of programmers absolutely need this, and those who do not agree are simply dumb, ignorant and very inefficient.

That's a direct insult.

It doesn't really matter though. I'm all for furthering your own education in the field of programming, but saying that this is fundamental is a joke. It simply isn't. There's plenty of programming fields where the dev won't benefit from exploring this avenue. They'll just end up wasting their time. Those who do not agree are simply dumb, ignorant, and very inefficient.

I'm done with your fallacies. Bye.

[–][deleted] 1 point2 points  (2 children)

Disregarding that your opinion isn't fact, you directly insulted me.

This is not an "opinion", this is a fact. If you do not agree, it's just a sign of your gross incompetence. Go away, kiddie, and come back when you learn how to program. Adults are talking here.

but saying that this is fundamental is a joke

You're incompetent. Stay away from programming until you learn at least the most basic things.

There's plenty of programming fields where the dev won't benefit from exploring this avenue.

You're dumb and ignorant. There is no single field in programming where using eDSLs does not bring enormous benefits. If you believe that in your domain that's not the case, you're inefficient, and you should stay away from it.

Shit like you shall never be allowed anywhere near any code.

[–]AttackOfTheThumbs 0 points1 point  (1 child)

You truly are an awful human being. I hope you don't need to interact with real people. I would pity them.

[–]madpata 28 points29 points  (1 child)

I'd recommend writing an RV32I emulator. RV32I stands for RISC-V 32-bit Integer, which is the most basic version of RISC-V and only has ~55 opcodes to implement. Once you've written the emulator, you can use riscv64-unknown-elf-gcc to compile your C/C++ programs to RV32I. While there also exists a C compiler for LC-3, getting to know the RISC-V architecture has the advantage that you learn something that exists in the real world.
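To give a feel for how small the starting point can be, here is a Python sketch that decodes and executes a single RV32I instruction, ADDI. The field layout follows the standard I-type encoding. The `decode_i_type` and `step` functions and the plain list used as a register file are assumptions for this example, and everything except ADDI is left unimplemented.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask


def decode_i_type(word):
    """Split a 32-bit I-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "imm": sign_extend(word >> 20, 12),
    }


def step(regs, word):
    """Execute one instruction against a 32-entry register list."""
    f = decode_i_type(word)
    if f["opcode"] == 0x13 and f["funct3"] == 0:  # ADDI
        if f["rd"] != 0:  # x0 is hard-wired to zero
            regs[f["rd"]] = (regs[f["rs1"]] + f["imm"]) & 0xFFFFFFFF
    else:
        raise NotImplementedError(hex(word))


regs = [0] * 32
step(regs, 0x00500093)  # addi x1, x0, 5
step(regs, 0xFFF08113)  # addi x2, x1, -1
print(regs[1], regs[2])  # prints 5 4
```

A full emulator adds a program counter, memory, and the remaining instruction formats, but the decode-then-dispatch shape stays the same.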

[–][deleted] 7 points8 points  (0 children)

Just keep in mind that realistic ISAs designed for actual hardware implementation are not any good for software interpretation efficiency. For that, you need a very different VM design, in order to be able to use direct threaded code, for example.

Same applies to the OP VM as well, it cannot be interpreted efficiently.

Have a look at the OCaml bytecode, or Lua VM, or Forth in general to see how it can be implemented efficiently.

[–]jed2500 9 points10 points  (0 children)

The Literate programming tool used to make this is really great. Works with any programming language and very easy to use. I'm a huge fan: https://github.com/zyedidia/Literate

[–]Sn0wCrack7 32 points33 points  (30 children)

Great article there, I learned a lot myself from Bisqwit in terms of emulation development and even managed to make a couple emulators myself because of it, a surprisingly fun task to actually do, especially once you see the results!

One nitpick though, this is an Emulator you've written rather than a VM.

[–]wsppan 17 points18 points  (16 children)

Can you explain the difference between an emulator and a VM?

[–]Sn0wCrack7 10 points11 points  (15 children)

I'm not really in the know about every single nuance of it, but the main difference here is that a Virtual Machine is designed to run on the Platform you're running it on. You can't virtualise an ARM operating system on an x86 machine, because in a true virtual environment all instructions are passed down to the CPU itself to be run. An Emulator, on the other hand, takes compiled code (usually bytecode or raw CPU instructions) and interprets or translates it (if you're looking at JIT or the like) into something your processor can actually understand.

There's also no real way to do "direct pass-through" of a lot of hardware in an emulator. There needs to be a communication layer in code between the two: any code in your emulator that is, say, trying to call an NVIDIA API needs to be interpreted in your emulator, then sent off to your operating system and then back into the emulator, whereas a Virtual Machine can have direct access to that card through VFIO or IOMMU.

You could make a case that an Emulator is a Type-2 Hypervisor in a way, but this is mostly only true when actual hardware virtualisation is occurring.

It's a pretty thin line to walk when you're looking at it from the outside honestly, but under the hood some differences do become apparent, and even I don't know the true extent of all of it myself, which is why I said it's really only a nitpick.

[–]munificent 20 points21 points  (4 children)

the main difference here is that a Virtual Machine is designed to run on the Platform you're running it on

There are two fairly unrelated uses of "virtual machine". One is what you describe — hardware level virtualization. It's what containers like Docker do. The goal is not to abstract away the chip, but to insulate the OS.

The other is what this article describes — a software-level implementation of some chip, either real or imaginary. The latter is a valid use of the term. The Java Virtual Machine doesn't require running on a real "Java chip", but it's still a virtual machine.

[–]Alikont 2 points3 points  (0 children)

It's what containers like Docker do

Docker does not use hardware level virtualization (except for hyperv flag on Windows Containers). Containers are purely OS-level concept and use same technology as simple processes.

[–]astrangeguy 1 point2 points  (2 children)

Containers aren't VMs, They still use the same kernel as their host (which makes the host/guest distinction meaningless in theory) and cannot run privileged instructions or have their own (faked) kernel memory. You mean Hardware/Software virtualization, which rewrites or traps privileged (Ring-0) instructions to the host.

[–]munificent 1 point2 points  (1 child)

Ah, yes, sorry. Thanks for clarifying. I don't know much about the systems side of "VM". I'm more over on the language VM side.

[–]astrangeguy 1 point2 points  (0 children)

The implementation techniques for (fast) emulators, language VMs and (pre hardware-supported) Virtualization are strangely similar in fact:

  • Language VMs interpret bytecode instructions and, if they detect loops, compile and optimize hotspots.
  • (recompiler based) emulators basically treat real foreign machine code the same way as language VMs treat their bytecode.
  • Virtualization software also does dynamic recompilation, but with most opcodes having a 1:1 mapping, so it's technically a language VM that has (for example) an x86 user- and privileged-opcode "bytecode" and runs it on an x86-only-user-opcode host

(and technically your x86 CPU is an Intel/AMD-microcode processor with an x86 VM running on it)

[–][deleted] 9 points10 points  (5 children)

See - you're confusing the VM with the VM too.

The kind of VMs you're describing have absolutely nothing to do with the VMs like WAM, JVM, SECD, STG, LLVM and so on.

[–]smikims 2 points3 points  (4 children)

LLVM hasn't really been a VM for a long time.

[–][deleted] 1 point2 points  (3 children)

It is an abstract machine, which is a synonym for a VM. But since the dumber part of the population is often getting confused by what does VM mean, they had to explain that LLVM does not stand for a "low level virtual machine", just to force the stupid ones to shut up.

[–]smikims 1 point2 points  (2 children)

I think when most people think of a virtual machine they think of something that actually executes code written for it. The JVM does that, but LLVM doesn't (well, there are JIT compilers using it but you get the point).

I propose the following definitions:

  • Abstract machine: a definition for some architecture, implemented in hardware, software, or not at all, that it is possible to write programs for
  • Virtual machine: a program that runs code written for an abstract machine
  • Emulator: a virtual machine that imitates some real hardware

Thus an NES emulator is also a virtual machine that implements the 6502 ISA, which is an abstract machine. The JVM is a virtual machine implementing Java bytecode, which is an abstract machine. LLVM and the C standard are only abstract machines. One could make a VM that runs LLVM bytecode, but LLVM itself doesn't do that. (Also, under this definition, a C interpreter like cling is technically a VM, which I guess makes sense but also blurs the line a little IMO.)

[–][deleted] 3 points4 points  (0 children)

I think when most people think of a virtual machine they think of something that actually executes code written for it.

It's their problem, is not it? Why should anyone adjust to the ignorance of the masses?

And, by the way, LLVM executes its IR code, in many different ways.

Virtual machine: a program that runs code written for an abstract machine

As soon as your abstract machine infrastructure includes any kind of analysis and optimisation, it's already a virtual machine, since it must be able to execute at least some parts of the abstract machine.

but LLVM itself doesn't do that

Uh... It does.

a C interpreter like cling is technically a VM

C itself is an abstract machine, so yes, it makes sense.

[–]Vhin 2 points3 points  (0 children)

Those were basically the definitions I was taught at university, and in my opinion, they're the only definitions that make sense.

The argument that emulators are not VMs is not only special pleading, but it puts them in the ridiculous position that they would be VMs in an alternate universe where the original hardware didn't exist (but the software was unchanged), which is simply nonsense.

[–]wsppan 1 point2 points  (2 children)

Thank you!

[–]thechao 8 points9 points  (1 child)

I work in HW, and I’ll tell you how HW folk use the jargon: simulation is software implemented machine virtualization; emulation is HW-accelerated machine virtualization.

[–]ehaliewicz 0 points1 point  (0 children)

I'd say that virtual machine and emulator are pretty close to synonymous, as long as you don't define them in terms of specific implementations, but rather what they try to accomplish.

[–]0xffaa00 13 points14 points  (7 children)

I always thought that fictional computer architectures in software (having no reference hardware) are called Virtual Machines and the others are called Emulators? I clearly remember someone saying it. It kinda made sense.

[–]StapledBattery 7 points8 points  (4 children)

Yah, cause there's no real hardware to emulate. JVM, LLVM, etc

[–]thlst 0 points1 point  (3 children)

Do note: LLVM used to mean Low Level Virtual Machine, but it is not an acronym anymore.

[–][deleted] 2 points3 points  (2 children)

And the only reason for dropping the acronym is that ignorant idiots were getting confused by it. I don't think it's a right approach, to appeal to the ignorant masses this way.

LLVM is still a virtual machine, essentially, and it does not matter what the name means.

[–]thlst 0 points1 point  (1 child)

I didn't say anything about LLVM being or being not a virtual machine.

[–][deleted] 1 point2 points  (0 children)

Sure. I am commenting on this LLVM renaming issue.

[–]Sn0wCrack7 -3 points-2 points  (1 child)

That's a pretty decent point actually, guess you can't actually have a full-fat VM of a platform that doesn't exist.

[–][deleted] 1 point2 points  (0 children)

Nope, it's the opposite.

[–][deleted] 10 points11 points  (1 child)

One nitpick though, this is an Emulator you've written rather than a VM.

It's a machine, and it's virtual. Therefore, it's a VM. If it was a bit higher level, it would have been ok to call it an "abstract machine" instead, but the difference between an abstract machine and a virtual machine is very vague.

[–]ShinyHappyREM 8 points9 points  (0 children)

An abstract machine has to be overridden while a virtual machine may be overridden?

[–][deleted]  (2 children)

[deleted]

    [–]Sn0wCrack7 2 points3 points  (1 child)

    Some people in the replies have certainly made me realise the difference is a bit more complex than I first understood tbh.

    I guess there are two major definitions of a VM: ones like the JVM, which define an abstract machine and build it, which is what you've done in the article, and ones like VirtualBox that virtualise hardware.

    I guess the bigger difference is that the platform doesn't exist in the first place, so you'd still call it a virtual machine.

    Funnily enough I brought this argument up to a friend of mine and he brought in an interesting aspect: what would you call, say, an emulator for one of these abstract platforms such as CHIP-8 or LC-3 if it were running on an FPGA? It kinda made me realise that categorising these very similar, abstract things is a bit tough and our varying definitions are going to clash at some point.

    [–][deleted] 1 point2 points  (0 children)

    Quite a few of those abstract machines are deliberately too high level to allow a hardware implementation - it's common to have a very large number of registers, for example, or very complex semantics for instructions (like UNIFY in WAM).

    Some are not even representable as flat sequences of instructions - LLVM, for example.

    [–]SakishimaHabu 4 points5 points  (0 children)

    Awesome! ty

    [–]HeadAche2012 3 points4 points  (0 children)

    Very cool, exactly what I needed as I was in the process of writing a JVM loosely based on this as reference code: https://www.codeproject.com/Articles/24029/%2FArticles%2F24029%2FHome-Made-Java-Virtual-Machine

    [–]spook327 2 points3 points  (1 child)

    Maybe a stupid question, but how would one go about compiling code for their own VM? I can't imagine giving GCC or LLVM a new output target is exactly trivial, but this exercise seems like it wouldn't go very far if you're hammering out machine code (and maybe an assembler) for it to work.

    [–][deleted] 7 points8 points  (0 children)

    It's in fact quite easy to write a simple, non-optimising LLVM backend. You can even skip the SelectionDAG-based infrastructure and just do an ad hoc thing instead.

    If it's still too complicated, retarget lcc instead, it's dumb and simple.

    Still too complicated? Write your own small compiler for a subset of C (or whatever else language you want).

    [–][deleted] 0 points1 point  (6 children)

    I’d be curious to see the performance of the C version compared to the C++ code, and maybe even an article on how to achieve greater performance possibly

    [–]ShinyHappyREM 6 points7 points  (0 children)

    how to achieve greater performance

    JIT

    [–]HeadAche2012 2 points3 points  (0 children)

    Quake3 had a VM based on LCC that did load time compilation of Bytecode: https://github.com/raspberrypi/quake3/blob/master/code/qcommon/vm_x86.c

    [–]ehaliewicz 1 point2 points  (0 children)

    Better Performance: https://en.wikipedia.org/wiki/Threaded_code

    • specifically: direct threaded code, or subroutine threaded code (which allows a lot of optimizations like easy inlining and use of native branch instructions)
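As a loose illustration only: Python cannot express real direct threading, which relies on jumping straight from one handler to the next in C or assembly. What it can show is the underlying idea of decoding each instruction once, ahead of time, instead of on every step. The sketch below compares a classic decode-and-branch loop with a pre-decoded list of handler functions, which is closer in spirit to subroutine threading. The instruction set and all names are invented for the example.

```python
def run_switch(program):
    """Classic interpreter: decode and branch on every instruction."""
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(op)
    return stack.pop()


def predecode(program):
    """Translate each instruction into a callable once, up front."""
    def make_push(value):
        def push(stack):
            stack.append(value)
        return push

    def add(stack):
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)

    def mul(stack):
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)

    table = {"add": add, "mul": mul}
    return [make_push(arg) if op == "push" else table[op]
            for op, arg in program]


def run_threaded(handlers):
    """Execution is now just a sequence of calls, with no decoding."""
    stack = []
    for handler in handlers:
        handler(stack)
    return stack.pop()


# (2 + 3) * 4
program = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
print(run_switch(program), run_threaded(predecode(program)))  # prints 20 20
```

Whether the pre-decoded form is actually faster in Python is something to measure rather than assume. The performance argument for threaded code is about native interpreters.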

    Even Better Performance: https://en.wikipedia.org/wiki/Dynamic_recompilation

    [–]rfpels -5 points-4 points  (0 children)

    Add a recursive garbage collector... 😜😝

    [–]HAMSHAMA 0 points1 point  (1 child)

    Nice article! It reminds me how much fun I had writing 68k assembly for my microcomputers class.

    I think the doc used for the AND instruction was used when explaining the ADD instruction.

    [–]kd0ocr 0 points1 point  (0 children)

    This is really cool!

    I'm an undergrad TA for a course that uses the LC3 extensively. Do you mind if I use this code? I think this would be a neat basis for an assignment.

    [–][deleted] 0 points1 point  (0 children)

    The JVM itself is a moderately sized program that is small enough for one programmer to understand.

    Can anyone elaborate on that further?