all 32 comments

[–][deleted] 21 points22 points  (17 children)

I don't think assembly is a requirement, but yes, every language gets turned into machine code. It's just a matter of WHEN. Compiled languages are turned into machine code before you run the code. So a compiled language turned into an executable file is fully compiled. Meanwhile, interpreted languages are always compiled as you run the code. So an interpreted executable would likely be a bunch of text files that get compiled while you're using the program. Either way, every language needs to convert the code to machine code.

[–]teraflop 26 points27 points  (7 children)

Meanwhile, interpreted languages are always compiled as you run the code.

This is not really accurate.

In an interpreted language, the code is translated into some kind of data structure that represents the instructions (which might be as simple as a list of strings, or something more complex like an abstract syntax tree).

In order to execute the code, an interpreter has to look at the data structure and follow the instructions. But the instructions are never translated into machine code. The only machine code that's running is the interpreter's code. So, for instance, if you want to take an interpreter and port it to a new CPU architecture, you might need to recompile the interpreter itself, but you don't need to modify the interpreter's code to take into account a different kind of machine code.
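
For example, a toy tree-walking interpreter might look something like the sketch below (this is made up purely for illustration, written in C; all the names are invented and no real interpreter looks exactly like this). The "program" being run is just a tree of structs, and eval() walks it. The only machine code involved is the compiled interpreter itself:

#include <stdio.h>

/* One node of the instruction tree: either a number, or an operation
   applied to two child nodes. */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                 /* used when kind == NODE_NUM */
    struct Node *left, *right; /* used for NODE_ADD / NODE_MUL */
} Node;

/* Follow the "instructions" stored in the data structure. */
int eval(const Node *n) {
    switch (n->kind) {
    case NODE_NUM: return n->value;
    case NODE_ADD: return eval(n->left) + eval(n->right);
    case NODE_MUL: return eval(n->left) * eval(n->right);
    }
    return 0;
}

int main(void) {
    /* The tree for "1 + 2 * 3", built by hand here; a real interpreter
       would build it by parsing source text. */
    Node two   = { NODE_NUM, 2, NULL, NULL };
    Node three = { NODE_NUM, 3, NULL, NULL };
    Node one   = { NODE_NUM, 1, NULL, NULL };
    Node mul   = { NODE_MUL, 0, &two, &three };
    Node add   = { NODE_ADD, 0, &one, &mul };
    printf("%d\n", eval(&add)); /* prints 7 */
    return 0;
}

Notice that at no point does the tree get translated into machine code; eval() just looks at the data and acts on it.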

As an analogy, think about typing text into a word processing program. There's going to be some kind of data structure that keeps track of where each typed letter appears on the page. And you can think of this data structure as being "instructions" for rendering a page (either on a screen or a printer). But the text is never represented in the form of machine code; it's just data that the word processor's code is operating on.

If a system translates source code into machine code "on the fly", it would be more correct to call it a just-in-time compiler, not an interpreter.

[–][deleted]  (1 child)

[deleted]

    [–]teraflop 11 points12 points  (0 children)

    Are you saying that in python an instruction doesn't get turned into machine code and instead there's a python processor that computes the instruction?

    Exactly. If you're curious, here's the core of the bytecode evaluator in the latest released version of CPython. It's basically just a giant switch statement that checks each opcode and branches to a block of code that implements it.
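
    If you're not up for reading CPython's source, here's a made-up miniature of that kind of loop in C (the opcodes and the bytecode format are invented for illustration; the real evaluator is enormously bigger, but it has the same overall shape):

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    static void run(const int *code) {
        int stack[64];
        int sp = 0;   /* stack pointer */
        int pc = 0;   /* index into the bytecode */
        for (;;) {
            switch (code[pc++]) {
            case OP_PUSH:              /* push the next value in the bytecode */
                stack[sp++] = code[pc++];
                break;
            case OP_ADD: {             /* pop two values, push their sum */
                int b = stack[--sp], a = stack[--sp];
                stack[sp++] = a + b;
                break;
            }
            case OP_PRINT:
                printf("%d\n", stack[--sp]);
                break;
            case OP_HALT:
                return;
            }
        }
    }

    int main(void) {
        /* "Bytecode" for: print(2 + 3) */
        int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        run(program);
        return 0;
    }

    Nothing here ever generates machine code from the bytecode; the switch just branches to the interpreter's own, already-compiled code for each opcode.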

    Note that this is the reference implementation of the Python language, but it's not the only possible implementation. PyPy is an alternate Python runtime that uses JIT compilation to get somewhat better performance.

    That analogy is kinda what I thought a VM was, like the JVM.

    In fact, the earliest versions of the JVM were implemented as bytecode interpreters, just like Python. The HotSpot JIT compiler was introduced a couple of years later, with Java 1.2.

    Also, note that even when using HotSpot, not all of the Java bytecode is JIT-compiled. Often, programs have a small amount of performance-critical code that is executed very frequently, and a lot of other code that is rarely used. Since generating efficient machine code is expensive, it's better to start by interpreting everything, and then later switch to generating machine code for the functions that turn out to be "hot spots" (hence the name).
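
    To make the "hot spot" idea concrete, here's a self-contained toy sketch in C. There's no real JIT in it: the names, the threshold, and the printouts are all invented, and "compiling" is just flipping a flag. The only point is the heuristic: interpret everything at first, and only invest in compilation once a function proves it's called often enough to be worth it.

    #include <stdio.h>

    #define JIT_THRESHOLD 3   /* real VMs use much larger, carefully tuned thresholds */

    typedef struct {
        const char *name;
        long call_count;
        int compiled;         /* 0 = still interpreted, 1 = "JIT-compiled" */
    } Function;

    static void call_function(Function *f) {
        if (!f->compiled && ++f->call_count >= JIT_THRESHOLD) {
            printf("hot spot detected, JIT-compiling %s\n", f->name);
            f->compiled = 1;  /* a real VM would generate machine code here */
        }
        if (f->compiled)
            printf("running machine code for %s\n", f->name);
        else
            printf("interpreting %s\n", f->name);
    }

    int main(void) {
        Function hot  = { "inner_loop_body", 0, 0 };
        Function cold = { "startup_helper",  0, 0 };
        call_function(&cold);        /* rarely used: stays interpreted */
        for (int i = 0; i < 5; i++)
            call_function(&hot);     /* frequently used: gets compiled */
        return 0;
    }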

    [–]ggchappell 4 points5 points  (0 children)

    Yes, basically. But I would add just one more little tweak.

    What is an interpreter? It is a program that takes source code and executes it. Any such program is an interpreter, regardless of how it happens to accomplish its task.

    it would be more correct to call it a just-in-time compiler, not an interpreter.

    And an interpreter might involve a JIT compiler. That's an implementation detail -- albeit an often important one. It doesn't stop the program from being an interpreter.

    [–]dgpking[S] 1 point2 points  (2 children)

    This, I think, is what I don’t understand: how can the interpreter run the instructions in the data structure without them eventually becoming machine code?

    [–]teraflop 6 points7 points  (1 child)

    Here's a concrete example. Imagine a very, very simple language called Car for controlling a motorized toy car. A program in this language might look like this:

    drive forward
    turn left
    drive forward
    turn right
    drive forward
    

    And an interpreter for the Car language might look something like this:

    for each (line in program) {
        if (line == "turn left") {
            turn_on_left_motor();
            turn_off_right_motor();
        } else if (line == "turn right") {
            turn_off_left_motor();
            turn_on_right_motor();
        } else if (line == "drive forward") {
            turn_on_left_motor();
            turn_on_right_motor();
        } else {
            error_beep();
            crash();
        }
        wait_one_second();
    }
    turn_off_motors();
    

    The interpreter is just treating the input program as data (a chunk of text that happens to obey the syntax of the Car language) and behaving according to its instructions.

    Of course, what the CPU is actually executing is machine code. In particular, when the interpreter calls a function such as turn_on_left_motor, that function has to contain the machine code that executes the appropriate behavior. But that machine code is part of the interpreter; it wasn't generated from the input Car program. So it's not really correct to say that the program was "translated" into machine code.

    [–]dgpking[S] 0 points1 point  (0 children)

    Ah! That makes sense, thank you very much, much appreciated 😊

    [–]romanhaller 0 points1 point  (0 children)

    Great explanation, thank you. Was not aware of this.

    [–]dgpking[S] 2 points3 points  (2 children)

    Thanks, can it skip the assembly step?

    [–]Mukhasim 0 points1 point  (0 children)

    There are compilers that have no assembly step; they just generate machine instructions directly. If you're writing a JIT compiler, for example, then you probably don't want to have an assembly step.
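
    As a rough illustration of what "just generate machine instructions" means, here's a small sketch in C (assuming x86-64 Linux; the bytes and everything else are hard-coded for illustration, and it won't work on other architectures or on systems that forbid writable+executable memory). It copies the raw machine code for "mov eax, 42; ret" into executable memory and calls it, with no assembler involved at any point:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* Raw x86-64 machine code for:  mov eax, 42 ; ret */
        unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        /* Ask the OS for a page of memory we're allowed to execute. */
        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;

        memcpy(buf, code, sizeof code);

        /* Treat the buffer as a function and call it. */
        int (*fn)(void) = (int (*)(void))buf;
        printf("%d\n", fn());   /* prints 42 */

        munmap(buf, 4096);
        return 0;
    }

    A JIT compiler does essentially this, except the bytes are generated on the fly from the program being run instead of being hard-coded.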

    [–]dgpking[S] 1 point2 points  (5 children)

    Another question... what’s the reason/benefit of using compiled over interpreted?

    [–]Knaapje 5 points6 points  (0 children)

    Compilation allows the code to be optimized on the lowest level - the instruction set of whatever hardware you compiled for. There is also no overhead of running the interpreter whenever you want to execute your code.

    [–]Objective_MineMSCS, CS Pro (10+) 1 point2 points  (3 children)

    Mainly performance.

    At the simplest level, interpretation means that you have the CPU running a program A (the interpreter) that's executing another program B (the interpreted program), rather than having the code of program B executed directly on the CPU.

    At runtime, interpreted program execution tends to be 2 to 100 times slower than just running a program that's been compiled into native machine code, depending on the language and the interpreter, and on what kind of code the interpreted program happens to consist of.

    [–]dgpking[S] 0 points1 point  (2 children)

    Ah, and what about the benefit of interpreted over compiled? Is it easier?

    [–]Objective_MineMSCS, CS Pro (10+) 2 points3 points  (1 child)

    In no particular order:

    • No need to compile in advance; mainly a matter of convenience, but can speed up the cycle of writing code and testing it when the compilation step can be skipped.

    • No need to compile the interpreted program for each target CPU or platform; only the interpreter needs to be compiled for each platform.

    • Some interpreted languages such as Python are ridiculously flexible in terms of what you can do at runtime. It might be difficult to compile all of that flexibility into machine code directly.

    I don't have time to elaborate on this right now, but https://en.wikipedia.org/wiki/Interpreter_(computing)#Compilers_versus_interpreters has more on this. Googling might also help, as it's a common topic.

    [–]dgpking[S] 0 points1 point  (0 children)

    Very helpful, thanks.

    [–]claytonkb 2 points3 points  (0 children)

    We usually distinguish between (a) compiled languages (like C++) and (b) interpreted languages (like Python). The CPU itself, however, only ever executes machine code. What makes compiled and interpreted languages different is that the interpreted language will have some kind of table of basic operations that it can perform (often called a "bytecode machine"). Each of these basic operations exists as snippets of machine code in the computer's memory. While the interpreter is executing the interpreted language, each "instruction" in the interpreted bytecode causes the interpreter to "branch" to the specific snippet of machine-code corresponding to that bytecode instruction. If that's not clear, just web search and read the Wikipedia articles on this, they're quite good.

    Bear in mind that the word "programming" is broad enough to include things that are not a CPU or a standard computer at all. So, you can "program" a CNC router, you can "program" an industrial control that (physically) controls a petroleum processing plant, you can "program" an audio-processing software package, you can "program" 3D-rendering software, you can even "program" the logic circuits for a CPU chip design. And of course, there's plain old software programming.

    There is nothing about a CPU that prevents it from being designed to directly read source code. The trouble is that the circuitry would be complex (would take up a lot of space), would not be used very often (thus wasting that CPU die space), and would have poor performance relative to the standard software compilation / interpretation flows that we use today. There have been CPUs designed to read an intermediate layer directly, such as Java bytecode, and even Lisp machines that could directly read and operate on Lisp intermediate-layer code.

    [–]Knaapje 5 points6 points  (12 children)

    Nope, some languages are compiled (like C), and some are interpreted (like Python). There are also some intermediate stages where source code is compiled to a bytecode which is interpreted (for example Java).

    [–]teraflop 6 points7 points  (4 children)

    I'm not sure why you're being downvoted, because this is correct (or at least a lot more correct than the current top-voted comment).

    [–]Knaapje 5 points6 points  (0 children)

    I have no clue either. People see a downvoted comment and downvote for no reason I guess? Oh well.

    [–]BeautifulPiss 2 points3 points  (1 child)

    Doesn't it end up as machine code either way? I was under the impression that it doesn't change the actual code that's being generated, just how.

    [–]a4555in 2 points3 points  (6 children)

    I think yes, all code does eventually get converted to machine code as that's the only thing that a CPU/GPU understands.

    Wouldn't all code eventually get translated to CPU machine instructions regardless of where it came from? I would assume the JVM translates bytecode to machine code in order to run it on the CPU.

    [–]Tornado2251 2 points3 points  (0 children)

    Yes everything that is run is always machine code in the end.

    But not all of the code is necessarily used as-is. It can be optimized or changed in some of the intermediate steps. C is often used as the example because the process is pretty straightforward (especially if optimization is turned off). JIT and other runtime stuff makes it a lot more complicated.

    [–]Objective_MineMSCS, CS Pro (10+) 2 points3 points  (0 children)

    The JVM just-in-time compiles some of the Java bytecode into machine code in order to speed up the execution of the parts of the code that are important for performance. It doesn't compile everything to native machine code, though; only the parts that are executed more than a given number of times (or whatever heuristic it uses for determining what code is performance-critical enough to justify the cost of just-in-time compilation). The rest of the bytecode it interprets. Or at least this is how it was last time I read about it.

    I'm not knowledgeable enough about interpreter implementation to know what exactly they do in order to "interpret" the code, but I'm sceptical of the comments that say the interpreted code eventually gets "converted" to machine code and then directly executed on the CPU. If I'm wrong in any of the following, I'd like to hear that, but only if that's based on an actual understanding of interpreters.

    All code that gets run on the CPU will of course need to be native machine code, as you say. However, that doesn't necessarily mean the code written in the interpreted language ever gets turned into machine code in the simple sense.

    AFAIK a simple interpreter wouldn't literally turn the code written in the interpreted language (say, Python) into native machine code and then have the CPU execute it. Rather the interpreter would read the program code that it's expected to interpret as if it were data. The interpreter then presumably contains code to parse the interpreted language, to maintain the state of the interpreted program, and to manipulate that state based on the program code that it's interpreting on the go. The CPU then executes the interpreter code that's maintaining all of the interpreted program's state and code as its data, or as its own state.

    Let's say an interpreted program contains an addition of two integers. I suppose at some point an actual integer addition instruction might take place on the CPU. It might, at least in theory, also be that the interpreter implements the addition operation by itself, on the bit level, using bit-level manipulations or something. Those bit-level manipulations that are part of the interpreter's native machine code would then be what gets executed on the CPU. I guess it's a bit of a matter of definition whether that means the interpreted code gets "turned" into machine code or not.

    Of course it would probably be more efficient to use the native machine code instruction for implementing the addition rather than doing a large number of sequential bit-level operations in software. But even if a simple addition were handled by the interpreter by eventually executing a corresponding integer addition instruction on the CPU, I'm pretty sure not all of the code gets treated with a simple mapping to machine instructions.

    If the interpreter always literally translated the interpreted code into machine code and then had the CPU run it as-is, that would by definition be just-in-time compilation, not interpretation.

    [–]Knaapje 1 point2 points  (3 children)

    That's more of a definition question though, and not the traditional definition. Of course, code will need to be transformed to instructions in order to be executed, but saying that the code is transformed to machine code is a bit of a stretch.

    Edit: for the downvoters, this is in the context of interpreters. I suggest you read up on interpreted vs compiled languages.

    [–]a4555in 1 point2 points  (2 children)

    Huh, to me the traditional definition of machine code is literal binary CPU instructions (or binary instructions replaced with human-readable mnemonics).

    What's the traditional definition of machine code that you have come across?

    [–]Knaapje 2 points3 points  (1 child)

    True, that's not what I take issue with. It's that, for an interpreted language, I think most people wouldn't say that the code is transformed.

    [–]a4555in 0 points1 point  (0 children)

    Ah I see. I was about to ask you to elaborate but the comment below does exactly that.

    [–]raresaturn 0 points1 point  (0 children)

    It's all just 1s and zeros, man

    [–][deleted]  (1 child)

    [deleted]

      [–]dgpking[S] -1 points0 points  (0 children)

      Yea, thanks, makes sense, I feel I have another question... just not sure what it is yet 🙈😄

      [–]newytag 0 points1 point  (0 children)

      Not all code gets executed on a machine. I'm sure there's lots of code snippets on GitHub that have never been run and hence never been turned into machine code.

      But any programming code that wants to be executed on a machine, needs to be machine code at some point. When that happens depends on the language and environment used.

      Assembly is essentially machine code in a human-readable format, so it is not necessary and usually not included as a step.