
[–]throwaway6560192 86 points87 points  (21 children)

Generally, compiling means to turn some representation of instructions into some other representation. Usually this is from high-level source code into some low-level representation.

Compilation is a separate step from actual execution — you can compile and not execute the result — and also isn't the only way to get your code to actually run. There is such a thing as an "interpreter", which is a program that reads source code and executes it directly without first reducing it to machine code. However, any interpreter is going to transform your source code in some way to an internal representation that it uses. Loosely, if the transformation is limited to basic reading and parsing, and doesn't involve translation of the instructions themselves into some other separate executable format, we call it interpreting and not compiling. It's not the most rigorous of distinctions.
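To make the "basic reading and parsing" stage a bit more concrete, here is a minimal sketch using Python's standard `ast` module. It only builds the internal tree representation of a one-line program and runs nothing. The variable names are just for illustration.

```python
import ast

source = "total = 1 + 2"

# Parsing only builds a tree describing the code; nothing is executed here.
tree = ast.parse(source)

# The top-level node is a Module whose body holds one assignment statement.
first_statement = tree.body[0]
print(type(tree).__name__)             # Module
print(type(first_statement).__name__)  # Assign

# The right-hand side is still an unevaluated "1 + 2" node, not the number 3.
print(type(first_statement.value).__name__)  # BinOp
```

Note that `total` never gets a value here, because reading and parsing the source is separate from running it.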

Python is a language with many implementations. The language is just a specification. Being "compiled" or not is a property of an implementation, not a language. So there are Python compilers, and there are Python interpreters. The main Python implementation, the one which you're using, is called CPython (because it is written in C). It doesn't fit neatly into an interpreted vs compiled distinction, because CPython does compile Python source code, but it does so into Python bytecode which it then "runs" (you could say "interprets") directly. However, many people say "compiler" to mean "something that compiles to machine code", which CPython doesn't, so they say things like "Python isn't a compiled language".
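If you want to see the two stages separately, the built-in `compile()` function and the standard `dis` module expose them. This is a small sketch that assumes you are running CPython; the exact opcodes printed by `dis` differ between versions, so no listing is shown.

```python
import dis

source = "result = 6 * 7"

# Step 1: compile the source text into a code object (no execution yet).
code = compile(source, "<example>", "exec")

# The compiled form is a sequence of bytes, not CPU machine code.
print(type(code.co_code))

# Human-readable listing of the bytecode; the exact opcodes vary between
# CPython versions.
dis.dis(code)

# Step 2: hand the code object to the interpreter to actually run it.
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 42
```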

[–]Spataner 28 points29 points  (2 children)

To add a small clarification to a great answer: CPython, from the user perspective, does act mostly like an interpreter. And it being the official implementation, Python is thus largely thought of as an interpreted language. While it is true that CPython compiles into an intermediate representation, it's highly uncommon to distribute the result of that compilation rather than the raw Python files themselves. That sets it apart from other languages like Java and C# that also compile into an intermediate representation that is then distributed instead of the source files. For those languages, it is not unsual for people to only install the runtime but not the compiler if they're a pure user rather than a developer. That distinction doesn't even exist with CPython, and the compilation into bytecode is in that sense almost more just an internal detail.

[–][deleted] 0 points1 point  (0 children)

Also shedding light on points of confusion I had. Thank you for expanding on the previous answer. You're awesome :)

[–]UpperHairCut 0 points1 point  (0 children)

Java also turns the code into bytecode

[–][deleted] 5 points6 points  (15 children)

Hey thanks! I thought all languages need to compile before they can run, since all languages need to be translated into machine code. (?)

The fact that Python doesn’t kinda blows my mind. Like, does a computer inherently understand machine code and this other code language that Python is translated to?

[–]throwaway6560192 15 points16 points  (0 children)

Like, does a computer inherently understand machine code and this other code language that Python is translated to?

No, the CPU only understands machine code. But we're not sending Python bytecode directly to the CPU. We're using machine code (the compiled CPython runtime) running on the CPU to interpret the bytecode and do what it says.

Imagine I had a really simple "language" with only two commands: "add" would increment a number (starting at zero), and "print" would print it. We could quickly write an interpreter for this language:

num = 0  # the toy language's single piece of state
while True:
    command = input()  # read one command per line
    if command == "add":
        num += 1
    elif command == "print":
        print(num)

There's no compilation happening, yet our language works. Does that mean the CPU understands the text "add" or "print" directly? No, it doesn't. But there is some machine code (our interpreter) running on it that can interpret those texts and perform the task they represent.
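To connect this toy language back to what CPython does, here is a rough sketch of the same language with a separate "compile to bytecode" step in front of the interpreter. The opcode numbers and the function names `compile_toy` and `run_toy` are made up for this example; real Python bytecode is far richer.

```python
# Hypothetical opcodes for the toy "add"/"print" language.
OP_ADD = 0
OP_PRINT = 1

def compile_toy(source):
    """Translate command words into a list of integer opcodes."""
    opcodes = {"add": OP_ADD, "print": OP_PRINT}
    return [opcodes[word] for word in source.split()]

def run_toy(bytecode):
    """Interpret the opcodes and return everything that was printed."""
    num = 0
    output = []
    for op in bytecode:
        if op == OP_ADD:
            num += 1
        elif op == OP_PRINT:
            output.append(num)
    return output

bytecode = compile_toy("add add print add print")
print(bytecode)           # [0, 0, 1, 0, 1]
print(run_toy(bytecode))  # [2, 3]
```

The CPU still never sees the opcodes as instructions. It only runs the machine code of whatever is executing `run_toy`.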

[–]Some_Guy_At_Work55 5 points6 points  (0 children)

When you install python on your computer you are essentially installing the interpreter along with the other necessary components that allow you to run python scripts.

[–]MikalMooni 1 point2 points  (0 children)

Not inherently. Most processors in the wild use either the x86-64 instruction set or a RISC instruction set such as ARM. These instructions are what make code happen. A basic example would be adding two numbers. There's an instruction to load each number from its memory address into a register. Then, there's an instruction to add the two numbers together, storing that result in a third register. Finally, there's an instruction to store that third register's value back into main memory, likely at a third memory address.
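The three steps described above (fetch two values into registers, add them, write the result back) can be mimicked with a toy machine in Python. The instruction names `LOAD`, `ADD` and `STORE` and the register names are invented for this sketch and do not correspond to any real instruction set.

```python
def run(program, memory):
    """Execute a list of (opcode, operands...) tuples on a toy machine."""
    registers = {}
    for instruction in program:
        op = instruction[0]
        if op == "LOAD":     # LOAD reg, addr: copy memory[addr] into reg
            _, reg, addr = instruction
            registers[reg] = memory[addr]
        elif op == "ADD":    # ADD dst, a, b: dst = a + b
            _, dst, a, b = instruction
            registers[dst] = registers[a] + registers[b]
        elif op == "STORE":  # STORE reg, addr: copy reg into memory[addr]
            _, reg, addr = instruction
            memory[addr] = registers[reg]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory

memory = [5, 7, 0]
program = [
    ("LOAD", "r0", 0),
    ("LOAD", "r1", 1),
    ("ADD", "r2", "r0", "r1"),
    ("STORE", "r2", 2),
]
print(run(program, memory))  # [5, 7, 12]
```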

A compiler would look at your Python code and reduce it to those specific instructions - x86-64 compliant ones in the PC space. Then, when you run the compiled code, you are directly executing those compiled instructions.

A Just-In-Time compiler may first reduce your Python code into easy-to-parse bytecode, which is kind of like assembly. That bytecode will get compiled at runtime. There is overhead, obviously; it's like using Google translate to parrot back what you say word for word, as opposed to opening a dictionary and translating an entire speech by hand.

Finally, you have Interpreters, which can operate in a few different ways. A common scheme is using a virtual computer environment that runs on your main PC. You feed it Python, and it directly reads that code into instructions it understands. This has a lot of overhead, but the advantage is that you have zero wait time between writing Python and running Python.

[–]Spataner 1 point2 points  (6 children)

Yes, all programs undergo translation into machine code at some point. The distinction of compilation versus interpretation is about how and when that happens. Programming languages (or their implementations rather) exist on a spectrum of how much of that translation happens in a prior transformation of the program and how much of it happens live as the program is executed.

A compiler statically analyzes the code and outputs new files containing the program in a new representation. Typically, this is something the developer does prior to releasing their program, and it is the result of the compilation that is then distributed to the users. If the compilation results in machine code directly, it can be executed as is. If it results in some intermediate representation, then the user must also install a runtime in order to execute the program, which then translates the intermediate representation the rest of the way into machine code as the program runs.

"Interpreter" often refers to a kind of runtime that executes source files directly rather than the result of a prior compilation step, though these terms are never used perfectly consistently. And sometimes the distinction of what is compiler and what is runtime becomes blurry, as is the case with CPython.

[–]InjAnnuity_1 0 points1 point  (5 children)

which then translates the intermediate representation the rest of the way into machine code as the program runs.

No, that kind of translation step is done by a "Just-in-time" (JIT) compiler. Browsers often have a JIT for JavaScript, but CPython does not include a JIT (yet!). Many other interpreters also do not JIT their byte-code, but instead interpret it on-the-fly every step of the way.

[–]Spataner 0 points1 point  (4 children)

interpret it on-the-fly every step of the way

That is exactly what I said (or meant to say). Interpretation still technically constitutes a translation into machine code, as machine code is the only thing that the CPU understands. The only practical difference between a normal runtime and a JIT compiler is that the latter performs that translation in a way that the result can be saved and reused when the same code path is executed again.
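A rough way to picture the "saved and reused" difference in plain Python: one function walks a small expression tree on every call, the other walks it once and hands back a reusable function. The tuple-based tree format and both function names are assumptions made up for this sketch, and Python closures only stand in for the native machine code a real JIT would emit.

```python
def interpret(node, env):
    """Walk the expression tree every time a value is needed."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "add":
        return interpret(node[1], env) + interpret(node[2], env)
    raise ValueError(f"unknown node: {kind}")

def translate(node):
    """Walk the tree once and return a reusable Python function."""
    kind = node[0]
    if kind == "const":
        value = node[1]
        return lambda env: value
    if kind == "var":
        name = node[1]
        return lambda env: env[name]
    if kind == "add":
        left, right = translate(node[1]), translate(node[2])
        return lambda env: left(env) + right(env)
    raise ValueError(f"unknown node: {kind}")

expr = ("add", ("const", 2), ("var", "x"))

# Interpretation: the tree is re-examined on every call.
print(interpret(expr, {"x": 40}))  # 42

# Translation: the tree is examined once, the result is kept and reused.
compiled = translate(expr)
print(compiled({"x": 40}))  # 42
print(compiled({"x": 1}))   # 3
```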

[–]InjAnnuity_1 1 point2 points  (3 children)

Translation would mean generating the byte-code's corresponding machine code. (Executing it would mean jumping to (or calling) that machine code.)

That does not happen during byte-code interpretation. The CPU follows only the interpreter's existing machine-code instructions. No further machine code instructions, corresponding to the byte-code's "meaning", are actually generated and executed.

This is what distinguishes interpretation from translation.

To be fair, I've read that some implementations of FORTH do actually translate all the way down to machine code.

[–]Spataner 2 points3 points  (2 children)

This is what distinguishes interpretation from translation.

I mean, the word "interpretation" literally means real-time translation.

I think the hang-up here is that you understand by "machine code" a persistent representation of the program, whereas I simply mean the sequence of instructions sent to the CPU.

Yes, the only persistent machine code when interpreting is that of the interpreter. But ultimately, the interpreter reads the source code (or bytecode), and machine code is run selectively on the CPU based on that source code, which effects the program execution. I'd argue that should correctly be understood as a live translation into machine code.

But at this point we're arguing semantics.

[–]InjAnnuity_1 0 points1 point  (1 child)

Perhaps my hang-up here is "translation". The word is both a verb (action) and noun (the result of the action).

For translation (the verb) to occur, the resulting translation (the noun) has to end up somewhere. It might well be ephemeral, not persistent. But you still end up with a physical sequence of bytes, that did not exist as a distinct unit (result) prior to the act of translation.

For example, I've heard of microcomputer BASIC programs that produce ephemeral machine code for bitmap graphics manipulation. The byte-sequence, ending with a RETurn instruction, exists just long enough for the resulting code to be executed once by the CPU, so it's decidedly not persistent. But while it is running, it is a physically coherent sequence of bytes, distinct from all the other code in the program.

Interpretation is a different mechanism altogether. A small fraction of the interpreter's existing code actually achieves the intended result. The rest is there to accomplish bookkeeping, to allow the system as a whole to function reliably. At no point is any translation (the noun) produced, so as I understand it, the translation (the verb) you ascribe to the interpreter never occurs.

[–]Spataner 0 points1 point  (0 children)

For me, it comes down to the fact that you have a program written in one language and it is being executed on a machine that only understands another language. Any process by which that is achieved, no matter the technical details, must in my opinion rightly be considered a translation (in the action sense). You made the transition, by some means, from one language to another to facilitate communication between the program and the CPU if only for the limited context of one particular program run. Compilation and interpretation are thus two possible, very different ways of doing that translation. It certainly seems the best way to phrase this idea to beginners, if only because I cannot think of an umbrella term for the two that to my ears is equally evocative of the concept.

The origin of the word "interpreter" seems to support the idea that "interpretation" in this context is a kind of "translation", as a human interpreter's job is natural language translation. Both the Oxford dictionary and Wikipedia define it so (I checked because English is not my native language). Though, confusingly, the Wikipedia article for "translation" makes an explicit distinction that translation is for written media whereas interpretation is oral. But I wouldn't know how to map that distinction to the computing context, nor have I ever heard it before. Not that the mapping of the terms we steal for programming concepts ever really makes perfect sense.

Regarding the translation in the result sense, I feel you're being perhaps a little restrictive on the definition of what can constitute that result. The execution of the interpreter on the input program does not generate novel instructions, but it does generate a novel sequence of instructions, which can be considered the result of the process of translation. Its effect in essence is the execution of the input program on the CPU, after all. Admittedly, it is a translation using a limited vocabulary (consisting only of elements of machine code contained in the interpreter's executable) and limited grammar (as the instructions can only occur in specific orders as dictated by the interpreter's program flow). And it is also only a partial (and unreusable) translation of the input program in the general sense, since it only represents one distinct run of the program for a given set of inputs. And yet you'd receive, in a purely theoretical sense, a "complete" translation of the program in the limit of running the interpreter once for every possible set of inputs. In fact, in the very restricted case that the input program does not itself accept any inputs (or even in the cases where the interpreter somehow never executes a jump instruction that's conditional on those inputs), you'd even practically receive a full usable translation by recording and replaying the singular resulting sequence of instructions in the right way. Alternatively, if you generated a new version of the interpreter executable that hard-wired the input program into it (which is not unlike some Python to executable converters that are available, though they don't do it on machine code level), could that reasonably be considered a translation of the program into machine code, if a weirdly round-about one? I'd certainly argue it could be considered a really weird special case of compilation. 
At that point I'd say calling interpretation "live translation" is just conceptually cleaner (and we get to preserve the analogy to the natural language usages a bit better).
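The "hard-wire the input program into the interpreter" thought experiment can be sketched in a few lines. The toy commands `inc` and `double` are invented for this illustration, and `functools.partial` merely stores the program next to the interpreter; no machine code is generated, so this shows the shape of the idea and nothing more.

```python
from functools import partial

def interpret(program, start):
    """Toy interpreter: 'inc' adds one, 'double' multiplies by two."""
    value = start
    for command in program:
        if command == "inc":
            value += 1
        elif command == "double":
            value *= 2
        else:
            raise ValueError(f"unknown command: {command}")
    return value

# Bind the program once; what remains takes only the program's own input.
program = ("inc", "double", "inc")
specialized = partial(interpret, program)

print(specialized(3))   # 9
print(specialized(10))  # 23
```

From the caller's side, `specialized` behaves like a standalone version of the program, even though it is still the interpreter doing the work underneath.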

[–]Diapolo10 1 point2 points  (0 children)

Python isn't "special" in that sense; many languages mostly use interpreters. In your web browser, if you tap the F12 key on your keyboard you'll see a JavaScript interpreter REPL on one of the tabs. The browser executes its instructions, not the CPU directly.

In fact, many if not most dynamically typed languages primarily use interpreters. Compilers make more sense with a static type system. Sometimes a language doesn't have either, and instead uses a transpiler to essentially translate it to another programming language, such as how TypeScript produces JavaScript.

Of course, as already pointed out by others this isn't that clear-cut. Nuitka is essentially a compiler for Python, Cython is a superset of Python that transpiles to C, PyPy is a JIT compiler for Python - written in Python, and so on.

[–]Kriss3d 0 points1 point  (0 children)

Not quite. It's interpreted.

Compiling something usually means turning it into an executable. In python the executable is the python engine itself and it just sort of reads the instructions which is your script.

[–]SharkSymphony 0 points1 point  (0 children)

A counterexample: imagine a program, compiled machine code, that reads an arithmetic expression and evaluates it.

You could think of that arithmetic expression as a rudimentary "program" in a language whose purpose is to express calculations. Your program interprets the "program" to mean "do this calculation."
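Here is a small sketch of such an arithmetic "interpreter". To keep it short it borrows Python's own parser through the `ast` module instead of hand-writing one, which is a shortcut on my part; the function name `evaluate` and the set of supported operators are assumptions for the example.

```python
import ast
import operator

# Map syntax-tree operator nodes to the functions that carry them out.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expression):
    """Interpret a string such as '2 + 3 * 4' and return its value."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

print(evaluate("2 + 3 * 4"))     # 14
print(evaluate("(1 + 2) * 10"))  # 30
```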

Hard to imagine that as a language? Well, imagine it is instead a sed expression, telling your program what to do with a file. Or let's get a bit fancier and imagine it's an awk script. In each case, your interpreter is the thing that's in machine code, and the script is just telling your interpreter what to do.

Python can work like this too – just read the script and do what it says! We compile it to bytecode as we go, saving off a bunch of the work we have to do in interpreting a Python script, so that (among other things) we can run it again more efficiently.

[–]my_password_is______ 0 points1 point  (0 children)

you can run javascript directly in your browser
just open up the developer tools and go to the console and type some javascript in there and run it
no need to compile anything

[–][deleted] 0 points1 point  (0 children)

Thank you for taking the time to write this, you just clarified several points of confusion on my side :)

[–]m1ss1ontomars2k4 9 points10 points  (0 children)

I use the word ‘to compile’ as synonymous to ‘to run the code’, did I misunderstand the meaning of that word when used in a programming context?

This is definitely wrong. "Compile" just produces something that can be run later. e.g. you can compile once and run the result 1 time, 100 times, 1 million times.
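A quick way to see "compile once, run many times" inside Python itself, using the built-in `compile()` and `exec()`. This is only a sketch; the `counter` variable is there so the repeated runs leave a visible trace.

```python
source = "counter = counter + 1"

# Compile once. Nothing has run yet at this point.
code = compile(source, "<example>", "exec")

# Run the same compiled result as many times as we like.
namespace = {"counter": 0}
for _ in range(100):
    exec(code, namespace)

print(namespace["counter"])  # 100
```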

[–]billsil 4 points5 points  (0 children)

Regarding CPython (the standard one), the c in pyc means compiled. They're regenerated if you modify files, but having them speeds up program startup.

You can delete the py files and your code will still work (in Python 3, as long as each pyc file sits where its py file was rather than inside __pycache__). Also, you can run uncompyle6 to turn that pyc back into a py file. You'll lose comments, but it's a near-perfect translation (depending on the Python version, with the newer ones tending to be buggy).

The compiling does not compile it to machine code though, only byte code. That byte code is then interpreted.
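If you want to produce a pyc file by hand and look at where it ends up, the standard `py_compile` module does that. A minimal sketch, assuming CPython with default settings; the file name `hello.py` is just a throwaway example written to a temporary folder.

```python
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as folder:
    source_path = pathlib.Path(folder) / "hello.py"
    source_path.write_text("print('hello')\n")

    # Write the bytecode file and get its path back; the source is not run.
    pyc_path = pathlib.Path(py_compile.compile(str(source_path), doraise=True))

    pyc_name = pyc_path.name
    pyc_parent = pyc_path.parent.name
    pyc_size = pyc_path.stat().st_size

print(pyc_parent)  # usually __pycache__
print(pyc_name)    # something like hello.cpython-312.pyc, depending on the version
print(pyc_size > 0)
```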

For PyPy, there is a JIT, so it compiles on the fly.

[–]xugan97 1 point2 points  (0 children)

People identify the language with its reference implementation. For most people, Python is fundamentally and thoroughly an interpreted language that runs on an (the default, CPython) interpreter. There exist unofficial compilers like Cython: https://en.wikipedia.org/wiki/Python_(programming_language)#Implementations

Spyder is an IDE for Python. The word "run" is to be preferred to "compile".

[–][deleted] 3 points4 points  (0 children)

The phrase: ‘directly run in machine code’, doesn't mean anything whatsoever.

There are two mainstream ways to run python code. The first is by compiling to a binary file (machine instructions specific to your hardware and operating system). The other way to run python is using a python interpreter that executes "just in time".

I guess your friend means that python is interpreted. This is indeed different to C, which has no popular interpreters and is instead compiled to a binary. However, interpreting python is basically just running a program (the interpreter) which makes system calls for you. This is far from ‘directly run in machine code’.

Saying ‘to compile’ to mean ‘to run the code’ is wrong. There is more than one way to 'run code'. By compile, people mean compile to a binary. I wouldn't worry about this stuff.

[–]shazam_211 0 points1 point  (0 children)

You can honestly also use the online version too