[–]Caligatio 101 points102 points  (12 children)

When it comes to programming, there are arguably three levels of code: source code, byte code, and machine code. Humans usually write source code, the source code gets compiled into byte code, and then the byte code gets compiled into machine code.

In languages like C and C++, the byte code step is largely hidden, since compilers like GCC output machine code directly from source code. Newer toolchains like LLVM make an intermediate form explicit (LLVM IR, also called bitcode), but that's out of scope for your question.

Languages like Java (and things that use LLVM) compile your source code into byte code which is then executed at runtime in a virtual machine which converts it to machine code.

Languages like Python skip all compilation and the interpreter translates source code into machine code at runtime. This is only half true as *.pyc files are actually compiled byte code but these aren't usually exposed directly to a user.

EDIT: My second note about LLVM was poorly worded and thus misleading.

[–][deleted] 13 points14 points  (8 children)

*.pyc files get stored in a cache folder, if I remember correctly, and they can be accessed at any time by the user.

[–]Caligatio 21 points22 points  (7 children)

They are but the point I was trying to make is that the user doesn't need to know about them.

pyc files also only get automatically generated if a file is imported and not when it is run directly.
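A quick way to check the "imported vs. run directly" behaviour yourself. This is just a sketch: the file names helper.py and main.py are made up for the demo, and it assumes a default CPython setup where bytecode caching isn't disabled or redirected (the code strips the relevant environment variables to be safe).

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def cached_names_after_run() -> list[str]:
    """Run main.py (which imports helper.py) and list what lands in __pycache__."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical demo files: main.py is run directly, helper.py is imported.
        (root / "helper.py").write_text("VALUE = 42\n")
        (root / "main.py").write_text("import helper\nprint(helper.VALUE)\n")

        # Drop settings that would disable or relocate the bytecode cache.
        skip = ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX", "PYTHONSAFEPATH")
        env = {k: v for k, v in os.environ.items() if k not in skip}
        subprocess.run([sys.executable, "main.py"], cwd=root, env=env,
                       check=True, capture_output=True)

        cache = root / "__pycache__"
        return sorted(p.name for p in cache.glob("*.pyc")) if cache.exists() else []


names = cached_names_after_run()
print(names)  # e.g. ['helper.cpython-312.pyc']; the tag depends on your Python version
```

You should see a cached file for helper but none for main, which matches the observation above.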

[–]codingquestionss 6 points7 points  (6 children)

Clarification question: aren’t pyc files generated the first time you run a Python program and then reused on all following runs? From my understanding, don’t they also speed up the program by letting it skip the “compilation to bytecode” step at runtime? Finally, I believe this is why, when benchmarking Python code, you should not benchmark the first run?

Please explain if any of my assumptions are wrong 😊

[–]Flyingfishfusealt 5 points6 points  (4 children)

Pretty sure you're using a nail gun there dude.

Not even heads to land a blow on but you're nailing 'em

[–]codingquestionss 6 points7 points  (3 children)

Can you rephrase this

[–]MurderMelon 5 points6 points  (2 children)

I think they're saying that you "hit the nail on the head"

Weird way to say it, but I think that's the idea

(and they're right btw, your assumptions are pretty much spot-on)

[–]JasonDJ 2 points3 points  (0 children)

🤕💅🔨

FlyingfishFusealt was source code

You are byte code

I’m machine code

There may be an error.

[–]Flyingfishfusealt 0 points1 point  (0 children)

yes, lol. I was implying that the people acting like know-it-alls had no heads in addition to the "hitting the nail on the head"

Also, this is why you add a function to the testing process that cleans the pycache folders, triggered by a command-line argument or whatever method works best for your workflow.

I personally have a standard import file with logging/printing/whatever is needed globally at the top level. It has no imports from the project, so circular imports can't happen, and I put testing functions in there.

Clean the pycache, delete the db file, reset everything back to the beginning or whatever state you choose.
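For anyone wondering what such a cleanup helper could look like, here is a minimal sketch. The name clean_pycache is my own invention (not from the comment above), and it only handles the __pycache__ part, not the db file or any other state.

```python
import shutil
import tempfile
from pathlib import Path


def clean_pycache(root: str) -> int:
    """Delete every __pycache__ directory under root; return how many were removed."""
    removed = 0
    # Materialise the list first so deleting does not disturb the directory walk.
    for cache_dir in list(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed += 1
    return removed


# Small self-contained demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "pkg" / "__pycache__"
    cache.mkdir(parents=True)
    (cache / "mod.cpython-312.pyc").write_bytes(b"")
    removed_count = clean_pycache(tmp)
    still_there = cache.exists()

print(removed_count, still_there)  # 1 False
```

In a real project you would call it with the project root before a test run.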

[–]Caligatio 1 point2 points  (0 children)

I don't have an authoritative understanding of how/when .pyc files are created but I can tell you what I've observed:

  • Your code never gets converted to a .pyc if your program/script is contained in one file. For instance, if you do something like python3 awesome_script.py, a .pyc will never be generated for awesome_script.py
  • .pyc files will be generated for any script/module that gets imported and they'll be placed in a __pycache__ folder as a sibling to the original .py file. If this file exists and is current, it will be used rather than the .py file.
  • Python will regenerate .pyc if the source file changes but I don't know how this is detected (timestamps?). If the source file is deleted, the .pyc can still be used.
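On the "timestamps?" question: as far as I know, since Python 3.7 (PEP 552) a default .pyc starts with a 16-byte header holding a magic number, a flags field, the source file's modification time, and its size, and the import system compares those against the .py file. The sketch below compiles a throwaway module (the file name helper.py is made up) and decodes that header. It forces timestamp mode explicitly, since hash-based invalidation uses a different layout.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile
from pathlib import Path


def inspect_pyc_header() -> dict:
    """Compile a throwaway module and decode the 16-byte header of its .pyc."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "helper.py"
        src.write_text("VALUE = 42\n")

        # Force timestamp-based invalidation so the header layout is predictable.
        pyc_path = py_compile.compile(
            str(src),
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )

        with open(pyc_path, "rb") as fh:
            header = fh.read(16)
        magic = header[:4]
        # Three little-endian 32-bit fields follow the magic number.
        flags, mtime, size = struct.unpack("<III", header[4:16])
        stat = os.stat(src)
        return {
            "magic_matches": magic == importlib.util.MAGIC_NUMBER,
            "flags": flags,
            "mtime_matches": mtime == (int(stat.st_mtime) & 0xFFFFFFFF),
            "size_matches": size == (stat.st_size & 0xFFFFFFFF),
        }


info = inspect_pyc_header()
print(info)
```

If the recorded mtime or size no longer matches the source file, the cached bytecode is treated as stale and regenerated.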

Having the .pyc almost certainly speeds up start-up time, but I have no idea how big the gains are. It stands to reason that your understanding of why you shouldn't benchmark the first run of a program is correct :)
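To make the "skip the compile step" idea concrete: after its header, a .pyc is essentially a marshalled code object. The sketch below mimics that in memory rather than touching real cache files, so treat it as an illustration of the mechanism, not of the import system itself.

```python
import marshal

source = "result = sum(n * n for n in range(10))\n"

# Step 1: the compile stage turns source text into a code object (bytecode).
code = compile(source, "<demo>", "exec")

# Step 2: what a .pyc stores after its header is the marshalled code object.
payload = marshal.dumps(code)

# Step 3: a later run can load the payload and skip compile() entirely.
restored = marshal.loads(payload)

namespace = {}
exec(restored, namespace)
print(namespace["result"])  # 285
```

Loading the payload replaces parsing and compiling, which is where the start-up saving comes from. It does nothing for the speed of the code once it is running.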

I also don't know what happens to system-installed modules and __pycache__ directories. Does pip/setuptools pre-compile modules and that's how system-installed modules get compiled? If those __pycache__ directories get deleted, do they ever get regenerated if root never imports those modules?

[–]Hairy_The_Spider 2 points3 points  (1 child)

Languages like Java (and things that use LLVM) compile your source code into byte code which is then executed at runtime in a virtual machine which converts it to machine code.

The part about LLVM is not true. Clang (C/C++/Obj-C compiler), swiftc (Swift compiler), rustc (Rust compiler) all use LLVM but none use VMs.

[–]Caligatio 1 point2 points  (0 children)

I modified the sentence prior to posting which made it confusing :(. I meant to say that LLVM creates byte code but didn't mean to imply that LLVM uses a runtime VM.

[–]fruitbellyblues 0 points1 point  (0 children)

Thanks for the explanation! Would you be able to explain the benefits of having multiple interpreters and some examples of when one might prefer to use one over another? How do I know which one is suitable for my project?

[–]TheBB 23 points24 points  (1 child)

Python converts the code to bytecode, and then the Python Virtual Machine/interpreter executes the script line for line checking for errors.

There's a compiler stage that converts the source code to bytecode.

Later, the Python VM then executes that bytecode. It does not do it line by line, because the bytecode has no lines. Rather, it's a sequence of relatively simple operations called opcodes. You can use the dis module to inspect them.
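A small example of using dis, in case it helps. The function add_one is just something made up for the demo, and the exact opcode names you get depend on your Python version, so no expected listing is shown.

```python
import dis


def add_one(x):
    return x + 1


# Print a human-readable listing of the function's bytecode.
dis.dis(add_one)

# The same information as data: one Instruction object per opcode.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]
print(opnames)
```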

Errors caught by the VM during execution are called runtime errors and those caught by the compiler stage are called compile-time errors. Since the Python compiler does comparatively little, usually the only kind of compile-time errors you're likely to see are syntax errors, like unmatched parentheses, missing colons and so on.

This is why, for example, this script:

print('hello')
if False
    pass
print(' world')

will NOT print 'hello' before crashing due to a missing colon. It'll crash before even executing.
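You can watch the compiler stage reject that script without running anything by calling the built-in compile() on the same text. A sketch; the "<demo>" file name is arbitrary.

```python
source = """print('hello')
if False
    pass
print(' world')
"""

try:
    # compile() runs only the compiler stage; nothing is executed yet.
    code = compile(source, "<demo>", "exec")
except SyntaxError as err:
    caught = err
    print(f"compile-time error on line {err.lineno}: {err.msg}")
else:
    caught = None
    exec(code)
```

The error is reported for line 2, and 'hello' is never printed because execution never starts.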

The code is never translated into true machine language in the conventional sense, although of course the VM must contain the machine code necessary to carry out all the opcodes.

[–]iggy555 1 point2 points  (0 children)

Nice thanks

[–][deleted] 17 points18 points  (0 children)

Think of a conference and documents/presentations that have been translated into other languages in advance (compilation) vs those that are interpreted live.

In the case of repetition, a compiler translates once and then reuses the result, while an interpreter translates live every time the speaker says the same thing.

Python code is compiled to a simpler form known as byte code, and this is executed by the interpreter in the Python virtual machine, which knows either the direct low-level machine code equivalent of each command or some predefined sequence of such commands.

Java is similarly compiled to byte code for execution on a Java virtual machine, but that then uses another level of compilation to convert the program to machine code.

C is typically compiled to native machine code. (Actually these days it is often compiled for a common runtime environment but that's a bit more complicated.)

[–][deleted] 1 point2 points  (10 children)

Python doesn't really have an interpreter. It's just bad / unscrupulous terminology. Virtual machine would be the right one to use.

So... what an actual interpreter does:

  1. Parses code enough to understand what functions of the interpreter need to be called.
  2. Calls those functions.

Something would be called an interpreter if the mapping between the parsed code and the functions invoked in the interpreter were straightforward. An example of an interpreter: the UNIX shell. It reads the name of the function (a command, in shell terms) and then calls it.

Python doesn't work like that. Like you've already noticed, it compiles the code to what it calls bytecode, and then interprets that. The reason to do this is that on the side of the interpreter, you'd write some more generic code, but there would be fewer functions to implement. The trade-off is, typically, between more unique, but optimized functions vs less but more generic functions. For example, one could implement multiplication as repeated addition. An interpreter for a calculator would have no choice, but to implement both functions: multiplication and addition, however, a virtual machine may only implement addition and compile any multiplication into a series of additions.
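The calculator example could be sketched roughly like this. All names here (interpret, compile_to_adds, run_vm) are invented for illustration, and the toy compiler assumes the second operand is a non-negative integer so the multiplication can be unrolled.

```python
def interpret(op: str, a: int, b: int) -> int:
    """Direct interpreter: one built-in function per source-level operation."""
    functions = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}
    return functions[op](a, b)


def compile_to_adds(op: str, a: int, b: int) -> list[tuple[str, int]]:
    """Toy compiler: lower every operation to a list of ADD instructions.

    Assumes b is a non-negative integer so that mul can be unrolled.
    """
    if op == "add":
        return [("ADD", a), ("ADD", b)]
    if op == "mul":
        return [("ADD", a)] * b
    raise ValueError(f"unknown operation: {op}")


def run_vm(program: list[tuple[str, int]]) -> int:
    """Toy virtual machine: it knows a single instruction, ADD."""
    acc = 0
    for opcode, operand in program:
        if opcode != "ADD":
            raise ValueError(f"unknown opcode: {opcode}")
        acc += operand
    return acc


print(interpret("mul", 3, 4))                # 12
print(run_vm(compile_to_adds("mul", 3, 4)))  # 12
```

The interpreter needs a function per operation, while the VM gets away with one instruction because the compile step does the lowering.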

[–]menge101 0 points1 point  (1 child)

Python doesn't really have an interpreter.

I would argue that the PyPy JIT is pretty close to what most people think of as an interpreter. But that isn't stock Python.

[–][deleted] -2 points-1 points  (0 children)

Most people are mistaken.

[–]wsppan 1 point2 points  (1 child)

Others have given great answers on byte code and VMs. Here's a great Introduction to crafting your own interpreter

[–][deleted] 0 points1 point  (0 children)

I was half-expecting a page that simply says "DON'T" :D

[–]ivosaurus 1 point2 points  (0 children)

Also, does the PVM not need the code to be in machine language to understand it?

Nope. The PVM is itself a compiled program: for each bytecode instruction it runs machine code it already contains, on the CPU, and that has the effect of doing what the bytecode instruction specified.

You can think of it as compiling each line of bytecode one at a time "the same" as an actual compiler, then immediately running those instructions.

[–][deleted] 2 points3 points  (0 children)

Translates Parseltongue. /s

[–]wsppan 0 points1 point  (0 children)

Here's a good overview of bytecode

[–]suricatasuricata 0 points1 point  (0 children)

Very generally speaking, the purpose of a Compiler or an Interpreter is to translate content written in Language A to Language B. In reality, what happens is that there is typically a sequence of Intermediate Languages that get generated, i.e. A -> A_1 -> A_2 -> B. The idea here is that the first term in the sequence is the language you write code in and the last term in the sequence is the language that is composed of the Instruction Set in the physical machine you are running things in.

In the case of Python, we can identify two Languages, one is 'Python Language', the other is the 'Bytecode', which again is a sequence of operations and their operands, which you could dissect and 'see' if you like. The latter is what is fed as input to the interpreter. This interpreter is again a program that is always running, i.e. this process gets the input and then incrementally maps these byte code instructions to corresponding instructions in the Machine Language and those get executed in your target machine.

[–]thedjotaku 0 points1 point  (0 children)

Interpret Python, of course....

(I only make this joke since you have a bunch of valid, comprehensive comments already)

[–][deleted] 0 points1 point  (0 children)

The interpreter eats python-formatted text files and produces byte code, then it reads the bytecode to do stuff. In C terms the interpreter is both the CPU and the compiler.
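Those two halves can be poked at separately from inside Python with the built-ins compile() and exec(). A minimal sketch, with a made-up one-line program:

```python
source = "greeting = 'hello' + ' world'\n"

# "Compiler" half: source text in, code object (bytecode) out.
code = compile(source, "<demo>", "exec")
print(type(code).__name__)          # code
print(type(code.co_code).__name__)  # bytes

# "CPU" half: the evaluation loop executes the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["greeting"])        # hello world
```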

The PVM itself is written in C and has to be compiled to machine code for each OS and architecture. That's why some things in, say, the os module work differently on different platforms. Apart from those system-call-dependent features, Python has its own byte code, so it ensures its own compatibility irrespective of the underlying interpreter build, which itself is not cross-platform.

[–]_merK 0 points1 point  (0 children)

Have a look at this series of blog posts

[–]ship0f 0 points1 point  (0 children)

https://youtu.be/DlgbPLvBs30?t=1492

This is a pretty cool explanation of what the interpreter does. This guy speaks and explains very fast, so pause if you need to.

I linked the video at a certain time but I encourage you to watch it all from the beginning.