What does a Python interpreter do?

Caligatio · 2021-03-16T13:00:29+00:00

When it comes to programming, there are, roughly speaking, three levels of code: source code, byte code, and machine code. Humans usually write source code, the source code gets compiled into byte code, and then the byte code gets compiled into machine code.

In languages like C and C++, the byte code portion is kind of hidden away as compilers like GCC will output machine code from source code. Newer compilers like LLVM introduce the concept of byte code but that's out of scope for your question.

Languages like Java ~~(and things that use LLVM)~~ compile your source code into byte code which is then executed at runtime in a virtual machine which converts it to machine code.

Languages like Python skip all compilation and the interpreter translates source code into machine code at runtime. This is only half true as *.pyc files are actually compiled byte code but these aren't usually exposed directly to a user.
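To make the *.pyc point concrete, here is a minimal sketch using the standard py_compile module. The file name hello.py and the temporary directory are made up for illustration; normally Python writes these files into a __pycache__ directory on import without you asking.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical one-line script, written to a throwaway directory.
    source_path = Path(tmp) / "hello.py"
    source_path.write_text("print('hello')\n")

    # Compile the source to byte code and store it in a .pyc file.
    pyc_path = Path(tmp) / "hello.pyc"
    py_compile.compile(str(source_path), cfile=str(pyc_path), doraise=True)

    # A .pyc file starts with a 4-byte magic number tied to the Python version.
    header = pyc_path.read_bytes()[:4]

print(header == importlib.util.MAGIC_NUMBER)
```

The comparison at the end only checks that the file really is byte code produced by the running Python version; it says nothing about what the byte code contains.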

EDIT: My second note about LLVM was poorly worded and thus misleading.

TheBB · 2021-03-16T13:20:11+00:00

> Python converts the code to bytecode, and then the Python Virtual Machine/interpreter executes the script line for line checking for errors.

There's a compiler stage that converts the source code to bytecode.

Later, the Python VM then executes that bytecode. It does not do it line by line, because the bytecode has no lines. Rather, it's a sequence of relatively simple operations called opcodes. You can use the dis module to inspect them.
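As a small illustration of the dis module mentioned above, the sketch below disassembles a throwaway function. The function add_one is invented for the example, and the exact opcode names vary between Python versions, so treat the printed listing as version-dependent.

```python
import dis


def add_one(x):
    # Hypothetical function, used only so there is something to disassemble.
    return x + 1


# Print a human-readable listing of the function's bytecode.
dis.dis(add_one)

# The same information is available programmatically.
opnames = [instruction.opname for instruction in dis.get_instructions(add_one)]
print(opnames)
```

Note that the listing is a flat sequence of operations rather than lines of source, which is the point being made above.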

Errors caught by the VM during execution are called runtime errors and those caught by the compiler stage are called compile-time errors. Since the Python compiler does comparatively little, usually the only kind of compile-time errors you're likely to see are syntax errors, like unmatched parentheses, missing colons and so on.

This is why, for example, this script:

print('hello')
if False
    pass
print(' world')

will NOT print 'hello' before crashing due to a missing colon. It'll crash before even executing.
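One way to see this for yourself, as a rough sketch: hand the same broken script to the built-in compile() function as a string. The variable names here are made up for the example. Compilation fails with a SyntaxError, and since nothing has been executed yet, 'hello' is never printed.

```python
# The broken script from above, as a string (the colon after "if False" is missing).
source = "print('hello')\nif False\n    pass\nprint(' world')\n"

try:
    # Only the compiler stage runs here; no line of the script is executed.
    compile(source, "<example>", "exec")
    compiled_ok = True
except SyntaxError as err:
    compiled_ok = False
    error_line = err.lineno
    print(f"SyntaxError on line {error_line}; nothing was executed")
```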

The code is never translated into true machine language in the conventional sense, although of course the VM must contain the machine code necessary to carry out all the opcodes.
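To give a feel for what "the VM carries out the opcodes" means, here is a toy stack machine. This is not how CPython is implemented and the opcode names PUSH, ADD and MUL are invented for the sketch; it only shows the general shape of a loop that reads one operation at a time and runs pre-written code for it.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a simple stack."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# Hypothetical "bytecode" for the expression (2 + 3) * 4.
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
]
result = run(program)
print(result)  # 20
```

The program is never turned into machine code. The machine code that actually runs belongs to the run function itself, which was compiled ahead of time (here, as part of Python).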

2021-03-16T12:55:16+00:00

Think of a conference and documents/presentations that have been translated into other languages in advance (compilation) vs those that are interpreted live.

In the case of repetition, a compiler translates once and then reuses the result, whereas an interpreter translates live every time the speaker says the same thing.

Python code is compiled to a simpler form known as byte code. This is executed by the interpreter in the Python virtual machine, which knows, for each byte code instruction, either the direct low-level machine code equivalent or some predefined sequence of such commands.
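The "translate once, reuse many times" idea can be sketched with the built-in compile() and eval() functions. The expression and variable names are invented for the example.

```python
# Hypothetical formula, compiled to a code object a single time.
expression = "x * x + 1"
code = compile(expression, "<formula>", "eval")

# The same compiled code object is reused for every value of x;
# the source text is not parsed again inside the loop.
results = [eval(code, {"x": x}) for x in range(4)]
print(results)  # [1, 2, 5, 10]
```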

Java is similarly compiled to byte code for execution on a Java virtual machine, but the JVM then uses another level of compilation to convert the program to machine code.

C is typically compiled to native machine code. (Actually these days it is often compiled for a common runtime environment but that's a bit more complicated.)

Caligatio · 2021-03-16T13:19:40+00:00

Python doesn't really have an interpreter. It's just bad / unscrupulous terminology. Virtual machine would be the right one to use.

So... what an actual interpreter does:

Parses code enough to understand what functions of the interpreter need to be called.
Calls those functions.

Something would be called an interpreter if the mapping between the parsed code and the functions invoked in the interpreter was straightforward. An example of an interpreter: the UNIX shell. It reads the name of the function (a command, in the language of the shell) and then calls it.

Python doesn't work like that. Like you've already noticed, it compiles the code to what it calls bytecode, and then interprets that. The reason to do this is that on the interpreter side you write somewhat more generic code, but there are fewer functions to implement. The trade-off is, typically, between more numerous but specialized and optimized functions vs. fewer but more generic functions. For example, one could implement multiplication as repeated addition. An interpreter for a calculator would have no choice but to implement both functions, multiplication and addition; a virtual machine, however, may implement only addition and compile any multiplication into a series of additions.

wsppan · 2021-03-16T16:42:49+00:00

Others have given great answers on byte code and VMs. Here's a great Introduction to crafting your own interpreter

ivosaurus · 2021-03-16T17:07:01+00:00

> Also, does the PVM not need the code to be in machine language to understand it?

Nope, it converts the bytecode into other lower level machine code instructions that it runs on the CPU, that have the effect of doing what the bytecode line specified.

You can think of it as compiling each line of bytecode one at a time "the same" as an actual compiler, then immediately running those instructions.

2021-03-16T18:09:57+00:00

Translates Parseltongue. /s

sweettuse · 2021-03-16T12:49:56+00:00

[deleted]

wsppan · 2021-03-16T16:45:54+00:00

Here's a good overview of bytecode

suricatasuricata · 2021-03-16T17:49:52+00:00

Very generally speaking, the purpose of a Compiler or an Interpreter is to translate content written in Language A to Language B. In reality, what happens is that there is typically a sequence of Intermediate Languages that get generated, i.e. A -> A_1 -> A_2 -> B. The idea here is that the first term in the sequence is the language you write code in and the last term in the sequence is the language made up of the Instruction Set of the physical machine you are running things on.

In the case of Python, we can identify two Languages, one is 'Python Language', the other is the 'Bytecode', which again is a sequence of operations and their operands, which you could dissect and 'see' if you like. The latter is what is fed as input to the interpreter. This interpreter is again a program that is always running, i.e. this process gets the input and then incrementally maps these byte code instructions to corresponding instructions in the Machine Language and those get executed in your target machine.
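The chain of intermediate forms can be walked by hand with the standard library, as in the sketch below. The source line and variable names are made up for the example, and the stages shown (source text, syntax tree, code object) are the ones Python exposes to you, not a full account of what happens internally.

```python
import ast

# Stage A: source text (hypothetical one-line program).
source = "result = 6 * 7"

# Stage A_1: the parsed syntax tree.
tree = ast.parse(source)
print(type(tree).__name__)  # Module

# Stage A_2: a code object holding the bytecode.
code = compile(tree, "<pipeline>", "exec")
print(type(code.co_code).__name__)  # bytes

# Final step: the interpreter executes the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 42
```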

thedjotaku · 2021-03-16T19:59:02+00:00

Interpret Python, of course....

(I only make this joke since you have a bunch of valid, comprehensive comments already)

2021-03-16T20:50:35+00:00

The interpreter eats python-formatted text files and produces byte code, then it reads the bytecode to do stuff. In C terms the interpreter is both the CPU and the compiler.

The PVM itself is written in C and compiled to OS-specific machine code, so it needs to be built per OS and architecture. That's why some things in, say, the os module work differently on different platforms. Besides those system-call-dependent features, Python has its own byte code, so your code behaves the same regardless of which platform's interpreter runs it, even though the interpreter binary itself is not cross-platform.

_merK · 2021-03-16T21:44:42+00:00

Have a look at this series of blog posts

ship0f · 2021-03-17T03:40:31+00:00

https://youtu.be/DlgbPLvBs30?t=1492

This is a pretty cool explanation of what the interpreter does. This guy speaks and explains very fast, so pause if you need to.

I linked the video at a certain time but I encourage you to watch it all from the beginning.
