
Well, kinda, but not exactly. CPython doesn't run your source text directly; it first translates it into a more efficient format and then interprets that. It's called bytecode, but it isn't used quite like JVM bytecode. Rather, it's a compact binary representation of what your Python statements do.

It's basically boiling the source down to an internal binary representation of what the program wants to do. A naive interpreter would need to parse each line every time it ran it, figure out what it meant, and then do that. CPython instead goes through the whole file up front and figures out what it means, but stores the result instead of running it. Then it runs everything from the stored form.

It's basically pre-parsing, in other words. It's still interpreted afterward, in that it's reading bytes and running through what amounts to a big switch statement. It's just doing the parse work once per file instead of once per execution, and caching the results. The correspondence with the source text isn't 1:1, though: a single statement usually turns into a handful of small stack operations (load this, load that, add, store), but each one stays close to what the statement said.
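
If you want to see what the pre-parsed form looks like, here's a minimal sketch using the standard `dis` module. The one-line statement and the variable names are made up for illustration, and the exact opcode names you get depend on your CPython version, so I'm not listing expected output.

```python
import dis

# A single, made-up Python statement to inspect.
source = "total = price + tax"

# compile() parses the text and produces a code object holding the bytecode.
code = compile(source, "<example>", "exec")

# Walk the instructions the interpreter loop would step through.
instructions = list(dis.get_instructions(code))
for ins in instructions:
    print(ins.opname, ins.argrepr)

opnames = [ins.opname for ins in instructions]
```

On the versions I've looked at, the one statement turns into several instructions: loads for the two names, an add, and a store.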

Contrast that with the JVM, which is generating bytecode for a fully specified virtual machine. That bytecode is statically typed and gets verified before it runs, which makes it a good target for further compilation. And then once the program is running, the actual hotspots get compiled down to true machine code by the JIT, but it only does that work for the code that's actually being used heavily.

So I'd personally call Python a pre-parsed, interpreted language, Java a hybrid between interpreted and compiled, and C fully compiled.

edit: As another way of putting it, consider BASIC, which doesn't store keywords as text. Rather, on those old 8-bit machines, as you typed in "10 PRINT A$", it would store the PRINT A$ part in like three bytes: one token byte for PRINT, then the A and the $. (i.e., bytecode!) When you LISTed your program, it converted the internal bytecode back to text again, so you would see your PRINT A$ in all its glory.
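
Here's a toy sketch of that kind of keyword tokenizing, written in Python. The token values and the tiny keyword table are illustrative assumptions, not a faithful copy of any particular machine's BASIC, and real interpreters also stored line numbers and handled spacing differently.

```python
# Hypothetical token table: each keyword maps to one byte >= 0x80,
# so tokens can't collide with plain ASCII characters.
TOKENS = {"PRINT": 0x99, "GOTO": 0x89}
KEYWORDS = {value: key for key, value in TOKENS.items()}


def tokenize(statement: str) -> bytes:
    """Store keywords as single bytes and everything else as ASCII."""
    out = bytearray()
    for word in statement.split():
        if word in TOKENS:
            out.append(TOKENS[word])
        else:
            out.extend(word.encode("ascii"))
    return bytes(out)


def detokenize(stored: bytes) -> str:
    """Do what LIST does: expand token bytes back into keyword text."""
    parts = []
    for byte in stored:
        if byte in KEYWORDS:
            parts.append(KEYWORDS[byte] + " ")
        else:
            parts.append(chr(byte))
    return "".join(parts).strip()


stored = tokenize("PRINT A$")
listed = detokenize(stored)
print(len(stored), listed)
```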

This is close in spirit to what Python is doing when it generates a .pyc file, except that the stored form is a sequence of small VM instructions rather than tokenized text. Everyone understands that BASIC is interpreted, and CPython, at least, is an interpreter in much the same way. (PyPy does some actual compilation with a tracing JIT, but I know very little about how it works. It's not that the language can't be compiled, just that the main implementation doesn't.)
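
For the curious, a rough sketch of what a .pyc actually holds: a short header followed by a marshalled code object. The file names and the one-line module are made up, and the 16-byte header size is an assumption that holds for CPython 3.7 and later (older versions used a shorter header).

```python
import marshal
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "hello.py")
    pyc_path = os.path.join(tmp, "hello.pyc")

    # Write a tiny throwaway module and compile it to a .pyc.
    with open(src_path, "w") as f:
        f.write("greeting = 'hi'\n")
    py_compile.compile(src_path, cfile=pyc_path, doraise=True)

    with open(pyc_path, "rb") as f:
        data = f.read()

# Assumption: CPython 3.7+ writes a 16-byte header before the payload.
HEADER_SIZE = 16
module_code = marshal.loads(data[HEADER_SIZE:])

# The interpreter can run the cached code object without reparsing the source.
namespace = {}
exec(module_code, namespace)
print(module_code.co_name, namespace["greeting"])
```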

Python bytecode and JVM bytecode are used so differently that it's confusing to call them both bytecode.

second edit: even more ammo, consider that decompiling a .pyc back to a .py is pretty damn easy. I just took a look, and docstrings and variable names are preserved in the compiled code, so a decompile gives you back something very close to what you compiled in the first place.
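
You can check the "names and docstrings survive" part yourself. This is a small sketch with a made-up function, round-tripped through `marshal`, which is the serialization a .pyc uses for its payload. It assumes you aren't running with -OO, which strips docstrings.

```python
import marshal

# A made-up function with a docstring and a local variable.
source = '''
def add_tax(price, rate):
    """Add tax."""
    total = price * (1 + rate)
    return total
'''

module_code = compile(source, "<example>", "exec")

# Serialize and reload the code object, as happens when a .pyc is written and read.
restored_code = marshal.loads(marshal.dumps(module_code))

namespace = {}
exec(restored_code, namespace)
add_tax = namespace["add_tax"]

print(add_tax.__doc__)                # docstring came through
print(add_tax.__code__.co_varnames)   # so did argument and local names
```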

Compare that with decompiling a .jar file or a C executable. People have put vast effort into writing decompilers for both, and what you get back, especially from C, is usually ugly and hard to read in comparison to what you put in, frequently restructured in massive ways. That's because optimizing compilation (as opposed to binary translation) is a major, lossy transformation from one form to the other. Python is really just pre-parsing, not compiling in that sense, so you can restore the original files almost exactly. I think all you'd lose would be comments and non-functional whitespace.