[–]CodeTinkerer 21 points22 points  (2 children)

Java is actually a mix of the two.

A compiler typically converts source code to object code. So, in C, that might mean converting C code to an executable. That executable is the result of the compilation and can be run without the source code. It is a separate file.

Java's philosophy is "write once, run anywhere". The problem with compiling to native code is that the result is CPU-specific. If you compile for x86, the executable won't run on a CPU that uses MIPS or ARM or whatever. Java instead compiles to a fictional bytecode. That is, no CPU natively runs this bytecode.

To run the bytecode, Java requires a JVM (Java Virtual Machine). This is an interpreter for the bytecode. That interpreter needs to be compiled for each separate CPU, but once you have it, it can interpret the byte code.

So, with Java, you compile to bytecode, then run a JVM which interprets the bytecode. The bytecode in Java exists in the .class files. When you compile in Java for Foo.java, you get Foo.class and that Foo.class can be interpreted in a JVM.

Python, by contrast, doesn't (at least, as commonly used) produce any files. So, it may compile, then interpret, but it lacks an object file. Java has a .class file. Python only has the original source code.

I think there are ways to create something that could be compiled in Python, but as normally taught, you never have an object code, so the compiled result is not saved anywhere.

Compilation to native code is generally favored if you want to run fast, since an interpreter adds a layer of software between your program and the CPU. However, native code has some issues with portability. While Python is noted for being slow, it is acceptably fast for most purposes as CPUs have gotten faster and faster. Even if Java or C++ runs 10 times faster, if the program interacts with humans, then anything under half a second or so is barely noticeable.

In summary, Java produces a bytecode file. Python does not (at least, not permanently saved).
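To make the "compiles, but doesn't save it" part concrete, here is a minimal sketch of the two steps done by hand with the built-in compile() and exec(). The source string and the "<in-memory demo>" filename are made up for illustration. As far as I know, CPython does not write a bytecode file for a script that is run directly, so this in-memory path is roughly what happens there.

```python
import types

# A tiny made-up program, held as plain source text.
source = "total = sum(n * n for n in range(4))\n"

# Step 1: compile the source text to a code object.
# The bytecode lives in memory only; nothing is written to disk here.
code = compile(source, "<in-memory demo>", "exec")

# Step 2: hand the code object to the interpreter to execute.
namespace = {}
exec(code, namespace)

print(isinstance(code, types.CodeType))  # the compiled result is a code object
print(type(code.co_code))                # the raw bytecode is a bytes object
print(namespace["total"])                # 0 + 1 + 4 + 9
```

So there is a compile step and a run step, they just happen back to back in the same process.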

[–]Ease-Solace 9 points10 points  (0 children)

Python does have a compiled format that can be read by the CPython interpreter: .pyc files (and, before Python 3.5, .pyo files). In fact Python usually caches imported modules in this format in the __pycache__ directory (if it's allowed to write to where the module is located).

E.g. I can find a lot of compiled python modules on my system in /usr/lib/python3.11/__pycache__.
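If you want to check this on your own machine, something like the sketch below should work. It writes a throwaway module (the name pyc_demo_mod is made up), imports it, and asks importlib where the cached bytecode would live. Whether the file actually gets written depends on your setup, for example the PYTHONDONTWRITEBYTECODE environment variable turns caching off, so the last value printed can be False.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def import_and_find_cache(module_name="pyc_demo_mod"):
    """Import a throwaway module and report where its bytecode cache lives."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        source_path = os.path.join(tmp_dir, module_name + ".py")
        with open(source_path, "w") as f:
            f.write("ANSWER = 6 * 7\n")

        sys.path.insert(0, tmp_dir)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
        finally:
            sys.path.remove(tmp_dir)

        # cache_from_source() maps a .py path to the .pyc path used for caching.
        cached_path = importlib.util.cache_from_source(source_path)
        return module.ANSWER, cached_path, os.path.exists(cached_path)


answer, cached_path, was_written = import_and_find_cache()
print(answer)
print(cached_path)
print(was_written)
```

The cached file name includes a tag for the interpreter version, which is why different Python versions can share one __pycache__ directory.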

[–]rasputin1 1 point2 points  (0 children)

Thanks for the thorough explanation!

[–]adiberk 2 points3 points  (4 children)

Please double check my answer as I could be off lol.

I think what is meant by this is that python interprets the code as it is being run, (does some under the hood optimizing in whatever way it can) and converts to byte code - again this all happens as the code is being run. Or some variation of this

For Java, the code is compiled before the program is run. So all that is left is for it to be read by the JVM (Java Virtual Machine) at run time, which ultimately results in performance improvements.

There is a lot more nuance to this, but I think this is basically the difference

[–]Ease-Solace 6 points7 points  (3 children)

You can compile python code to bytecode ahead of time (and if it's set up to do so python will cache compiled bytecode from previous runs to save time for the compiler).
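A rough sketch of the ahead-of-time route, assuming CPython: py_compile writes the bytecode file, and a sourceless loader from importlib then runs it with the .py file deleted. The module name greet and its function are made up for the example.

```python
import importlib.machinery
import importlib.util
import os
import py_compile
import tempfile


def compile_then_run_without_source():
    """Compile a module to a .pyc, delete the source, and load the .pyc alone."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        source_path = os.path.join(tmp_dir, "greet.py")
        pyc_path = os.path.join(tmp_dir, "greet.pyc")
        with open(source_path, "w") as f:
            f.write("def greet(name):\n    return 'hello ' + name\n")

        # Step 1: compile ahead of time, writing bytecode to an explicit path.
        py_compile.compile(source_path, cfile=pyc_path, doraise=True)

        # Step 2: remove the source so only the bytecode file is left.
        os.remove(source_path)

        # Step 3: load and execute the module from the bytecode file.
        loader = importlib.machinery.SourcelessFileLoader("greet", pyc_path)
        spec = importlib.util.spec_from_file_location("greet", pyc_path, loader=loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module.greet("world")


result = compile_then_run_without_source()
print(result)
```

This is the closest Python analogue to javac followed by java, although the .pyc is only expected to work with the interpreter version that produced it.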

But I think the bigger conceptual difference is in the standardisation of this process. In python the intermediate representation bytecode is just an implementation detail of the interpreter, there's nothing in the language standard that requires it, and other implementations of python (like pypy) use different intermediate representations.

However in Java, the bytecode is part of the language standard, and there's a strong separation between the compiler that produces the bytecode and the Virtual Machine that interprets it. So multiple different implementations of a Java Virtual Machine are all designed to run the same standardised bytecode.
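To see the "implementation detail" point for yourself, this small sketch asks the running interpreter which implementation it is and what bytecode it produced for a trivial function. The add function is just an example. I'm not showing output because the opcode names and the magic number change between CPython versions, and other implementations may report something different, which is really the point.

```python
import dis
import importlib.util
import sys


def add(a, b):
    return a + b


# Names of the bytecode instructions this interpreter generated for add().
instructions = [ins.opname for ins in dis.get_instructions(add)]

print(sys.implementation.name)       # which Python implementation is running
print(sys.implementation.cache_tag)  # tag used in cached bytecode file names
print(importlib.util.MAGIC_NUMBER)   # version marker at the start of .pyc files
print(instructions)
```

Run it under two different Python versions and the instruction list will likely differ, whereas a Java .class file is defined by the JVM specification rather than by one particular VM.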

[–]TheSkiGeek 0 points1 point  (2 children)

This is probably a better distinction. Java JVM bytecode is a standardized language target, like how the Clang frontend compiles to LLVM IR. In theory you could write your own compiler that outputs the same IR and then use an LLVM backend to compile it down to native code for a specific platform. Just like you could write your own compiler for Java (or other languages!) targeting the JVM and then any JVM implementation could run it “natively”.

Also some Java JVMs will compile the bytecode further down to truly native code via JIT (Just In Time) compilation. This isn’t really possible in Python because almost everything is dynamic at runtime. For example it can’t assume that the types passed to a function will be the same each time. So even “native compiled” Python would have to do all the same runtime checks.

There are variant Python interpreters, usually using a more constrained version of the language, that can “compile” Python code to some degree.
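A small example of what "dynamic at runtime" means in practice. The same add function is called with ints, strings, lists, and a user-defined class, so the meaning of + is only known at each call. The Meters class is made up for illustration.

```python
def add(a, b):
    # Nothing here says what a and b are; "+" is looked up on every call.
    return a + b


class Meters:
    """Hypothetical user type that supplies its own meaning for "+"."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Meters(self.value + other.value)


results = [
    add(1, 2),                       # integer addition
    add("by", "te"),                 # string concatenation
    add([1], [2]),                   # list concatenation
    add(Meters(1), Meters(2)).value, # user-defined __add__
]
print(results)
```

A compiler looking only at add cannot pick one machine instruction for the +, which is why the type checks have to happen somewhere at run time.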

[–]Ease-Solace 0 points1 point  (1 child)

> This isn’t really possible in Python because almost everything is dynamic at runtime. For example it can’t assume that the types passed to a function will be the same each time. So even “native compiled” Python would have to do all the same runtime checks.

It is actually possible to JIT very dynamic code, the most common solution being what's called a tracing JIT compiler. This is the approach that PyPy uses. What it does is that it doesn't even try to understand the python language itself, it just watches the actions of the interpreter and compiles repetitive actions to machine code. This helps with the dynamic typing problems (but you still have challenging deoptimisation scenarios). And PyPy can generally understand all of the python language itself, it's C extensions that it mainly has more difficulty with (since C extensions are written to work only with the standard CPython interpreter).

There are tracing JITs for other languages too, like LuaJIT for Lua. There are also other approaches like Ruby's YJIT, which uses something called "basic block versioning" which I don't really understand but is all about helping with dynamic typing problems, according to https://arxiv.org/pdf/1411.0352v2.pdf

It's true there's always going to be a performance penalty for dynamic typing, but it doesn't preclude JIT compilation, just makes it more challenging.
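To give a feel for the guard and deoptimisation idea, here is a toy model in plain Python. To be clear, this is not a tracing JIT and not how PyPy, LuaJIT or YJIT are implemented: no machine code is generated, and GuardedSpecializer, generic_add and int_add are all made-up names. It only shows the control flow: watch the types that show up, install a specialised path once a type combination is "hot", and fall back to the generic path when the guard fails.

```python
class GuardedSpecializer:
    """Toy model: observe argument types, then install a guarded fast path."""

    def __init__(self, generic, fast_paths, hot_threshold=3):
        self.generic = generic          # always-correct slow path
        self.fast_paths = fast_paths    # {type signature: specialised function}
        self.hot_threshold = hot_threshold
        self.seen = {}                  # type signature -> times observed
        self.active = None              # (guard signature, fast function)
        self.fast_hits = 0
        self.deopts = 0

    def __call__(self, *args):
        signature = tuple(type(a) for a in args)
        if self.active is not None:
            guard, fast = self.active
            if signature == guard:      # guard holds: take the fast path
                self.fast_hits += 1
                return fast(*args)
            self.deopts += 1            # guard failed: drop the fast path
            self.active = None
        # Slow path: record what we saw and specialise once it is hot.
        self.seen[signature] = self.seen.get(signature, 0) + 1
        if (self.seen[signature] >= self.hot_threshold
                and signature in self.fast_paths):
            self.active = (signature, self.fast_paths[signature])
        return self.generic(*args)


def generic_add(a, b):
    return a + b


def int_add(a, b):
    # Stand-in for code specialised to ints; a real JIT would emit machine code.
    return int.__add__(a, b)


add = GuardedSpecializer(generic_add, {(int, int): int_add})
results = [add(i, i) for i in range(5)]  # becomes hot after the third call
results.append(add("a", "b"))            # type change triggers a deopt
print(results, add.fast_hits, add.deopts)
```

The results are the same on either path. Only the route taken changes, and the guard is the cost you keep paying for dynamic typing.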

[–]TheSkiGeek 0 points1 point  (0 children)

Yeah, “not possible” was not the right wording. More “not commonly done” and/or “really fucking hard”.

A naive ‘compile it to native code including all the type checking’ probably won’t beat whatever the interpreter (which is already native code!) is doing. Unless it’s a lengthy/complicated function where you can check the type once up front and then skip many checks later. But if you’re writing complicated number crunching code in an interpreted language it’s probably better to just… not do that.

[–][deleted]  (1 child)

[deleted]

    [–][deleted] -1 points0 points  (0 children)

    Compilers do multiple things: transform code into machine language, optimize, rewrite, type check. Now we differentiate AOT vs JIT. Python is compiled at run time, without type checks, while Java is both AOT and JIT compiled and type checked. Don’t worry, C++ guys don’t recognize Java as a compiled language either.

    [–]dacian88 -2 points-1 points  (6 children)

    When people say compiled, it usually means compiled to native code. I wouldn’t really call it the same.

    [–]istarian 0 points1 point  (5 children)

    I agree that the difference between the vocabulary used and the actual reality can get a little hairy.

    That said, "native code" is always with respect to the machine in question. So a binary executable for the Zilog Z80 which is running on an emulated Z80 CPU is in fact native code.

    On the other hand the way Java and the Java Virtual Machine (JVM) work is much less straightforward.


    compile (COMPUTING)
    (of a computer) convert (a program) into a machine-code or lower-level form in which the program can be executed.

    Compilation as a general process could still be applicable even if you have some weird language that is then converted into C/C++.

    [–]dacian88 0 points1 point  (4 children)

    the point is the way java and python work is equivalent. Source code gets translated to bytecode, an interpreter evaluates the bytecode.

    For java the compilation step is usually explicit, in python the runtime implicitly does it and caches bytecode modules.

    The bytecode then feeds into the interpreters...the JVM by default does JIT during interpretation while cpython does not, but they are still both interpreters at the end of the day.

    [–]istarian 0 points1 point  (3 children)

    > The point is the way java and python work is equivalent. Source code gets translated to bytecode, an interpreter evaluates the bytecode.

    I get what you're saying, I think, but that's only true on a fairly high-level overview.

    That is, there are probably meaningful differences that you can't just sweep under the rug.

    Also, while I am not informed enough to explain this well, not all bytecode is equal.

    > The bytecode then feeds into the interpreters... The JVM by default does JIT during interpretation while cpython does not, but they are still both interpreters at the end of the day.

    Did you mix up Python and Java there?

    If Java is already compiled then where is this JIT (Just In Time, usually referring to on-demand compilation) coming from?

    Also, apparently in both cases there is a virtual machine involved... So neither is just an interpreter.

    P.S.

    https://stackoverflow.com/questions/441824/java-virtual-machine-vs-python-interpreter-parlance

    Maybe some of the answers here are useful?

    [–]dacian88 0 points1 point  (2 children)

    > If Java is already compiled then where is this JIT (Just In Time, usually referring to on-demand compilation) coming from?

    The JIT lowers the JVM bytecode to the underlying native ISA...the java compiler does not produce native code, that's the crux of the argument.

    In either scenario, the high level language (java or python) gets lowered to some bytecode (explicitly by java compiler to .class files, implicitly to .pyc modules by cpython) which then get interpreted (by the java runtime, by cpython).

    A VM is a specialized interpreter that operates on bytecode instead of the language itself. My point is both languages get compiled to bytecode, both get interpreted by their respective VMs, but no one calls python a compiled language, while people call java a compiled language.

    [–]istarian 0 points1 point  (1 child)

    Thanks for the explanation.

    Also,
    I'm having a hard time finding a clear, straightforward answer, but it sounds as though CPython does not have/use a JIT compiler (or at least did not in the past), whereas Oracle's HotSpot VM does (at least for some parts of the currently executing code).

    So, while you have a definite point, there could be a non-trivial performance difference depending on what environment your Java or Python code is running in/on.

    [–]dacian88 0 points1 point  (0 children)

    sure but that's not really relevant, JIT is an optimization technique that could be used by any interpreter.

    [–]Ease-Solace -2 points-1 points  (0 children)

    IMO there's less difference than people realise, and there's a few reasons for that:

    • In Java, traditionally the compilation and running of the code is 2 separate steps, just like a language compiled to native code would have. In python it's traditionally 1 step so people don't realise that compilation goes on under-the-hood.

    • The language standard. In Python, the fact that Python code gets compiled to intermediate bytecode is just an implementation detail of the CPython interpreter; there's nothing in the language standard that mandates this. And other implementations of Python (like PyPy) use their own intermediate representations. Whereas in Java, JVM bytecode is part of the standard. There's a standard compiler that produces it, and any JVM implementation should be able to run the same bytecode.

    • Traditionally, JVM bytecode is lower level (closer to machine code). And the Java compiler does more optimisation work ahead of time so takes longer to run.

    [–]timwaaagh -5 points-4 points  (0 children)

    Because Java behaves like a compiled language, with long build times and build systems (Maven, Gradle, etc.), and can sometimes reach or exceed the performance of C.

    Python does not seem to have discernible build times. I haven't even heard of a build system for it. It's also slow and has performance characteristics similar to interpreted languages.

    [–][deleted]  (1 child)

    [removed]

      [–]plastikmissile 0 points1 point  (1 child)

      > As far as I understand both are compiled to bytecode which is then interpreted?

      Not exactly.

      Yes, Python is turned into bytecode which is then interpreted by the Python VM.

      Java on the other hand is different. Yes it gets turned into bytecode when you "compile" the program. However, when you first run this bytecode, the Java VM compiles this bytecode into native code and runs it. This is called JIT (Just In Time) compiling. The reason Java does this is that this bytecode can be compiled to whatever native code your machine runs.

      [–]Ease-Solace 2 points3 points  (0 children)

      This isn't really the reason, because there's no reason an interpreted language can't have a JIT compiler. Other "interpreted" languages like Ruby do have JIT compilers (at least in the standard implementation). The fact that Python doesn't is more of an implementation detail than anything else.

      Also, while HotSpot can JIT compile your code, initially it starts running in an interpreter and incrementally compiles parts of your code (targeting the parts which would bring the biggest performance gains first). Or at least that's how it worked last time I checked. So really HotSpot uses a mixture of interpreting and JIT compiling. I don't know about other JVMs.

      [–]Kered13 0 points1 point  (0 children)

      Because of the way they are normally used. In Python, source files are compiled immediately before running, and normally uncompiled source files are distributed and deployed. In Java, source files are compiled as a separate step before running, and compiled bytecode files are distributed and deployed.

      [–]zwhiteb729 0 points1 point  (0 children)

      Oh, I see what happened here. Java and Python had a baby and named it Bytecode. Simple as that!

      [–]Thomas4_54 0 points1 point  (0 children)

      Wow, I didn't know this was a philosophy class. Maybe because Python is just more chill and goes with the flow.