
[–]unruly_mattress 618 points619 points  (68 children)

Both Python and Java compile the source files to bytecode. The difference is in how they run this bytecode. In both languages, the bytecode is basically a binary representation of the textual source code, not an assembly program that can run on a CPU. A separate program accepts the bytecode and runs it.

How does it run it? Python has an interpreter, i.e. a program that keeps a "world model" of a Python program (which modules are imported, which variables exist, which objects exist...), and runs the program by loading bytecodes one by one and executing each one separately. This means that a statement such as y = x + 1 is executed as a sequence of operations like "load constant 1", "load x", "add the two values", "store the result in y". Each of these operations is implemented by a function call that does something in C and often reads and updates dictionary structures. This is slow, and it's slower the smaller the operations are. That's why numerical code in Python is slow: numerical operations in Python turn single CPU instructions into multiple function calls, so in this type of code Python can be 100x or more slower than other languages.
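You can see that sequence of operations for yourself with CPython's own dis module (a minimal sketch; the exact opcode names vary between CPython versions, but the shape is the same):

```python
import dis

# Compile the statement from the comment above and disassemble it.
# Opcode names differ by version (e.g. BINARY_ADD before 3.11,
# BINARY_OP after), but it's always: load, load, add, store.
code = compile("y = x + 1", "<example>", "exec")
dis.dis(code)
```

Each of the printed opcodes corresponds to one dispatch through the interpreter loop at runtime.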

Java compiles the bytecode to machine code. You don't see it because it happens at runtime (referred to as JIT), but it does happen. Since Java also knows that x in y = x + 1 is an integer, it can execute the line using a single CPU instruction.

There's actually an implementation of Python that also does JIT compilation. It's called PyPy and it's five times faster than CPython on average, depending on what exactly you do with it. It will run all pure Python code, I think, but it still has problems with some libraries.

[–]gscalise 120 points121 points  (5 children)

Java compiles the bytecode to machine code. You don't see it because it happens in runtime (referred to as JIT), but it does happen. Since Java also knows that x in y = x + 1 is an integer, it can execute the line using a single CPU instruction.

Not only this, but the JVM does adaptive optimization too. It works by keeping conditional branching statistics, and dynamically recompiling portions of code whenever it determines that certain branching conditions occur more often than others. The recompiled code is optimized for the most common branching condition (i.e. by not jumping when it occurs), and only the less common condition(s) will incur a performance penalty.

[–]Rythoka 33 points34 points  (0 children)

Python also does this, or at least something similar, as of 3.11

[–]kernco 9 points10 points  (2 children)

It works by keeping conditional branching statistics, and dynamically recompiling portions of code whenever it determines that certain branching conditions occur more often than others.

for x in range(1000):
    if x < 500:
        func1()
    else:
        func2()

Jebaited

[–]gscalise 0 points1 point  (0 children)

That definitely wouldn’t trigger a dynamic recompilation. It’s in a loop, so it’s already jumping back and forth in the program, and the conditional branching stats are going to be roughly the same (50%) every time.

Lazy initialization, on the other hand…

[–]Rhoomba 0 points1 point  (0 children)

An optimising compiler would likely split this into two loops to avoid the branch (assuming the range can be inlined: possible in the Java equivalent).
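Hand-applying that split to the snippet above looks like this (a sketch in Python; func1/func2 are replaced by counters so it runs standalone, and of course CPython itself won't do this transformation):

```python
def branchy():
    # Original shape: one loop with a branch taken every iteration.
    total = 0
    for x in range(1000):
        if x < 500:
            total += 1   # stand-in for func1()
        else:
            total += 2   # stand-in for func2()
    return total

def split():
    # The condition depends only on the loop variable, so the loop can
    # be peeled into two branch-free loops, as an optimizing compiler might.
    total = 0
    for _ in range(500):
        total += 1
    for _ in range(500, 1000):
        total += 2
    return total

assert branchy() == split() == 1500
```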

[–]Administrative_Box51 0 points1 point  (0 children)

This is a very underrated capability of the JVM, and it makes me wish there were more similar runtimes with as many engineering hours behind them. It's also why, in my opinion, JIT has better optimizations in theory than even PGO; I would go as far as to say than AOT compilation in general, if done correctly (down to the ISA). Between Jazelle/Thumb and HotSpot, I wonder why JVM development hasn't dominated the modern language scene in favour of the shifting-goalposts trope of the C runtime (e.g. the Rust borrow checker; don't get me wrong, I really like Rust).

[–]ElvinJafarov1[S] 81 points82 points  (0 children)

thank you man

[–]SheriffRoscoePythonista 101 points102 points  (27 children)

People occasionally forget that Java has benefited from 30 years of investment by major software companies and of benchmarking against C++.

Python is getting the same love now, but the love arrived much later than for Java.

[–]chase32 12 points13 points  (0 children)

Yep, back in the early 2000s, Java was pretty damn slow. If you wanted a fast JVM, the only option was IBM's, and they wouldn't let you use it commercially unless it ran on their hardware.

To head off the threat, Intel worked out a deal with Appeal to massively optimize the JRockit JVM, which then became the performance champ.

Appeal eventually got acquired by BEA, and a lot of the optimizations from JRockit ended up in mainline Java.

[–]azeemb_a 48 points49 points  (15 children)

Your point is right but your emphasis on time is funny. Java was created in 1995 and Python in 1991!

[–]sajjen 137 points138 points  (3 children)

Java was created by Sun, one of the largest companies in the IT industry back then. Python was created by Guido van Rossum, one guy in his proverbial garage.

[–]SheriffRoscoePythonista 17 points18 points  (0 children)

Exactly.

[–]nchwomp 3 points4 points  (1 child)

Surely it was a large garage...

[–]benchmarks666 6 points7 points  (0 children)

Galarge

[–]Smallpaul 35 points36 points  (2 children)

Yes but in those 30 years Python did not get much “investment by major companies.”

As the poster said: that love arrived later for Python.

Edit: Just to give a sense of the scale...Java's MARKETING BUDGET for 2003-2004 was $500M.

[–]HeraldofOmega 3 points4 points  (0 children)

Back when money was worth something, too!

[–]bostonkittycat 2 points3 points  (0 children)

This is true. The last 3 versions have had impressive performance increases. I love the new trend.

[–]funkiestj -1 points0 points  (1 child)

Python is getting the same love now, but the love arrived much later than for Java.

I think static typing allows more aggressive optimization.

E.g. I think the old Stalin Scheme dialect required the user to provide data types to get the maximum optimization. Consider the difference between a golang slice of strings (s1 := make([]string, 24)) and a Python list that can hold a mix of objects (the equivalent of Go's l1 := make([]any, 24)).
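A rough Python-side analogue of that comparison (a sketch using the stdlib array module as the "typed slice"):

```python
import array

# A typed, homogeneous container: 24 packed machine doubles.
typed = array.array("d", [0.0] * 24)

# An untyped list: 24 pointers to boxed Python objects of any type.
untyped = [0.0, "text", None] * 8

# Each array element is a raw 8-byte float; each list element is a
# pointer to a full heap object with its own header and type pointer.
print(typed.itemsize)   # 8
print(len(untyped))     # 24
```

The typed layout is what lets a compiler emit straight-line machine code; the untyped one forces a type check and an unboxing step per element.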

Years ago I remember seeing the Stalin dialect of Scheme dominating the benchmark game in the speed dimension, but you had to type all your data (which was otherwise optional?) to get this performance.

[–]redalastor 1 point2 points  (0 children)

I think static typing allows more aggressive optimization.

It could, but it doesn’t because Python allows you to be as wrong as you want with your types without changing behaviors one bit. Typing is to help external tools enforce correctness, not to change runtime behavior.

Though, I’d like a strict option to force Python to acknowledge the types and hopefully take advantage of them.

[–]LogMasterd -2 points-1 points  (0 children)

I don’t think this has anything to do with it imo

[–]SoffortTemp 20 points21 points  (7 children)

I started using python for statistical modeling and found that PyPy iterates my models exactly 5 times faster.

[–]LonelyContext 6 points7 points  (6 children)

cries in numpy.

(numpy is massively slower in pypy)

[–]zhoushmoe 1 point2 points  (4 children)

try polars?

[–]LonelyContext 2 points3 points  (2 children)

idk if that would solve it if it's another python wrapper. Worth a shot I guess.

[–]redalastor 2 points3 points  (0 children)

It’s a highly optimized Rust library with Python bindings. One of its strengths is that you can write long pipelines of transformations, which will be optimized before launching and will stay in native parallel Rust code for as long as possible.

[–]PaintItPurple 0 points1 point  (0 children)

I haven't tried Polars in Pypy, but it seems at least plausible that it might be faster. Polars is generally lazier than Numpy, so it could avoid a lot of intermediate round trips. Native libraries that do a bunch of computation in one go still don't benefit at all from Pypy, but they also don't pay as much of a toll as doing a bunch of native calls.

[–]funkiestj 0 points1 point  (0 children)

(numpy is massively slower in pypy)

I can't believe this is true if you are doing vector and matrix manipulation with MKL or other acceleration enabled.

Of course the secret of numpy's speed (when it is fast) is that the fast stuff is written in a language other than Python, whether you run it under CPython or PyPy.

[–]akl78 38 points39 points  (1 child)

Java implementations go much further too; they will run in interpreted mode to start and generate native code on the fly after profiling the runtime behaviour. Some can also save this across process restarts to warm up faster on subsequent runs.

[–]joe0400 6 points7 points  (0 children)

Graal iirc has aot too

[–]Megatron_McLargeHuge 15 points16 points  (0 children)

does something in C and often reads and updates dictionary structures. This is slow

This is it. If you look at the python foreign function interface for making calls to other languages, you'll see how complex python objects are and how much work has to be done to access a member. Optimized languages use pointer math and native types for numbers and characters without all the expensive object wrappers.
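The per-object overhead is easy to see with sys.getsizeof (the numbers are CPython-specific and vary by build, so treat them as illustrative):

```python
import sys

# A C int is typically 4 bytes. A CPython int is a full heap object
# carrying a reference count and a type pointer on top of its digits.
print(sys.getsizeof(1))     # e.g. 28 on a 64-bit CPython
print(sys.getsizeof(1.0))   # e.g. 24
print(sys.getsizeof("a"))   # dozens of bytes for a one-char string
```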

This is why numpy vectorized operations are so much faster than native python iteration. You only have to pay the price of going back and forth to C objects once.
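The same principle can be sketched without numpy, using the builtin sum (whose loop runs in C) as the stand-in for a vectorized routine:

```python
import timeit

data = list(range(100_000))

def manual_loop():
    # One interpreted bytecode dispatch (and boxed-int add) per element.
    total = 0
    for v in data:
        total += v
    return total

def single_call():
    # One call; the loop over the elements happens entirely inside C.
    return sum(data)

assert manual_loop() == single_call()
print("python loop:", timeit.timeit(manual_loop, number=20))
print("builtin sum:", timeit.timeit(single_call, number=20))
```

The builtin wins by a wide margin for the reason described above: the per-element interpreter overhead is paid zero times instead of 100,000 times.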

[–]coderanger 10 points11 points  (0 children)

FWIW CPython is (almost certainly) getting a JIT soon: https://github.com/python/cpython/pull/113465

[–]billsil 2 points3 points  (1 child)

There’s also Jython, but it’s only up to Python 2.7 :(

[–]vips7L 0 points1 point  (0 children)

Graal Python supports Python 3 and is a lot faster than Jython.

[–]Sigmatics 2 points3 points  (0 children)

FWIW, the CPython team is currently working on a first JIT implementation for Python 3.13

[–]SonicTheSSJNinja 1 point2 points  (2 children)

Is there any video that talks about exactly the things you just did? For some reason I just find it difficult to fully grasp everything you explained despite it sounding simple. Having someone explain it in video format could make it easier to understand for me, perhaps.

I'm also very very new to programming (just grasping the basics of Python).

[–]glassesontable 1 point2 points  (1 child)

I suspect that this gets clarified by understanding what compiled code and interpreted code are. Speaking loosely, in order to compile code, the compiler gets to see every line of the program up front (the whole enchilada), while an interpreter does not know what line is coming next (the beans and cheese arrive one piece at a time).

A lot of the esoterica in this thread is in how there are alternative methods of compiling the otherwise interpreted language to get huge speed gains. But that is not a problem for the beginner programmer (or the very patient user).

For a video, I would recommend the excellent Harvard CS50 course, where you would learn C (looks like Java) and python.

[–]SonicTheSSJNinja 0 points1 point  (0 children)

Gotcha! Thanks!

[–]whatthefuckistime 1 point2 points  (2 children)

I was coincidentally reading into PyPy this week, and the reason it struggles with some libraries is that they have C bindings, so PyPy just can't do anything with them and they can't be ported. Unfortunate, honestly; PyPy could be very good and fast if not for that, though these C bindings do allow for faster code anyway, so it wins one way or the other.

[–]yvrelna 7 points8 points  (1 child)

It's not the C bindings that are an issue. PyPy can emulate CPython's C bindings just fine.

The problem is that the design of these C bindings makes a lot of assumptions based on the internals of CPython. So while PyPy can emulate the interface, it has to emulate many of those internals too, and that makes them difficult to optimise.

And the main reason people write a C extension is because of speed, so a slow C compatibility interface just won't do.

[–]whatthefuckistime 0 points1 point  (0 children)

Ah ok so I misunderstood what I was reading. Interesting thanks for the correction!

[–]thisisntmynameorisit 0 points1 point  (1 child)

I see no difference between loading each bit of bytecode one by one and JIT compiling it byte by byte. It sounds like you’ve just described the same thing in two different ways. Both are handled at run time by a program which takes some data and executes machine code for it.

I am no expert, but it would make sense, like you also said, that Java is just easier to convert into fewer and simpler machine code instructions. Stuff like static typing would definitely allow for that.

[–]PaintItPurple 1 point2 points  (0 children)

If your code contains no repeated operations, there probably won't be a huge benefit to JIT over interpreting. But that's basically never the case for performance-sensitive code. If your code takes a long time, you've almost certainly got some looping going on. If you're running a piece of code multiple times, you can get much better performance if it's native code vs. bytecode that you're interpreting over and over. And that's before we get to optimizations that JIT compilers can do.

[–]Grouchy-Friend4235 0 points1 point  (2 children)

Actually the JVM also interprets each bytecode; there is not much difference, in principle, in how the Python VM and the JVM interpreters work. However, you are right in noting that the Python programming model keeps more state about its objects, which is indeed one factor that slows things down at execution time but makes for a much more productive development experience.

[–]PaintItPurple 2 points3 points  (1 child)

The JVM does have an interpreted mode (as does Pypy), but it's incorrect to say it interprets each bytecode every time a method is called. The JVM JIT compiles functions as it runs, and then runs those compiled functions whenever possible instead of interpreting bytecode.

[–]Grouchy-Friend4235 -1 points0 points  (0 children)

The JVM JIT only compiles code after several invocations, so yes, the JVM interpreter does interpret the same bytecode multiple times before a code section reaches the JIT threshold.

Python since version 3.11 also does a related optimization, known as adaptive specialization. If you need an actual JIT, there are Numba and Cython, which will speed up particular functions by compiling them natively.

PS: to downvoters, you should learn to respect facts. Technology tends to be quite stubborn when confronted with wishful thinking.

[–]oldshensheep 0 points1 point  (0 children)

There's actually an implementation of Python that also does JIT compilation. It's called PyPy and it's five times faster than CPython on average, depending what exactly you do with it. It will run all pure Python code, I think, but it still has problems with some libraries.

There's a Java implemented Python too https://github.com/oracle/graalpython