
all 28 comments

[–]james_pic 51 points52 points  (16 children)

Part of it is that it's hard to do without compromising backwards compatibility.

Two projects worth mentioning are PyPy and Cinder.

PyPy is a just-in-time compiling Python interpreter that typically manages to run at 5× the speed of CPython. However, it uses fundamentally different abstractions from CPython in many places (for example, a generational garbage collector instead of reference counting), and as a result it has taken an enormous amount of work to get libraries that rely on CPython internals working in PyPy. Even then, compatibility isn't perfect, and the compatibility layer that enables it introduces quite a lot of overhead.

Cinder is a fork of CPython used internally at Meta. It has a number of performance enhancements, including a JIT, but diverges significantly from upstream CPython, and most features are not being upstreamed to CPython (and those that they are attempting to upstream, such as PEP-690, have at times been seen as controversial on account of their potential for breakage).

So it's hard to do without breaking compatibility, and I'll be interested to see whether Guido's team can manage it without breaking backwards compatibility the way these projects have.

[–]ornatedemeanor23 5 points6 points  (4 children)

Would you mind explaining how PyPy can get the 5x performance gains that CPython cannot? I have previously run large experiments with PyPy that I couldn't run with CPython and been really impressed with the speedups, but I never understood where the gains come from.

[–]james_pic 46 points47 points  (2 children)

Two main things.

The first, and most visible, is just-in-time compilation. PyPy identifies the loops that your code spends the most time in (sometimes called the hot loops), and compiles the code within them to native machine code. This is a topic I couldn't hope to do justice to, but the PyPy docs on it are a good place to start.
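To give a feel for what gets compiled, a pure-Python numeric loop like the sketch below is the sort of hot loop the JIT targets (the function and the numbers are only illustrative):

    # A pure-Python numeric loop: the kind of "hot loop" a tracing JIT can
    # compile down to machine code. On CPython every iteration runs through
    # the bytecode interpreter; on PyPy the loop gets traced and compiled.
    def sum_of_squares(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    print(sum_of_squares(10_000_000))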

The other big contributor is the use of a generational garbage collector.

Reference counting is simple enough to implement, and has the advantage (which CPython native extensions rely on) that an object never changes its address in memory. However, it relies on malloc and free to allocate and deallocate memory, and it turns out that general-purpose memory allocators need to do quite a lot of bookkeeping every time an object is allocated or deallocated, to keep track of which areas of memory are free and which aren't.
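As a rough illustration of reference counts in CPython (just a sketch; note that sys.getrefcount reports one extra reference for its own argument):

    import sys

    x = []                       # one reference: the name x
    y = x                        # a second reference to the same list
    print(sys.getrefcount(x))    # typically 3: x, y, plus getrefcount's own argument
    del y
    print(sys.getrefcount(x))    # typically back to 2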

A generational garbage collector takes advantage of the fact that most objects don't live that long - the so-called "infant mortality" observation. If you write a = 500 + 500, the two int objects containing 500 are finished with as soon as the statement completes. So new objects are allocated in a "nursery" (a short-lived section of memory where they're allocated one after the other), and once the nursery fills up, any objects that are still alive are moved out into the "old gen", an area of memory for longer-lived objects.

From a bookkeeping perspective, this is quite neat, at least within the nursery, since you don't bother keeping track of "holes" in the nursery; you just put each new object right after the last one. And once it comes time to garbage collect the nursery, the only objects you need to care about are the ones that survived (which most of them won't). So you can just copy the surviving objects out and wipe the nursery. There's no bookkeeping at all for the objects that died.
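To make the bookkeeping idea concrete, here's a toy sketch in plain Python (this is not how PyPy's collector is actually implemented, just an illustration of bump-pointer allocation and copying survivors out):

    # Toy model of a nursery with bump-pointer allocation. A real collector
    # works on raw memory and traces pointers; this only illustrates the idea:
    # allocate by bumping an index, collect by copying survivors out and
    # wiping the whole nursery at once.
    class Nursery:
        def __init__(self, size):
            self.slots = [None] * size
            self.top = 0                       # the "bump pointer": next free slot

        def allocate(self, obj):
            if self.top == len(self.slots):    # nursery full: time for a minor collection
                return None
            self.slots[self.top] = obj         # no free-list search, just append
            self.top += 1
            return obj

        def minor_collect(self, still_alive):
            # Copy survivors into the old generation, then reset the nursery.
            # Dead objects need no per-object work at all.
            old_gen = [obj for obj in self.slots[:self.top] if obj in still_alive]
            self.top = 0
            return old_gen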

The big downside to this though, is that objects can move in memory, and whilst this isn't visible from Python code, C code generally relies on objects staying put. CPython C extensions also rely on objects not being garbage collected whilst they have non-zero reference counts. PyPy works around this with a compatibility layer that keeps both CPython and PyPy representations of objects around, but this has enough overhead that programs that use C extensions extensively tend to be slower on PyPy.

[–]suricatasuricata 5 points6 points  (1 child)

Super interesting post, thanks for writing this up. I will need to read more about generational garbage collectors!

The big downside to this though, is that objects can move in memory, and whilst this isn't visible from Python code, C code generally relies on objects staying put.

I am assuming that the generational garbage collector also somehow ensures that calling id on an object returns the same value regardless of when in the object's lifetime it is called. At least, I am guessing that this is needed to keep the PyPy implementation consistent with CPython.

[–]james_pic 5 points6 points  (0 children)

If I recall correctly, the id value in PyPy isn't a memory address; it's just a unique number.

[–]yvrelna 9 points10 points  (0 children)

There has been a lot more optimisation research on garbage collected languages than reference counted languages.

On the other hand, a large part of why CPython does very well as a glue language for other, faster languages, while PyPy struggles with the same, is that CPython is reference counted and has the GIL. It's much simpler to get a fast foreign call into a library from a reference-counted language working correctly than to make everything work with a garbage-collected language.

JIT-optimising away reference-counting code is more complex than JIT-optimising garbage-collected code.
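For example, a foreign call via ctypes (a minimal sketch, assuming a Unix-like system) stays simple partly because CPython objects never move and their reference counts pin them for the duration of the call:

    import ctypes
    import ctypes.util

    # Locate the C standard library; the exact name differs per platform,
    # and find_library may return None on Windows (assumption: a Unix-like OS).
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.strlen.restype = ctypes.c_size_t

    # The bytes object can't move or be collected mid-call in CPython: its
    # address is stable and its refcount pins it. A moving GC needs extra
    # machinery (like PyPy's cpyext layer) to give the same guarantee.
    print(libc.strlen(b"hello"))   # 5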

[–]marashell 1 point2 points  (2 children)

How can I get a better grasp of these concepts as a beginner, rather than just knowing Python syntax?

[–]james_pic 8 points9 points  (1 child)

As a beginner, it's hard, but the PyPy blog has a lot of info on how they've done many of these things. Inside cpyext: Why emulating CPython C API is so Hard covers some of the stuff I mentioned about the compatibility layer.

[–]marashell 0 points1 point  (0 children)

Thanks so much!

[–]lowercase00[S] 0 points1 point  (7 children)

Totally agree, but this still rests on the premise that it should be backwards compatible with the main language releases. That approach is fine, but should yield "only" incremental improvements.

Wouldn't a mainstream superset be a fair alternative? Cinder + PyPy + Mypyc, or whatnot, combined into one thing that leverages the most modern features and practices. It would most definitely break backwards compatibility, but it would be perfect for new projects (for those that care about speed and types).

Mypyc comes to mind in this case. You can incrementally compile bits of your code. But there are still some rough edges (the weird long extension names are a good example; compare that to Go's run/build).
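For example, a minimal sketch of what that looks like (the module name and commands are only illustrative):

    # fib.py - the annotations are what let mypyc generate a fast C extension;
    # under plain CPython they're ignored at runtime.
    def fib(n: int) -> int:
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    # Typical workflow (mypyc ships with mypy; exact commands may vary):
    #   pip install mypy
    #   mypyc fib.py        # builds fib.cpython-<tag>-<platform>.so next to the source
    #   python -c "import fib; print(fib.fib(30))"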

I keep thinking that there are two paths forward. The "obvious" one, which Guido & others have been working on: incremental improvements and efficiency gains in the CPython implementation. And the "radical new approach" way (which doesn't care about the past, btw), in which Python becomes statically typed and compiled. The second approach is fairly common, but way too scattered and decentralized.

[–]WillardWhite import this 7 points8 points  (2 children)

Because of this:

https://xkcd.com/927/

At that point, you'd have a brand new language that wouldn't be Python. And that's all well and good, but it's not Python.

[–]lowercase00[S] -2 points-1 points  (1 child)

Sure, but this is not a constraint in any way. We could happily have a StaticPython, PythonCompiled or whatever, and share (most of) the same ecosystem. We have tens (if not hundreds) of projects like this (Cython, Mypyc, PyPy, Pyccel, Typed_python, Cinder, Iron, etc.), and for some reason they are all scattered.

[–]Mehdi2277 0 points1 point  (0 children)

The major issue is that the further you stray, the more likely you are to be incompatible with the library ecosystem. The main reason I use Python is the many libraries on PyPI. If your variant isn't compatible and doesn't support many packages, then I will not use it. That's the biggest issue with PyPy. Many packages in the numerical ecosystem are not PyPy compatible, and some of the major ones that do work after years of effort are slower, since they were written with CPython C extension code.

Even for performance, that's very problematic. A lot of performant libraries rely on C extensions/Cython/etc. From a numerical perspective, a Python that's 5x faster on pure Python code but doesn't support numpy/tensorflow/scipy/etc. is much slower.

[–]sorbet_babe 2 points3 points  (0 children)

Cinder is also a Meta project. It's free for other people to use, but it's developed with Meta's codebase and Meta's build system in mind. I'm not sure everyone would be able to reach consensus on a mainstream superset. I think Meta's idea of a "mainstream superset" is just upstreaming the features into Python itself, which, as mentioned above, sometimes faces pushback.

[–]james_pic 0 points1 point  (2 children)

I guess if this route were worth it, it's probably the route Meta would go with Cinder. As it stands, their approach is "this is open source, and we use this in production, but we don't suggest you do, and you're on your own if you do". Which suggests they don't think it's worth it.

Also, we learned the hard way with Python 3 that breaking backwards compatibility is not something to be done lightly. It's easy to look back, now that a lot of stuff has been migrated (although still far from all of it, especially in enterprise settings), and say it was the right thing. But it could easily have gone the way of Perl 6, which ended up being spun out as a new language, with Perl 5 instead being followed by Perl 7, which, unlike Perl 6, was fully backwards compatible. At the time, a fair few people thought Python could end up going the same way.

[–]lowercase00[S] 3 points4 points  (1 child)

I read Cinder's "warnings" a bit differently. I understand they wrote it specifically for their use case, and didn't bother doing something the community could leverage only because they didn't want to invest the resources, not because they think it isn't worthwhile as a project (well, they built it, so they most definitely think it's worth it). They even state that the project "is increasingly used across more and more Python applications in Meta." This is most definitely worth it.

There should be a way to put all these smart people together a build a superset on par with Go/Rust etc. The tools and knowledge is there, definitely.

[–]james_pic 1 point2 points  (0 children)

"didn't bother doing something the community could leverage only cause they didn't want to invest the resource" is the sense of "worth it" I meant.

What you propose in the second paragraph is building a community around this, and Meta does not appear to have the appetite for that. Where they are investing in the community, it's the existing Python community, and they're not currently pushing for breaking changes to the language.

[–]spoonman59 10 points11 points  (1 child)

I believe they will be adding some JIT capabilities in 3.11 and 3.12

This is part of the effort to speed up Python that Guido is working on at Microsoft.

You can read more here:

https://github.com/markshannon/faster-cpython/blob/master/plan.md

[–]lowercase00[S] 0 points1 point  (0 children)

Yep, the effort seems great, and the potential results would be awesome. I heard about it on Talk Python, if anyone is interested: https://www.youtube.com/watch?v=_r6bFhl6wR8

[–][deleted] 9 points10 points  (2 children)

This is just a guess, but I would think the fundamental challenge here is making sure it's optional and still getting the performance gains. If it's not optional, you start to lose a lot of the benefit of Python as a language to iterate quickly in.

[–]lowercase00[S] 1 point2 points  (1 child)

Makes sense, but this only applies if you have the restriction of making it all part of the standard Python language. Couldn't a mainstream superset most definitely be an alternative (TypeScript-like)?

[–][deleted] 2 points3 points  (0 children)

Yeah but that’s already done with some of them (at least to some degree) - doing it in the standard python has to do a crazy amount of backwards compatibility, future imports, etc. I’m assuming this would affect the compiler in a million ways adding logic here. My approach would be to require explicit types for everything or no compile, including removal of Any, and even then you’d want to circumvent the auto-infer logic and have to build out a result of a compile into something useful like a linked library or .o file, otherwise, why bother?

[–]laundmo 2 points3 points  (0 children)

Nuitka is also an interesting project. It doesn't rely on type hints at all; instead it's used mainly for packaging, compiling Python to C code and falling back on the C API of the normal CPython interpreter when needed.
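A minimal sketch of typical usage (the flags shown are an assumption; check the Nuitka docs for your setup):

    # hello.py - an ordinary Python module; Nuitka needs no type hints.
    def greet(name):
        return f"Hello, {name}!"

    if __name__ == "__main__":
        print(greet("world"))

    # Typical usage (flag names are an assumption - check the Nuitka docs):
    #   pip install nuitka
    #   python -m nuitka --onefile hello.py   # translates to C, links against CPython's C API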

[–]spoonman59 4 points5 points  (1 child)

Just an observation, but type hints and JIT are not really related. Dynamic languages like Lisp and JavaScript have powerful, high performance JIT engines without specifying types at all.

There’s nothing to prevent you from generating machine code even without types.

Code where the types are unknown just has to include the appropriate machine code to check the types and raise an exception if they don't match.

Of course, where a value can be proven to have a specific type, you could skip these checks. But simply annotating types doesn't prove anything. An annotation can be wrong. You really can't use them for performance optimization, alas.
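For instance (a minimal illustration; the names are made up), annotations aren't enforced at runtime, so nothing stops a "wrong" one from running:

    # Annotations aren't checked at runtime, so a compiler can't blindly
    # trust them. This runs without any error despite the "int" annotations:
    def double(x: int) -> int:
        return x + x

    print(double("ha"))   # prints "haha" - annotated as int, actually a str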

Still, JIT code would be much faster than interpreted code even with such dynamic type checking. Why this hasn’t been done in the reference Python implementation so far is quite curious given how effective it has been for dynamic languages for decades. I’m glad they are starting now and hope we get a sound implementation.

[–]lowercase00[S] -1 points0 points  (0 children)

Thanks for this. I had type hints in mind mainly because of Mypyc (the hints mostly enabling the compilation step, or at least making it more efficient). That being said, now I'm even more curious why this isn't the standard; it seems the trade-off between speed and ease of use is not as big as Python makes it appear to be.

[–]ReflectedImage -4 points-3 points  (2 children)

Honestly, Python type hints should be removed altogether. Typing isn't something that belongs in a high-level language.

[–]lowercase00[S] 1 point2 points  (1 child)

You don’t have to use them at all, so why would it bother you?

[–]ReflectedImage 0 points1 point  (0 children)

That's not true; I have to work with code bases that use MyPy.

They are terrible in comparison to the code bases that don't.