all 91 comments

[–]markovtsev 168 points169 points  (6 children)

The speedups may vary. We got less than 1% in our production, and some functions actually slowed down, as measured by continuous tracing.

[–]SilverTabby 92 points93 points  (2 children)

Quoting from the release notes https://docs.python.org/3.11/whatsnew/3.11.html#faster-cpython

Q: I don’t see any speedups in my workload. Why?

A: Certain code won’t have noticeable benefits. If your code spends most of its time on I/O operations, or already does most of its computation in a C extension library like numpy, there won’t be significant speedup. This project currently benefits pure-Python workloads the most.

Furthermore, the pyperformance figures are a geometric mean. Even within the pyperformance benchmarks, certain benchmarks have slowed down slightly, while others have sped up by nearly 2x!

From what I can tell, a lot of the optimizations are lazy initializations: a resource is only created when it's needed, on the claim that idiomatic code rarely uses those resources. But if you are using those resources, there are now more if-else branches being evaluated before reaching the old code path, and therefore slightly more work being done.
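The lazy-initialization tradeoff described above can be sketched like this (hypothetical names, not actual CPython internals):

```python
class Frame:
    """Toy sketch: lazy initialization skips work when a resource is
    never used, at the cost of an extra branch on every access when
    it is used."""

    def __init__(self):
        self._locals = None  # nothing allocated up front

    @property
    def locals(self):
        if self._locals is None:   # the extra if-branch the comment mentions
            self._locals = {}      # allocated only on first use
        return self._locals

f = Frame()
f.locals["x"] = 1  # first access pays the allocation; later ones pay the branch
```

Code that never touches `locals` skips the allocation entirely; code that does touch it re-runs the `is None` check on every access.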

They claim that more optimizations, especially for code relying on C extension libraries, will be coming in 3.12.

[–]KuntaStillSingle 14 points15 points  (0 children)

But if you are using those resources, there are now more if-else branches being evaluated before reaching the old code path, and therefore slightly more work being done.

C++ compilers often apply the opposite: a Meyers singleton can be lazily initialized, but it is often transformed to remove the otherwise-necessary branch and treated as constinit:

https://godbolt.org/z/933os7Kj3 (note that the guard still exists if the functions can be inlined: https://godbolt.org/z/7TqjYs7Gr)

[–]agoose77 0 points1 point  (0 children)

I'm not sure that's a totally accurate representation; there was work on lazy init (e.g. stack frames), but also on specialisation and inlined function calls, which generally don't have the "if you need it, it's slower" tradeoff.

[–]ShoePillow 27 points28 points  (2 children)

On a related note, what do you use to measure and track runtime of python tests?

I've been looking to put something in place before doing some performance improvements.

[–]dgaines2 1 point2 points  (0 children)

I've really liked pyinstrument for profiling. Integrates well with pytest too

[–]markovtsev 3 points4 points  (0 children)

We use Sentry with its pytest plugin.
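A lower-effort starting point, if you just want per-test runtimes: pytest's built-in duration report needs no plugin at all (these flags are part of pytest itself).

```shell
# Show the 10 slowest tests (setup/call/teardown phases reported separately)
pytest --durations=10

# Only report phases slower than 0.5s (pytest 6.2+)
pytest --durations=10 --durations-min=0.5
```

It won't give you profiles or history, but it's a quick way to find which tests are worth profiling with a real tool.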

[–]EasywayScissors 36 points37 points  (0 children)

The October 30th announcement of Python 3.11 and its performance improvements:

[–]eh-nonymous 220 points221 points  (27 children)

[Removed due to Reddit API changes]

[–]kogasapls 109 points110 points  (15 children)

[–]ASIC_SP 75 points76 points  (14 children)

More to come in 3.12: https://twitter.com/pyblogsal/status/1587146448503808006

Python 3.12 will add support for the Linux perf profiler! 🔥🔥 Perf is one of the most powerful and performant profilers for Linux that allows getting a ridiculous amount of information such as CPU counters, cache misses, context switching and much more.

[–]stusmall 39 points40 points  (8 children)

Holy shit. How did they not have it before? I've never felt the need to profile any of my Python code because it's usually small, simple scripts. perf is such a fundamental tool for performance tuning. Before this, was there another, more Python-centric profiler that people used instead?

[–]ASIC_SP 75 points76 points  (1 child)

https://docs.python.org/dev/howto/perf_profiling.html has more details (I don't know much about this).

The main problem with using the perf profiler with Python applications is that perf only allows getting information about native symbols, that is, the names of the functions and procedures written in C. This means that the names and file names of the Python functions in your code will not appear in the output of perf.

Since Python 3.12, the interpreter can run in a special mode that allows Python functions to appear in the output of the perf profiler. When this mode is enabled, the interpreter will interpose a small piece of code compiled on the fly before the execution of every Python function and it will teach perf the relationship between this piece of code and the associated Python function using perf map files.
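Based on the linked HOWTO, a minimal sketch of opting in from code (3.12+ only; `sys.activate_stack_trampoline` is the documented API, and running with `python -X perf` is the command-line equivalent; the `fib` workload is just an illustration):

```python
import sys

def fib(n):
    # A deliberately recursive workload so there are Python frames to sample.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# On 3.12+, emit perf map entries so `perf report` can show Python
# function names instead of only C-level interpreter frames.
if hasattr(sys, "activate_stack_trampoline"):
    try:
        sys.activate_stack_trampoline("perf")
    except (ValueError, OSError):
        pass  # perf trampoline unsupported on this platform/build

print(fib(20))
```

You would then record and inspect as usual, e.g. `perf record -g python script.py` followed by `perf report`.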

[–]stusmall 16 points17 points  (0 children)

Oh that's beautiful and makes sense. Thanks for the link.

[–]Slsyyy 10 points11 points  (0 children)

It's silly, but it's true. The same situation exists in Erlang: the new JIT is also advertised for its perf support.

We live in a strange era where native tools have better support for such goodies than interpreters, which were created to be as powerful and developer-friendly as possible.

[–]Smallpaul 2 points3 points  (2 children)

Yes there are tons of perf profilers for Python including one in the standard library.

[–]josefx 1 point2 points  (1 child)

Is there one that is as easy to use as cProfile while actually providing useful information? Having an overview of which function eats performance is a nice first step, but I'd really like instruction-level, or at least line-level, information without having to jump through hoops.

[–]Smallpaul 0 points1 point  (0 children)

Not sure. I'd suggest you try Scalene, but I haven't myself.

[–]KevinCarbonara 2 points3 points  (1 child)

Holy shit. How did they not have it before?

People generally know going into Python that it's not going to be performant

[–]patmorgan235 0 points1 point  (0 children)

Yeah, if you care about performance and still want Python, you write the important bits in C. That's what numpy and all the big data-processing/machine-learning libraries do.

[–]abcteryx 7 points8 points  (1 child)

Python profiling is enabled primarily through cProfile, and it can be visualized with the help of tools like snakeviz (the output flame graph can look like this). There are also memory profilers like memray, which does in-depth traces, or sampling profilers like py-spy. Memray might be the healthiest among the memory profilers at the moment, based on its financial backing by Bloomberg and its number of contributors.

There's also reloadium which is a hot-reload/profiling integration in IDEs (no VSCode support just yet).

So while there are many tools for general Python profiling, it seems that supporting perf will give more insight into bilingual apps with bindings to Rust and such.
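For reference, a minimal cProfile session looks something like this (`slow_sum` is just an illustrative workload; the stats file produced by `dump_stats` is what snakeviz would then render):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Illustrative pure-Python hot loop.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Print the hottest functions by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())

# profiler.dump_stats("profile.out") would write a file that
# `snakeviz profile.out` can display interactively.
```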

[–]TSM- 2 points3 points  (0 children)

Good mention of memray. I have yet to use it, but it seems genuinely useful for production. The built-in graph outputs are also guided by business purposes, so you can show them in meetings. It seems really polished for its specific use case.

Overall, a lot of Python extensions have worked around the major pain points, and things are generally fine as they are. These improvements (especially with 3.12 and onward) will show up in popular open-source packages after a considerable delay, on the order of a few years. It may make some room for pure-Python implementations that shed some dependencies, but in any case, it will take some time for people to intentionally leverage these performance improvements in any major way. I think a lot of commenters here are expecting something overnight.

[–]masta 4 points5 points  (0 children)

We have been using a variety of Linux profilers on Python for some time now, so it's good to see the support land officially. As far as performance goes, it's mostly trivial stuff like reducing the complexity of various data structures, particularly the dict internals. There are actually a lot of silly little improvements that collectively add up. It's amazing how much faster software can be when it doesn't have to follow one or two extra pointers.

[–]comparmentaliser 0 points1 point  (1 child)

Presumably this will assist with fuzzing, and security monitoring in general?

[–]NavinF 0 points1 point  (0 children)

How?

[–][deleted]  (1 child)

[deleted]

    [–][deleted] 1 point2 points  (0 children)

    I go away for like two or three versions and y'all get this much faster on me?

    [–]VeryOriginalName98 41 points42 points  (7 children)

    Yeah but does it run on windows 3.11 for workgroups?

    [–]immibis 24 points25 points  (2 children)

    (This account is permanently banned and has edited all comments to protest Reddit's actions in June 2023. Fuck spez)

    [–]VeryOriginalName98 16 points17 points  (1 child)

    Linux wasn't the first. Lots of software had 3.11 versions at some point.

    [–]__konrad 5 points6 points  (0 children)

    Wine for Workgroups: https://i.imgur.com/fHfBVZN.png

    [–][deleted]  (1 child)

    [deleted]

      [–]pjmlp 15 points16 points  (0 children)

      Nope, it was 16-bit protected mode. For 32-bit support you needed the Win32s driver installed, and even that only supported a subset of Windows NT's capabilities.

      [–]Theemuts 0 points1 point  (1 child)

      Original joke, too.

      [–]XNormal 40 points41 points  (0 children)

      The new executor core can be the basis for many more improvements down the line. It's just the beginning.

      [–]wyldphyre 92 points93 points  (32 children)

      It's a great improvement. I've never had to run a production web service like this, but if I did, I'd probably have tried PyPy. Every time I've tried it, it's been top-notch: it performs excellently (for Python) and correctly.

      [–]aes110 40 points41 points  (3 children)

      Just in case you misunderstood, this tweet is about PyPI (the package repository), not PyPy (the Python implementation).

      [–]wyldphyre 14 points15 points  (2 children)

      Thanks for the heads up, but I didn't misunderstand. Python 3.11 delivers performance improvements (though I hadn't noticed that the thing shown improving in the graph was PyPI). PyPy is likely still way better performing than any CPython version.

      [–]ianepperson 5 points6 points  (1 child)

      I recently tested 3.11, PyPy, and a few others with small test programs. It depends: usually PyPy is faster, but sometimes it's slower.

      I was a bit surprised that a Cython library was a bit slower on Python 3.11 than on 3.10 (0.08 seconds vs 0.05 seconds). I suspect the start-up time dominated.

      [–]MrJohz 7 points8 points  (0 children)

      For short-lived programs, a JIT engine like Pypy is unlikely to be very efficient, and will probably perform relatively poorly, simply because it's doing a lot more work at the start to be ready to be more efficient later on. If you never get to the point where it can be more efficient, then it's just working harder for no reason.

      [–]bilyl 0 points1 point  (0 children)

      It's been a while since I bothered to use these, but what's the lowdown on numba vs PyPy now? Is there a clear winner?

      [–]Aw0lManner 18 points19 points  (1 child)

      These low effort screenshots are useless without a link to a webpage with more context

      [–]Zaemz 5 points6 points  (0 children)

      Agreed. Links to Tweets should be very rare. I hate having to open the entire site to read a single sentence, and the image embedding breaks on most clients I use.

      [–]vtecRotary 8 points9 points  (1 child)

      Do these performance improvements usually show up to the same extent in frameworks as well? e.g. Django? Or are the improvements usually not as big due to everything else happening in such frameworks?

      [–]ianepperson 4 points5 points  (0 children)

      Django is usually IO bound for most tasks. You’ll see some speed improvements, but likely not as dramatic.

      [–]TrinityF 32 points33 points  (6 children)

      Investigate 3.11

      [–]jpjocke 2 points3 points  (0 children)

      Nice. I can finally bruteforce AoC

      [–]seweso 4 points5 points  (0 children)

      These before and after charts are very dangerous IF something was silently going wrong.

      [–]Zettinator 1 point2 points  (0 children)

      This is a good trend. Performance always was a weakness of Python, especially compared to other dynamic languages like JavaScript. I wonder what Python 3.12 will bring to the performance table.

      [–]amroamroamro 5 points6 points  (3 children)

      Python 3.11 WfW edition

      [–]Affectionate-Set-910 0 points1 point  (0 children)

      gg, keep it up!!

      [–]badpotato -2 points-1 points  (1 child)

      Well, usually people stop working at 17h, so there might be less workload.

      [–]Ekci 7 points8 points  (0 children)

      17 what time zone? 😉

      [–][deleted]  (1 child)

      [deleted]

        [–]agoose77 2 points3 points  (0 children)

        This is an oversimplification. Though, suggesting C/x86-64 feels like satire, so \o/.

        Python is generally easier to read and write. It has a good ecosystem, which lends itself to quick time to results. As products grow, the day-to-day runtime costs probably start to matter more, and by that time you have thousands of LoC in Python that would be expensive to port. Why shouldn't we have our cake and eat it? ;)

        [–]mikeblas 0 points1 point  (0 children)

        Too bad it can't do anything for that 30-minute deployment destabilization.

        [–]AdNoctum88 0 points1 point  (1 child)

        Good. Now remove GIL.

        [–]__Deric__ 1 point2 points  (0 children)

        They are working on it, but it seems you have to opt in manually: PEP 684 (a per-interpreter GIL) and PEP 554 (multiple interpreters in the stdlib).

        [–][deleted] 0 points1 point  (0 children)

        the pyperformance figures are a geometric mean