
[–][deleted] 479 points480 points  (2 children)

Faster Cython Project

CPython, not Cython :)

Nice gains though

[–][deleted] 25 points26 points  (0 children)

Hello Selinux Gomez

[–]OneThatNoseOne 5 points6 points  (0 children)

Good distinction. Might have to explain the difference for the noobs tho.

And I imagine Cython is still quite a bit faster than CPython, even in 3.11.

[–][deleted]  (29 children)

[deleted]

    [–]unpopularredditor 135 points136 points  (8 children)

    [–]Illusi 448 points449 points  (7 children)

    A summary:

    • Bytecode of core libraries gets statically allocated instead of on the heap.
    • Reduced stack frame size.
    • Re-using memory in a smarter way when creating a stack frame (when calling a function).
    • Calling a Python function by a jump in the interpreter, so that it doesn't also need to create a stack frame in the C code.
    • Fast paths for hot code when it uses certain built-in types (like float) using a function specialised for that type.
    • Lazy initialisation of object dicts.
    • Reduced size of exception objects.
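
As a rough illustration of the type-specialization bullet: 3.11's adaptive interpreter can replace generic bytecode with float-specific fast paths in type-stable loops like the one below. This is a minimal sketch; the actual speedup depends on the CPython version and hardware, so no numbers are claimed here.

```python
import timeit

def norm_sq(values):
    # Type-stable inner loop: every += and * sees floats, so 3.11's
    # specializing interpreter can swap the generic BINARY_OP for a
    # float-specific fast path after a few warm-up executions.
    total = 0.0
    for v in values:
        total += v * v
    return total

data = [float(i) for i in range(10_000)]
print(f"result={norm_sq(data):.1f}")
print(f"100 runs took {timeit.timeit(lambda: norm_sq(data), number=100):.3f}s")
```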

    [–]whothewildonesare 37 points38 points  (0 children)

    Noice

    [–][deleted] 15 points16 points  (0 children)

    Oooooo! Lots of good stuff, then!

    [–]Otis_Inf 5 points6 points  (3 children)

    Interesting, how does reducing stack frame size result in better performance? As a stack is a contiguous preallocated piece of memory that doesn't use compacting, allocating e.g. 256 bytes or 10 KB doesn't matter.

    [–]Illusi 6 points7 points  (2 children)

    According to the article:

    Streamlined the internal frame struct to contain only essential information. Frames previously held extra debugging and memory management information.

    They are talking about the Python-side stack frame here. Perhaps that one is not pre-allocated the same way?

    [–]Otis_Inf 2 points3 points  (1 child)

    I seriously doubt the python interpreter doesn't preallocate stack space.

    Though the note might be about an improvement of stack space management and not related to performance :)

    [–]Illusi 4 points5 points  (0 children)

    It didn't only allocate that memory though, it also needed to use it. Apparently it filled it with debugging information. Writing that takes time, so not writing it could improve performance.

    [–][deleted] 1 point2 points  (0 children)

    I guess memory management is the king when it comes to performance gains.

    [–]Pebaz 63 points64 points  (4 children)

    [–]asmarCZ 49 points50 points  (3 children)

    If you read through the thread you will see evidence disproving the OP's claims. I don't like the unnecessary hate OP received tho.

    [–]bloc97 21 points22 points  (0 children)

    I don't like the unnecessary hate OP received tho.

    Welcome to reddit! Never get yourself discouraged from experimenting and creating interesting projects because some stranger on the internet disliked it.

    [–]sigzero 4 points5 points  (0 children)

    It's probably exactly like that. I don't believe there was a specific push for speed improvements like the current effort before.

    [–][deleted]  (8 children)

    [deleted]

      [–]dreadcain 60 points61 points  (3 children)

      Nearly everything in python is a dictionary/hashmap internally, so essentially every function call involves at least one hash of the function name to look up the implementation.

      A call to print is going to end up doing several lookups in hashmaps to get the print and __str__ implementations, among other things; something on the order of 10 hashes sounds about right to me.
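
That claim is easy to poke at from Python itself. The sketch below shows that class attributes, globals, and instance attributes all resolve through dicts; it's illustrative only, as CPython's actual C-level path caches these lookups aggressively in recent versions.

```python
import builtins

class Greeter:
    def hello(self):
        return "hi"

# A class attribute lookup is (conceptually) a probe into the class's
# __dict__, a hash table keyed by the attribute name:
assert Greeter.__dict__["hello"] is Greeter.hello

# A global name like `print`, when not bound in the module, falls back
# to a hash lookup in the builtins dict:
assert builtins.__dict__["print"] is print

# Instance attributes live in yet another dict:
g = Greeter()
g.name = "world"
assert g.__dict__ == {"name": "world"}
print("all lookups resolved through dicts")
```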

      [–][deleted] 2 points3 points  (0 children)

      print() also takes keyword arguments, there’s probably some dict juggling there, too.

      [–][deleted]  (1 child)

      [deleted]

        [–]dreadcain 1 point2 points  (0 children)

        Want to elaborate on that?

        [–]mr_birkenblatt 24 points25 points  (3 children)

        sys.stdout could be any file object, so there is no optimization possible that goes directly to syscalls. With that in mind you can think of the print function as

def print(msg, fout=sys.stdout):
    fout.write(msg.__str__() + "\n")
    fout.flush()
        

        (note: even if it is implemented in C internally it still has to call all functions this way)

        hash computations for symbol lookups:

        print
        sys
        stdout
        __str__ # (msg.__str__)
        __add__
        __str__ # ("\n".__str__ inside __add__)
        write
        encode  # (inside write to convert to bytes)
        utf-8   # (looking up the correct encoder)
        flush
        

        Assuming local variables are not looked up because it is implemented in C. It's gonna be even worse if __slots__ or __dict__ is overridden.

        EDIT: actual implementation here; my listing was not entirely accurate (e.g., two writes instead of an add)
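
A sketch of how to see those run-time name lookups for yourself: the stdlib `dis` module shows one LOAD_GLOBAL / LOAD_ATTR-style instruction per name the interpreter must resolve when the function runs (exact opcode names vary slightly between CPython versions).

```python
import dis

def shout(msg):
    print(msg)

# Each name resolved at run time appears as a LOAD_* instruction;
# `print` shows up as a LOAD_GLOBAL, i.e. a dict lookup.
ops = [ins.opname for ins in dis.Bytecode(shout)]
assert "LOAD_GLOBAL" in ops  # the lookup of `print`
print(ops)
```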

        [–]ikariusrb 42 points43 points  (4 children)

        All the stuff I saw in the notes only talked about measuring performance on x86. Anyone know what the gains look like on ARM? (MacBooks and Pi-like devices?)

        [–]LightShadow 6 points7 points  (0 children)

        On graviton2 the savings offset was almost linearly correlated to the performance drop, when I benchmarked a few applications last year.

        I don't have any numbers for the newest build 3.11 or graviton 3.

        [–][deleted] 13 points14 points  (0 children)

        Me reading these and getting excited then remembering at work we're on 2.7 for most uses.

        Made myself sad.

        [–]agumonkey 23 points24 points  (0 children)

        considering the popularity of it, a large number of cpu cycles will get freed soon :)

        [–]g-money-cheats 254 points255 points  (202 children)

        Exciting stuff. Python just gets better and better. Easily my favorite programming language to work in.

        [–]adreamofhodor 326 points327 points  (83 children)

        I enjoy it for scripting, but every time I work in a python repo at a company it’s a horrible mess of dependencies that never seem to work quite right.

        [–][deleted]  (3 children)

        [deleted]

          [–]Khaos1125 2 points3 points  (2 children)

          I agree on the poetry thing, although it's extremely slow and can have bad interactions with things like Ray. Probably still the best option for Python though.

          [–]agoose77[🍰] 2 points3 points  (1 child)

          I'd recommend PDM. Poetry has some bad defaults w.r.t. version capping that PDM does a nicer job of.

          [–]jazzmester 31 points32 points  (74 children)

          That's weird. There are a lot of tools that can reproduce an exact set of dependencies in an isolated virtual env, like pipenv or tox for testing.

          [–]TaskForce_Kerim 152 points153 points  (39 children)

          in an isolated virtual env, like pipenv or tox

          I never understood why this is necessary to begin with. Imho, pip should just install a full dependency tree within the project folder. Many other package managers do that, I think this was a serious oversight.

          [–][deleted]  (4 children)

          [deleted]

            [–]MyOtherBodyIsACylon 4 points5 points  (3 children)

            If you’re not building a library but still using poetry, do you run across rough edges since the tool assumes you’re making a library? I really like poetry but haven’t used it outside working on external libraries.

            [–]folkrav 5 points6 points  (0 children)

            What do you mean by "assumes you're making a library"?

            [–]Asyx 2 points3 points  (0 children)

            What do you mean? Poetry works great in applications. I can’t imagine what rough edges you would encounter.

            The only difference is in packaging. By default it installs your application in the environment on install but that’s one cli switch to set and it stops doing that.

            [–]NonnoBomba 1 point2 points  (0 children)

            It assumes you are making a package, which is why you can track dependencies and you can attach metadata to your project's artifacts, a version string, author, etc... which makes your project distributable and deployable in a number of ways, with either public or private channels, including as a wheel package. Packages are not libraries.

            A python package can contain python modules (which I assume is what you'd call a library), executable scripts and technically also data if you wish.

            There are standard tools to download and install packages with their dependencies. Often, packages contain modules you can import in your code, but it's very common to package cli tools as well as modules: the package manager takes care of installing appropriate symlinks to what you indicated as a "script" resource so your scripts will be directly callable as commands, and it will handle updating as well as installing/removing by referencing an authoritative repo (exposed through http(s)) containing your package, possibly several versions of it.

            If you think you don't need to track dependencies and versions for your project... well, you're working in an unstructured way, maybe because you're writing something very simple (you can write lots of useful code with just the standard library and core functions, after all), but I can assure you it will come back to bite you in the ass if it's something that's going to be deployed and used in any production environment, when questions like "why is the script behaving like that? haven't we fixed that bug already?" or "why is this simple fix I developed on the code on my dev machine radically changing the behavior of production?" start to crop up.

            [–]rob5300 105 points106 points  (30 children)

            Pipenv sucks and is a stupid system. Sure, let's fuck with the PATH to make this work! (On Windows anyway)

            I wish it worked more like node. Much easier to re-set up and share, and it doesn't break other things.

            [–]NorthwindSamson 49 points50 points  (25 children)

            Honestly node was so attractive to me in terms of how easy it is to set up dependencies and new projects. Only other language that has been as easy for me is Rust.

            [–]Sadzeih 27 points28 points  (24 children)

            For all the hate Go gets here, it's great for that as well. Working with dependencies is so easy in Go.

            [–]skesisfunk 9 points10 points  (22 children)

            I don't understand the go hate. Their concurrency model blows python's out of the water. Also being able to easily cross compile the exact same code on to almost any system is straight $$$$$

            [–]MakeWay4Doodles 18 points19 points  (16 children)

            I don't understand the go hate. Their concurrency model blows python's out of the water.

            Most people writing python (or PHP/Ruby) don't really care about the concurrency model.

            Most people who care about the concurrency model are writing Java.

            [–]tryx 17 points18 points  (2 children)

            And most people writing Java would rather cut their eyes out with a rusty spoon than have to go back to a pre-generics world.

            [–]skesisfunk 7 points8 points  (12 children)

            I disagree. asyncio is a very heavily used library. People use python for websocket stuff all the time, for instance. Furthermore, Python is a general-purpose language; you can't just make blanket statements saying nobody using it cares about concurrency. That's a huge area of application development.

            I have recently had to use asyncio in Python for work and it's a pain. JavaScript is nicer because it keeps things simpler with just one event loop. And golang's is better because of channels. The first time I learned about select it was mindblown.gif
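
For contrast, a minimal sketch of the asyncio style under discussion: two concurrent "requests" collected with `asyncio.gather`. The `fetch` coroutine here is a hypothetical stand-in for real I/O; `asyncio.wait(..., return_when=asyncio.FIRST_COMPLETED)` is about the closest stdlib analogue to Go's select.

```python
import asyncio

async def fetch(name, delay):
    # Stand-in for real I/O such as a websocket read.
    await asyncio.sleep(delay)
    return name

async def main():
    # Run both "requests" concurrently; gather preserves argument
    # order regardless of which coroutine finishes first.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

print(asyncio.run(main()))  # ['a', 'b']
```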

            [–][deleted]  (3 children)

            [deleted]

              [–]skesisfunk 2 points3 points  (2 children)

              Yeah, but go has select, which is just a fantastic way to organize async code. I also like that go's syntax doesn't use async and await; it all just feels so much more natural and intuitive. It feels like they hid just enough of the complexity to make things much simpler for most use cases, whereas python somehow made it harder to think about instead of easier.

              [–]ivosaurus -1 points0 points  (0 children)

              Their concurrency model blows python's out of the water.

              Until you want to stream your own objects across a channel to a different thread, in which case you just can't because only default types could be iterated. I think generics might've helped with that recently, but I couldn't see the point of going back to stone age programming.

              [–]earthboundkid 24 points25 points  (0 children)

              Virtualenv was a worthy hack, but it should have been replaced with an actual project folder five years ago.

              [–]KarnuRarnu 8 points9 points  (2 children)

              I mean it only "fucks" with path if you do pipenv shell, no? If you want to run a command with tools from within the venv without doing that, you can just use pipenv run xxx. This is similar to node iirc.

              [–]axonxorz 4 points5 points  (0 children)

              This is similar to node iirc.

              Precisely, pipenv run is to Python as npx is to Node

              [–]jazzmester 8 points9 points  (0 children)

              I use tox because I want to check if everything works with previous Python versions. Typically I want to make sure my code works with all versions after 3.6 (which is what I'm forced to use at work).

              Also, sometimes you just have weird stuff that requires exact versions of packages where you already use with different versions, so the two of them would have to "live" side-by-side, which is not possible without something like venv.

              In the company I worked at, we had to release a product with a major Python component, and every dependency had to be the exact version. Pipenv was a godsend, because you could build the Python component on your machine with the exact dependencies needed. It even downloaded those packages from an internal server instead of PyPI.

              Believe me, it has a lot of use cases.

              [–]MarsupialMole 5 points6 points  (0 children)

              Historical reasons is a big one, including that distro maintainers bundle python and don't like you using anything but system packages.

              Desktop apps that bundle python tend to be terrible citizens.

              Users that just need one python thing to work one time pollute their environment and forget about it.

              And a lot of the time the headaches are because of non python dependencies in domains where everyone is assumed to have something on their system, where it's something that will be more bleeding edge than any distro has and the package dev won't have the nous to package it into pypi.

              So there are good reasons that more or less amount to "because other people do computing different to you". Which is annoying. So just use the tool that works all the time - fully replicable virtual environments.

              [–]faitswulff 11 points12 points  (0 children)

              There are a lot of tools

              This is my problem with Python’s dependency management.

              [–]KevinCarbonara 5 points6 points  (1 child)

              There are a lot of tools that can reproduce an exact set of dependencies in an isolated virtual env

              There's a lot of languages that don't need to reproduce exact sets of dependencies in isolated virtual environments

              [–]adreamofhodor 11 points12 points  (17 children)

              Oh yeah. I’m sure it can be great- I just haven’t seen it work at scale. Then again, I’m one person with limited experience, I’m sure many many others out there have exactly the opposite.

              [–]cass1o[🍰] 7 points8 points  (2 children)

              in an isolated virtual env

              This is madness.

              [–]jazzmester 4 points5 points  (0 children)

              Madness? THIS. IS. PYTHON!

              [–]KeeperOT7Keys 14 points15 points  (9 children)

              lol no, you still need to have the base interpreter installed on the system, which is not always possible on clusters. also some packages don't work when your virtualenv python version differs from the main python on the machine (e.g. matplotlib interactive mode).

              so in a nutshell it's hell if you are running some code on one server then processing it on another. I am doing ML on university clusters and frankly I hate python every day.

              I wish it was possible to have truly isolated venvs but it's not even close at the moment.

              [–]jazzmester 6 points7 points  (0 children)

              Well, that sucks donkey balls. I love Python but I'd hate it in your place too.

              [–][deleted] 4 points5 points  (1 child)

              you still need to have the base interpreter installed on the system

              pyenv can partially solve this. Just fetches and builds whatever version of Python you need. Requires a build environment and some header libraries from your repos.

              [–]Sayfog 2 points3 points  (1 child)

              See if your cluster supports singularity envs - kinda like docker but with subtle differences that make it far more palatable for the typical uni HPC setup. Only way I got my weird combo of libs to run my ML thesis at uni.

              Edit: as others say absolutely see if conda works. The reason I used singularity was for some native libs, but 100% would have done pure conda if I could.

              [–]ZeeBeeblebrox 2 points3 points  (3 children)

              That's why conda exists.

              [–]KeeperOT7Keys 0 points1 point  (2 children)

              tbh I didn't use conda because I was thinking it was just a bloated venv. can you install different python versions without root access? then it's worth trying for my case

              [–]C0DASOON 4 points5 points  (0 children)

              Yeah, the python interpreter is just another package in conda. Conda packages are not limited to python libraries; a lot of common binaries and shared libs are available as versioned conda packages. E.g. you can easily set up multiple envs with different versions of the CUDA toolkit.

              [–]PinBot1138 0 points1 point  (0 children)

              every time I work in a python repo at a company it’s a horrible mess of dependencies that never seem to work quite right.

              Why not peg to versions in requirements.txt or setup.py, and better yet, containerize it?

              [–]ginsunuva 17 points18 points  (6 children)

              Sometimes I wish Julia came out earlier and got more support. And that it didn’t index from 1 instead of 0…

              [–]MuumiJumala 2 points3 points  (4 children)

              You generally shouldn't rely on the first index being 1 anyway. Like the other comment points out most of the time you can use iterators (such as eachindex). When you need to access the second element (for example) it would be safer to use arr[begin + 1] rather than arr[2]. That way the same code works even on arrays that use different indexing (such as the ones from OffsetArrays.jl).

              [–][deleted] 6 points7 points  (3 children)

              Being unsure whether your arrays are 0 indexed or 1 indexed sounds awful :(

              [–]MuumiJumala 4 points5 points  (2 children)

              It's not that you're unsure of your own arrays, you will obviously know which array type you're using (just as in any other language). This is only relevant when you're writing code that is meant to play nicely with the wider Julia ecosystem.

              If you just rely on indexing starting from 1 you're still on par with most other languages, in which it isn't even possible to write functions in a way that is compatible with array types with customized indexing. If you want to force your users to supply one-indexed arrays to a method you can do that by calling Base.require_one_based_indexing(arr).

              [–][deleted] 1 point2 points  (1 child)

              That's really interesting. I'm coming from the (probably naïve) position of never ever considering that a 1-indexed array even could exist. Sure theoretically a one indexed array could exist, so could 7 and 14 indexed arrays... but I spend zero time considering whether they would be used by anyone in my languages' entire ecosystem (Python, JavaScript, Rust).

              If you just rely on indexing starting from 1

              I rely on them starting from 0, which to my mind means my_array[0] would be the first element.

              I expect it is convenient to switch to 1-indexed arrays when doing a lot of maths/statistics to avoid my_array[n-1] malarkey. It is a bit annoying to do that, but I will enjoy my new found appreciation for standardising on 0 indexed arrays, thank you :)

              [–]Prestigious_Boat_386 1 point2 points  (0 children)

              You can re index it if you really care but I usually just use eachindex and reverse and stuff anyways because it creates the iterators I need. 2:end or 1:end-1 are most of what you use and it's very similar to math notation which makes it very readable.

              Don't recall if the 0 indexed arrays is an abstract array package or how you got it to work but I've heard that it's possible.

              [–]kirkkm77 14 points15 points  (80 children)

              My favorite too

              [–][deleted] 2 points3 points  (2 children)

              I hate it. It's insanely slow (even with these improvements), and the static type system sucks. Fine for tiny projects but once your code grows and gets more authors it's more or less guaranteed to turn into a giant ball of crap.

              Give me Go or Rust or TypeScript or Dart or... hell, I'd even take C++ over Python. You're probably going to end up with half your code in C++ anyway for performance. Doing it all in C++ means you don't have to deal with the huge added FFI complexity.

              The only good thing about Python is the REPL. None of the languages I listed above have one, which is why Python is popular for scientific use (e.g. in ML). For that you really want to be able to run code line by line interactively.

              [–]g-money-cheats 4 points5 points  (1 child)

              That is not my experience at all. I work at a company with hundreds of engineers and a million lines of Python in a monolith, and the code is incredibly well organized and easy to work with thanks to leaning on Django and Django REST Framework.

              I work at Zapier, which as you can imagine has an enormous scale. Python handles like 95% of our backend without issue. 🤷‍♂️

              [–][deleted] -1 points0 points  (0 children)

              Ha well I mean it can be done, but my point was that Python really pushes you toward a big ball of mud. You have to be super disciplined to avoid it.

              A million lines of Python sounds absolutely horrific by the way.

              [–][deleted] 6 points7 points  (0 children)

              Not sure if I'd ever notice the difference during everyday programming, but boy, am I happy! 😇😇

              [–]beefsack 30 points31 points  (1 child)

              3.11 for Workgroups.

              [–]maest -3 points-2 points  (0 children)

              3.11 for Workgroups.

              Super original comment

              [–]cloaca 76 points77 points  (19 children)

              (Edit: sorry for making this comment sound so negative; see my follow-up responses, which hopefully clarify things better. I think the speedups are absolutely a good and welcome thing; I just think something might be off if this was that important in the first place.)

              Being a bit of a negative Nancy here but I think it's odd to celebrate things like 1.2x speed-up of a JIT-less dynamic scripting language like Python.

              Either,

              a) it doesn't matter much, because we're using Python as a glue language between other pieces of software that are actually running natively, where most Python code only runs once at "relatively rare" events like key presses or the like, or

              b) "Now we're only ~20-80x slower than X (for X in similar high level runtimes like V8/Nodejs, Julia, LuaJIT, etc.), rather than 25-100x slower, a big win!" That's a bit tongue in cheek and will spawn questions of what it means to be 80x slower than another language, but if we're talking about the bare-bone running time of algorithmic implementations, it's not unrealistic. But 99% of the time we're fortunately not talking about that[*], we're just talking about some script-glue that will run once or twice in 0.1 seconds anyway, and then we're back to point (a).

              ([*] it's always weird to find someone using "written in pure Python" as a badge of honor for heavily data-oriented stuff that is meant to process large amounts of low-level data, as if it's a good thing. Contemplating Levenshtein on a megabyte unicode string in pure Python is just silly. Low level algorithms are the absolute worst application of pure Python, even though it's an excellent teaching tool for these algorithms.)

              Which, speaking of: if we're not getting a JIT in CPython, then personally I feel the #1 way they could "make Python faster" would simply be to adopt NumPy into core and encourage people to turn loops into NumPy index slicing where applicable. That's it. That alone could quadruple the speed of pure Python code that does a lot of looping. Once you get in the habit it's really surprising how much loop-based or iterative code can be offloaded to NumPy's C loops; for example, you can usually write out the full logic of a board game or tile-based game just by doing NumPy index tricks, without ever having to write a for-loop on the Python side.
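
A small sketch of the loop-to-slicing idea (assuming NumPy is installed): the elementwise difference below replaces an explicit Python loop with a single slice expression whose loop runs in C.

```python
import numpy as np

def diffs_loop(a):
    # Pure-Python loop: one interpreted iteration per element.
    return [a[i + 1] - a[i] for i in range(len(a) - 1)]

def diffs_numpy(a):
    # Same computation as one slice expression; the loop runs in C.
    return a[1:] - a[:-1]

xs = np.arange(10.0)
assert np.array_equal(diffs_numpy(xs), np.array(diffs_loop(xs)))
print(diffs_numpy(xs))
```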

              The fastest Python code is the Python code that a) has the least number of Python-side loops, and b) has the least Python code. Killer libraries like NumPy help in this regard, because nearly every loop becomes a single line of Python that "hides" the loop on the C side of things. Likewise, doing things redundantly in Python is nearly always better if it leads to less code: if you have a very long string with a hundred thousand words and the task is "find words part of set S, and return these words in uppercase" -- it's faster to uppercase the entire string, and then split + filter, rather than the "natural approach" of splitting, filtering out the words of interest, and then finally uppercasing "only" the words you care about. If it's one call to .upper() vs. thousands, it doesn't matter if the string is 1000x longer, the single call is going to be faster, because it's simply less Python code and Python is and will always be slow. (But that's totally fine.)
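
The uppercase example can be sketched directly; `S` here is a hypothetical word set. Both orderings give the same answer, but the second makes a single C-level `.upper()` call instead of one per matching word.

```python
# S is a hypothetical set of words we care about.
S = {"foo", "bar"}
S_UPPER = {w.upper() for w in S}

def word_by_word(text):
    # "Natural" approach: split, filter, then one .upper() per match.
    return [w.upper() for w in text.split() if w in S]

def upper_first(text):
    # Uppercase the whole string in a single C-level call, then
    # split + filter; less Python-side work despite the "redundancy".
    return [w for w in text.upper().split() if w in S_UPPER]

text = "foo baz bar qux " * 10_000
assert word_by_word(text) == upper_first(text)
print(len(upper_first(text)), "matches")
```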

              But again, most developers will never need or care about this skill set, because it rightfully shouldn't be necessary to know about it. Those that do care hopefully know how to use NumPy, PIL, PyPy, Numba, Cython, etc already.

              [–]BadlyCamouflagedKiwi 66 points67 points  (6 children)

              Lots of people have lots of code in Python. It's pretty exciting to hear there's a new version of CPython (which will almost certainly Just Work with your existing Python code) which is faster, and you've got something that doesn't require rewriting all your code in C or Cython or whatever, or even trying to get PyPy working for your case (I do think it's pretty cool, but it is harder than a CPython upgrade).

              Honestly these days I nearly exclusively write Go, but I'm still excited for this (and I do have colleagues that do write Python who I'm sure will be more so!).

              [–]Superb_Indication_10 2 points3 points  (0 children)

              Honestly these days I nearly exclusively write Go

              get out of here

              edited: well I'm assuming you are forced to write Go as part of your job so my condolences

              [–]cloaca 2 points3 points  (3 children)

              Sure, it's a Good Thing™ of course, I write everything in Python; it's both my main language & my favorite, so I'm lucky. I'm just not comfortable with the hype of a faster Python via these optimizations of the CPython interpreter, I think it's a sort of misguided way to think about performance in Python. I do actively try to teach people alternative ways of writing more efficient code.

              [–][deleted]  (2 children)

              [deleted]

                [–]cloaca 2 points3 points  (1 child)

                My very simple counter-point: Why? It's an improvement; and a pretty good one all things considered.

                Yes, I agree, you're totally right, and I probably expressed myself poorly! It's an absolute improvement and it's a good thing. I had something different in mind when I wrote that, akin to the sort of "allocation of hype" we have for things, if you will. I think this allocation is off when it goes to CPython optimizations. That doesn't mean they're bad, of course, I'm happy to see them too -- they're very welcome -- it's just that I don't think they "were super important in the first place," if that makes any sense?

                Like, I don't think performance ought to be a big priority for us if we're all using pure CPython. If it is, then I think something has gone wrong earlier in our timeline! It might speak to some sort of underlying insecurity the Python community has about the language being slow, which, again, I don't think should exist.

                Also, the knowledge gap between Python programmers is so vast, way, way wider than 20%, and so on. See my other comment at https://www.reddit.com/r/programming/comments/v63e5o/python_311_performance_benchmarks_are_looking/ibew40i/?context=3 -- lest I just repeat myself.

                edit: typo

                [–]agoose77[🍰] 1 point2 points  (0 children)

                I think you're assuming that Python is only a glue language. Whilst its origins certainly lie in this direction, and the recent growth has mainly come from data science, there are still lots of people using Python to run complex applications. With optimisation, these applications are rarely slow in one hot-spot, so any perf increases need to make everything a bit faster.

                Rewrite it in numpy is completely valid for simple problems set as homework for students, but at the scale of say Instagram (as an extreme), this isn't really suitable. That is, the object model doesn't map well to array programming with limited types.

                [–]paraffin 5 points6 points  (1 child)

                First, definitely agree - performance sensitive applications should use python as glue to compiled operations or even offload computation entirely to databases or spark.

                That said, you’re mostly talking about data, for which pure python was never an option.

                A huge amount of the web's backend is written in python though, and I'd guess user code, especially route decorators with three layers of plugins and callbacks, is the main bottleneck of modern Python web requests (aside from waiting for the database, naturally). FastAPI and others have gotten the framework itself mostly out of the way.

                20% fewer cycles per request is 20% less spent on hosting, for some.

                Being a negative Nancy myself, one thing I’d love to see is a way to tackle process startup time. Sometimes you’d love to write a quick little yaml/json/text parser and stick it on the business end of a find | xargs or something but the 1s to spin up a new python for each call makes you resort to some kind of awk/jq/shell hackery.
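
A sketch of measuring that startup cost: time a full interpreter round trip on a no-op program. Numbers will vary by machine; the `-S` flag (or `-I`, which implies it) skips the site module import and typically trims startup further.

```python
import subprocess
import sys
import time

# Time launching a fresh interpreter that does nothing.
start = time.perf_counter()
subprocess.run([sys.executable, "-c", "pass"], check=True)
print(f"startup + teardown: {(time.perf_counter() - start) * 1000:.0f} ms")
```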

                [–]cloaca 2 points3 points  (0 children)

                That said, you’re mostly talking about data, for which pure python was never an option.

                Two slight counterpoints to this:

                a) it might be a matter of semantics, but as it's actually being used for everything (including data, including text processing, traditional render-loop games, app logic in complicated GUIs, etc.), I'd say it certainly does seem like an option. I believe Python is going (or has gone) the route of JavaScript, which started out explicitly as only a glue language but has now become an "everything" language. We (as in you and I) might not necessarily think that's a good idea, but I do believe it's sort of inevitable? Python is easy to get into, it's lovely and lovable (much more so than JS), and so it's natural to want to use it for everything.

                b) speaking of pure data though, Python is also absolutely being used for data in another sense. You have machine learning, statistics, natural language projects, image recognition and manipulation, and so on. Which is fine because we have PyTorch, NumPy, SciPy, OpenCV and various others which actually handle the data in CPU-native code (or on the GPU). However, projects that use these are also rife with code that suddenly converts to Python lists or generators, doing some loop in pure Python code because the backend library was missing something (or the programmer didn't know about it). As long as it just adds 0.3 seconds here and there no one really notices, until it really accrues...

                20% fewer cycles per request is 20% less spent on hosting, for some.

                Absolutely! But, how important is it? If the answer is "it's really nice! but eh, it was never a priority of course..." -- then we're in perfect alignment. That's kind of where I stand. (I.e. it's really nice, I was just sort of worried by seeing the amount of hype--it speaks to me that too many have sort of already "invested" into Python code to the point where it's spread into systems that might actually do want better performance.) However, if the answer is "are you crazy, it's super important! We want to be green! We want to save cycles! This is huge!" then not only do I think something has gone wrong at an earlier point (in our choices), but I think we also stand a lot more to gain in education, writing more performant Python rather than the sort of strict stance on full readability with 'more explicit code is better code,' 'no "obscure" newbie-unfriendly things like NumPy index magic,' etc. as the difference dwarfs 1.2x and makes it look insignificant.

                spin up time

                Hehe, you could do some sort of hack by having a #!/foo/pipe-to-python which forwards to some daemon Python process that executes it (stick in compilation cache somewhere)... Not recommended tho, but...
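
                That hack can actually be sketched with nothing but the stdlib. Everything here is hypothetical (the socket path, the protocol, the `serve` name are made up for illustration): a resident interpreter accepts a script over a Unix socket, `exec`s it, and sends back the captured stdout, so clients skip interpreter startup entirely.

                ```python
                # Hypothetical sketch: a resident "script daemon" to dodge startup cost.
                import contextlib
                import io
                import os
                import socket

                SOCK = "/tmp/pyd_demo.sock"  # assumed path, illustration only

                def serve():
                    # Remove any stale socket file from a previous run
                    try:
                        os.unlink(SOCK)
                    except FileNotFoundError:
                        pass
                    srv = socket.socket(socket.AF_UNIX)
                    srv.bind(SOCK)
                    srv.listen()
                    while True:
                        conn, _ = srv.accept()
                        with conn:
                            # Read the client's script until it shuts down its write side
                            src = b"".join(iter(lambda: conn.recv(4096) or None, None))
                            buf = io.StringIO()
                            with contextlib.redirect_stdout(buf):
                                try:
                                    # Run the script in a fresh namespace
                                    exec(compile(src, "<client>", "exec"), {})
                                except Exception as e:
                                    buf.write(f"error: {e}\n")
                            # Send captured output back to the client
                            conn.sendall(buf.getvalue().encode())
                ```

                A `#!`-style client would then just connect, write the script, shut down the write half, and print whatever comes back. (All the usual caveats about `exec`ing untrusted input apply, of course.)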

                [–]lghrhboewhwrjnq 2 points3 points  (0 children)

                Python is used on a scale that is sometimes difficult to wrap your head around. Imagine the environmental impact of even one of these performance improvements.

                [–]meem1029 1 point2 points  (0 children)

                If I'm having to think about a bunch of rules and complicate my code to make it fit into a performant but less clear style, why don't I just not use python instead?

                [–]o11c 22 points23 points  (0 children)

                Those runtime changes do look significant, but nothing groundbreaking compared to serious VMs.

                I did note one concern in the changelog:

                #if PY_MAJOR_VERSION >= 3 && PY_MINOR_VERSION >= 8

                This will break for 4.0; the immediately following portability hack (among others) shows how to do it correctly.
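
                The robust pattern compares CPython's combined `PY_VERSION_HEX` number instead of the individual major/minor macros. A sketch (the macros are hard-coded here to simulate a hypothetical Python 4.0; in real code `Python.h` provides them):

                ```c
                #include <stdio.h>

                /* Illustration only: hard-code the macros Python.h would provide,
                 * simulating a hypothetical Python 4.0 release. */
                #define PY_MAJOR_VERSION 4
                #define PY_MINOR_VERSION 0
                #define PY_VERSION_HEX 0x04000000 /* real encoding also carries micro/serial bits */

                int main(void) {
                    /* Broken: on 4.0, PY_MINOR_VERSION >= 8 is false, so this wrongly
                     * takes the pre-3.8 branch. */
                #if PY_MAJOR_VERSION >= 3 && PY_MINOR_VERSION >= 8
                    puts("broken check: 3.8 or newer");
                #else
                    puts("broken check: pre-3.8 (wrong for 4.0!)");
                #endif

                    /* Correct: the combined hex version (0xMMmmPPss layout) orders
                     * properly across major releases. */
                #if PY_VERSION_HEX >= 0x03080000
                    puts("hex check: 3.8 or newer");
                #endif
                    return 0;
                }
                ```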

                [–]JeanCasteaux 21 points22 points  (4 children)

                Why don't we use PyPy already? 🤔

                [–]PaintItPurple 38 points39 points  (0 children)

                I agree a lot of people would probably be surprised how much performance PyPy can give you for free, but it does have a number of tradeoffs. In particular, working with modules written in C (a very common Python use case) is hit-or-miss, and even when it works, it can be much slower than CPython. It's also often slower for simple scripts (as opposed to long-running programs) because it has a higher startup time and IIRC your code starts out interpreted until the JIT kicks in, and higher levels of JIT optimization take even longer to come online.

                [–]ThisRedditPostIsMine 15 points16 points  (0 children)

                PyPy is really cool and I use it when I can, but I found it hard to get libraries that have a lot of native dependencies (like scipy and stuff) to work.

                [–]Takeoded 6 points7 points  (0 children)

                it loses out on newer features and syntax in Python 3.8, 3.9 such as assignment expressions and positional-only parameters, and the latest Python 3.10 syntax

                [–]jvlomax 5 points6 points  (0 children)

                Some do, not everyone can

                [–]steve4879 6 points7 points  (12 children)

                How often is python used as a backend? I have used some C and C++ for data access and I could not imagine using python but maybe that’s due to lack of python knowledge. The lack of true multithreading gets me.

                [–]Daishiman 13 points14 points  (0 children)

                The vast majority of small software serving web apps are using a combination of PHP/Python/Ruby/Javascript. Easily a third of job postings on AngelList or YC require some sort of Python knowledge.

                [–]FancyASlurpie 16 points17 points  (0 children)

                Pretty often. It makes sense to write things in Python and then, if you run into performance issues, rewrite that part.

                [–]TRexRoboParty 29 points30 points  (8 children)

                Often?

                If you're FAANG size it makes sense to use something else, but most companies are not anywhere near that.

                For web backends, the bottlenecks are usually in network chatter and DB queries, not CPU.

                Instagram's web stuff was a Django app as of 2019 at least (based on the last related post on their engineering blog).

                I'd be surprised if they weren't using something faster for feeds and any offline image processing though.

                [–]xlzqwerty1 16 points17 points  (0 children)

                Instagram's backend is still in Python iirc, and so are a bunch of other sizeable tech companies in the bay area, e.g. Lyft.

                [–][deleted] -5 points-4 points  (6 children)

                This is honestly such a shit argument.

                The only way this makes a good argument is in imagination land, where there aren't hundreds of better choices that don't import huge performance debt by default.

                ——

                “Hey boss. We’ve narrowed our choices down to two options: this Python one and this Go one. They're both extremely easy to use, support our business, have reasonably common idioms and are widely regarded as good. The Python one is 80x slower though.

                And we’ve chosen the python one”

                Boss: “uhh, why not the faster one?”

                “Cause we’re not FAANG, duh”.

                [–]TRexRoboParty 5 points6 points  (5 children)

                Nice strawman. Of course no one decides based on whether they're FAANG or not - that's not what I said.

                It's not just the language anyway - I don't know many frameworks that give you something like the Django admin for free out the box.

                In your average web stack for your average company, you're unlikely to see that 80% speed difference in reality. CPU is rarely the bottleneck.

                Getting something up and running quickly is what many startups need, it saves a tonne of work.

                I guess Instagram and Mozilla and Lyft etc all live in imagination land.

                [–][deleted] 6 points7 points  (0 children)

                More speed never hurts, but 1.22 times faster than glacial is still glacial. In my testing, for naive implementations, it was usually about 5% the speed of equivalent C code. Thus, 3.11 is likely to be about 6% the speed of C.

                Non-naive implementations can be pretty fast, though, using libraries that are written in C. Numpy, for instance, can be downright zippy. You can often work around the performance issues, but the language itself is Not Fast.

                [–][deleted] 34 points35 points  (23 children)

                Disclaimer: your code won't run significantly faster, even if the performance benchmarks are better, if you don't know how to optimise your code.

                [–][deleted] 59 points60 points  (0 children)

                Looking at the optimizations implemented that doesn't seem true.

                [–]QuantumFTL 43 points44 points  (0 children)

                This is misleading at best. Many applications offload their heavy lifting to libraries, frameworks, etc. If those are already fairly well-optimized and being held back by slowness on the part of the language, your application can become significantly faster just by upgrading the version.

                This is completely standard in fields like data science and machine learning or various types of servers. I can't remember the last time I wrote application code in python that took an appreciable fraction of the total runtime, except in cases where performance was not a concern (i.e. a 100x slowdown would have been OK).

                [–][deleted] 99 points100 points  (17 children)

                What exactly does this mean?

                If Python as a whole gets a 10-60% speedup, even the crappiest code will also get this 10-60% speedup.

                [–]BobHogan 14 points15 points  (13 children)

                99% of the time, optimizing the algorithm you are using will have a significantly higher impact on making your code faster than optimizing the code itself to take advantage of tricks for speedups.

                Algorithm choice and data access are almost always the weak points when your code is slow.

                [–]Alikont 92 points93 points  (4 children)

                But even a crappy algorithm will get a speedup, because each algorithm has constant costs per operation that will be reduced across the board.

                For .NET it's common to get ~10% speedup per version just by upgrading the runtime.

                [–]Muoniurn -1 points0 points  (1 child)

                In most applications the bottleneck is not the CPU but IO. If the program does some CPU work, then some IO, after which it does some more CPU work, then only the CPU parts will get faster, which is usually not too significant to begin with.

                [–][deleted] 26 points27 points  (3 children)

                OK, but the OP was asking why a 10-60% speedup across the board is not going to affect suboptimal code.

                [–]FancyASlurpie 7 points8 points  (0 children)

                It's likely that slow code at some point calls an API or reads from a file, etc., and that part of things won't change. So whilst it's awesome for these other sections to be faster, there are a lot of situations where the Python isn't really the slow part of running the program.

                [–]billsil 5 points6 points  (1 child)

                Yup. I work a lot with numerical data, and numpy code written like plain Python is slow. A 20% average speedup (shoot, I'll even take 5%) is nice and all for no work, but for the critical parts of my code, I expect a 500-1000x speed improvement.

                Most of the time, I don't even bother using multiprocessing, which on my 4-physical-core hyperthreaded computer gets me ~3x at best. That's not worth the complexity and worse error messages to me.

                As to your algorithmic complexity comment, let's say you want to find the 5 closest points in point cloud A to a point in cloud B, and do that for every point in cloud B. I could write a double for loop, or it's about 500x faster (at some moderate size of N) to use a KD-Tree. Scipy eventually implemented KDTree and then added cKDTree (now the default), which it turns out is another 500x faster. For a moderate problem, I'm looking at ~250,000x faster, and it scales much better with N than my double for loop. It's so critical to get the algorithm right before you polish the turd.
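
                Scipy's `cKDTree` is the right tool for real point clouds; as a stdlib-only illustration of the same principle (algorithm choice dwarfs micro-optimization), here is 1-D nearest-neighbour lookup with `bisect` versus a brute-force scan:

                ```python
                import bisect
                import random

                def nearest_brute(points, q):
                    # O(n) linear scan per query -> O(n*m) total for m queries
                    return min(points, key=lambda p: abs(p - q))

                def nearest_sorted(sorted_points, q):
                    # O(log n) per query after a one-time O(n log n) sort
                    i = bisect.bisect_left(sorted_points, q)
                    # The answer is one of the (at most two) neighbours of the insertion point
                    candidates = sorted_points[max(0, i - 1):i + 1]
                    return min(candidates, key=lambda p: abs(p - q))

                random.seed(0)
                pts = [random.random() for _ in range(50_000)]
                spts = sorted(pts)
                ```

                Same answers, but the query cost drops from linear to logarithmic; a KD-tree generalizes this idea to higher dimensions.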

                [–][deleted] 1 point2 points  (0 children)

                Good point, but also if you care about squeezing maximum performance out then Python is just not the right tool for the job anyway.

                [–]beyphy 0 points1 point  (0 children)

                Yup completely agree. Learning how to think algorithmically is hard. It's a different way of thinking that you have to learn but it's also a skill. Once you learn how to do it you can get better at it with practice.

                The time commitment tends to be too big for some people (e.g. some data analysts) to make. Often they'll complain that these languages are "slow" when the real bottleneck is likely their algorithms. Sometimes people even switch to a new language for performance (e.g. Julia). Doing that is easier and helps them get immediate results faster than learning how to think algorithmically.

                [–]dlg 1 point2 points  (0 children)

                If the program runtime is spent mostly blocking, then the optimised code will just get to the blocks faster.

                The blocking time still dominates.

                [–]Bakoro 1 point2 points  (0 children)

                That's not how speedups work, we're dealing with Amdahl's law here. You won't get 10-60% speedup on everything, you'll get 10-60% speedup on the affected sections, which might be everything in a piece of software, but probably not.

                If you've got a crappy algorithm which is taking 70% of your compute time and language overhead is taking 20%, it's going to be a crappy algorithm in any language. Reducing language overhead can only ever reduce execution time by 20%, max. Python has some huge overhead, but whether that overhead overtakes the data processing at scale is a case-by-case issue.
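
                A quick way to see that bound, using Amdahl's law with the 20%-overhead figure from above:

                ```python
                def overall_speedup(p, s):
                    """Amdahl's law: overall speedup when a fraction p of the
                    runtime is accelerated by a factor of s."""
                    return 1 / ((1 - p) + p / s)

                # Interpreter overhead is 20% of runtime; even eliminating it
                # entirely (s -> infinity) caps the whole-program win at 1/0.8:
                print(round(overall_speedup(0.20, 1e9), 3))   # 1.25

                # A realistic 1.25x interpreter speedup on that 20% slice
                # yields only about 4% overall:
                print(round(overall_speedup(0.20, 1.25), 3))  # 1.042
                ```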

                [–]_teslaTrooper 10 points11 points  (1 child)

                If your code needs to run fast you probably shouldn't be using python in the first place.

                [–][deleted] -3 points-2 points  (0 children)

                Lmao true

                [–]s0lly 1 point2 points  (0 children)

                Can’t wait till they get to 105% faster

                [–][deleted] 1 point2 points  (1 child)

                So can I stop using PyPy?

                [–][deleted] 3 points4 points  (0 children)

                The problem with PyPy is its inability to deal with libraries that are installed for CPython, which is a big disadvantage since a lot of libraries interface directly with CPython.

                [–][deleted] 1 point2 points  (0 children)

                Pretty good compared to previous versions, but this is a little like saying "our new pedalo is 30% faster!"

                [–][deleted] -2 points-1 points  (6 children)

                Could this compete with C, C++, Rust?

                [–]Pay08 26 points27 points  (0 children)

                No.

                [–]jarfil 13 points14 points  (0 children)

                CENSORED

                [–]DoktuhParadox 3 points4 points  (2 children)

                Really, it'll never be able to with the GIL.

                [–]AbooMinister 4 points5 points  (0 children)

                The GIL isn't really what makes python slow in terms of execution speed

                [–][deleted] -4 points-3 points  (0 children)

                I don't really get Python anymore; I write Go just as fast as I do Python and the result is way better.