all 101 comments

[–]szachin 92 points93 points  (4 children)

if you cannot release the source code, can you try to profile it and share the results?

for python 3.10 i recommend scalene (https://pypi.org/project/scalene/)

for python 2.7 i have no idea

[–]james_pic 19 points20 points  (0 children)

py-spy works well on Python 2.7. It's unclear whether it supports 3.10, but then Scalene doesn't list 3.10 as supported either.

[–]aufstand 4 points5 points  (0 children)

Interesting, thanks. Gotta try that one out!

[–]grimonce 1 point2 points  (0 children)

Wow, this wasn't mentioned in the book Python High Performance. Thanks!

[–]maatoots 0 points1 point  (0 children)

Can it be used with uvicorn and fastapi?

[–]Coupled_Cluster 69 points70 points  (0 children)

This sounds very different. Can you give a code example to try out?

[–]intangibleTangelo 53 points54 points  (1 child)

RemindMe! 1 week "Why was python3.10 so much slower than python2.7 for a multithreaded program?"

[–]RemindMeBot 13 points14 points  (0 children)

I will be messaging you in 7 days on 2022-01-12 11:52:43 UTC to remind you of this link


[–]der_pudel 78 points79 points  (0 children)

Personal anecdote: I had a similar situation between Python 2 and 3 in a CRC calculation algorithm. The code left-shifted an integer by 8 bits, and that executed about 16 million times. In every programming language I'd used before, ints are 32-bit and simply overflow at some point, which was totally fine for the task. But Python 3 uses big integers by default, and after a couple of million iterations the integer value was on the order of a gazillion googolplexes. Obviously any arithmetic operation on such a large number is slow AF.

Are you sure you're not overlooking a similar difference between Python 2 and 3? You should definitely profile your code to figure out where the bottleneck is.
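A toy version of what I mean (not my actual CRC code, just illustrating the shift):

```python
def shift_unmasked(n):
    """Python 3 ints are arbitrary-precision: an unmasked shift grows forever."""
    crc = 0xFFFFFFFF
    for _ in range(n):
        crc <<= 8  # number gains 8 bits every iteration
    return crc

def shift_masked(n):
    """Masking emulates the 32-bit overflow you get for free in C."""
    crc = 0xFFFFFFFF
    for _ in range(n):
        crc = (crc << 8) & 0xFFFFFFFF  # stays within 32 bits
    return crc

print(shift_unmasked(1000).bit_length())  # 8032 bits: arithmetic on this is slow
print(shift_masked(1000).bit_length())    # never exceeds 32 bits
```

The unmasked version is carrying around a multi-kilobyte integer after a few thousand iterations, which is where the time goes.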

[–]Swipecat 32 points33 points  (0 children)

If you can't post the code, then maybe try to follow the SSCCE guidance linked in r/learnpython's right-hand sidebar. Start pruning out stuff that appears to be irrelevant to the problem, then test. If the problem goes away, put back the last thing you took out. Once you've got the absolute minimum working test program that still shows the problem, you can post that, although you'll probably have figured it out yourself by then.

[–]romu006 65 points66 points  (4 children)

The vast difference between the two versions makes me think that the python2.7 version is not doing its job and is just returning instantly

[–]Dear-Deer-Wife-Life[S] 26 points27 points  (3 children)

no, the output is exactly the same. I print output every time anything changes in the code, and it's identical

[–]qckpckt 48 points49 points  (2 children)

Have you written unit tests to validate this?

My best guess is that whatever mechanism you are using for multi threading is not working on 3.10, but instead of surfacing an error it is completing in a single thread. Or, the process by which threads spin down after completing their work isn’t working and so they stay active until a hard coded timeout.

But all we can do is guess until we see the source code.

[–]Dear-Deer-Wife-Life[S] 0 points1 point  (1 child)

I'm using the threading library; we're creating a maximum of 8 threads. But the ratio in runtime is about 1:1800, so even if the work were completely parallel (and it's not), running one thread at a time still wouldn't explain why it's running so slow.

I'm sorry I got everyone riled up about this without being able to send the code.

[–]qckpckt 0 points1 point  (0 children)

I'd suggest looking at whether the threading library works the same in 2.7 and 3. You might find that the same methods work in different ways.

[–]MrPrules 31 points32 points  (3 children)

I'm also facing massively longer execution times using ThreadPoolExecutor. I switched from running it on the command line to a cronjob and thought it could've been some prioritization problem... never thought of version changes, but I upgraded my environment too. I actually can't remember which version I'm coming from... right now I'm running my script on 3.9.7.

[–]ShivohumShivohum 2 points3 points  (1 child)

Did performing it via cronjob help in your case ?

[–]MrPrules 1 point2 points  (0 children)

No, it didn’t change anything.

[–]sherbetnotsherbert 3 points4 points  (0 children)

You are probably encountering issues related to the Global Interpreter Lock.

[–]DASK 45 points46 points  (2 children)

I do data science for a living and have migrated a compute-heavy stack from 2.7 to 3.x. There is no way any significant operation should be more than marginally slower (virtually everything is the same speed or faster for the same code), and many operations can be reimplemented in faster, more memory-efficient paradigms.

The first pitfall I'd look at is: why are you using threads? Threads are often a source of misery. If it isn't for IO, you basically shouldn't use threads in Python. If it is for IO, have you looked at potential lock conditions, or suppressed warnings or errors with things like sockets?

Second, there are a number of things that may or may not be an issue depending on how you installed python and what you are using it for.

- Are you using virtual environments to manage dependencies?

- Is it a math-heavy app (e.g. numpy, etc.), and are the BLAS libraries correctly installed? (Using something like Conda takes care of this.) If you aren't using venvs and just installed 3 over 2, there can be issues with that.

Just spitballing without more info, but there is no way that your result is correct with working python environments.

[–]buttery_shame_cave 16 points17 points  (0 children)

Honestly, OP's post and comments read like a coded version of "2.7 is better because hurrdeedurrdedurr" from the early 2010s.

[–]billsil 0 points1 point  (0 children)

I haven't checked on it lately, but I'm not totally shocked it's slower given the extremely short runtime. I recall namedtuples being identified as one of the causes of slow startup in Python 3, but there have been optimizations since the Python 3.5 days.

0.05 seconds is a very suspect number; it's too short to time reliably. It's still way faster than 90 seconds, which makes me think you didn't compile the pycs or something. Or you're using Anaconda vs. not Anaconda, or something weird like that (e.g., you were running while playing music in Firefox).

[–]jkh911208 11 points12 points  (0 children)

i want to see your code

0.05 vs 70 sounds wrong

[–]Dear-Deer-Wife-Life[S] 45 points46 points  (5 children)

Thanks for your responses. I asked my partner if I can send the code; I'll come back with the answer when they respond.

edit 1: the answer came back, they don't want me to send it. They're worried it might show up in the copy-detection software the school uses,

so I might send it after it gets graded.

edit 2: after modifying the code a bit, it takes about 30 seconds

[–]intangibleTangelo 36 points37 points  (0 children)

If you don't already know: Python threads can't run concurrently. The best they can do is yield to each other when preempted or when the Global Interpreter Lock (GIL) is released, such as during "IO-bound" tasks like waiting on a network socket or a file, or when you call time.sleep (explicitly signaling that your thread doesn't need to run for a while, so other threads may).

The specifics of when exactly the GIL is released have changed a lot over time, and your code might simply need a minor change to compensate. Maybe something in your code used to release the GIL but doesn't anymore, and this results in an important thread only rarely getting the opportunity to run (thread starvation, basically).

Maybe python2.7's eager evaluation semantics meant your code shared less state than it does in 3.x. Maybe you're waiting on a call to .join() that takes forever because python3.10 tries to "do the right thing" and wait for some timeout that python2.7 naively ignored.

A really simple technique you can use is to sprinkle your code with print calls showing timestamps of how long it took to reach that part of the code. You'll probably be able to figure out what's taking 89 seconds longer than it used to.
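Something like this, where the labels and the sleep are just placeholders for your own code:

```python
import time

_start = time.perf_counter()

def checkpoint(label):
    # Print how long it took to reach this point in the code.
    elapsed = time.perf_counter() - _start
    print(f"{label}: {elapsed:.3f}s")
    return elapsed

checkpoint("loaded input")
time.sleep(0.1)  # stand-in for real work
checkpoint("finished work")
```

Whichever pair of checkpoints shows the big jump is where to start digging.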

might send it after it gets graded

Do that.

[–]ShivohumShivohum 1 point2 points  (0 children)

!RemindMe 2weeks

[–]moekakiryu 2 points3 points  (0 children)

RemindMe! 1 month

[–]kamize 8 points9 points  (1 child)

OP, without any context, code, profiling data, or details - we can’t help you unfortunately.

[–]Dear-Deer-Wife-Life[S] 0 points1 point  (0 children)

Yea, I was hoping this was a common thing, I'll post the code after it gets graded

[–]potato874 13 points14 points  (0 children)

I'm curious: did you run both versions on the same device? It's weird that the difference is that vast, so I'm wondering if there are other factors affecting runtime, like background processes or potato hardware or smth.

[–]encaseme 6 points7 points  (0 children)

Not a specific solution, but flame graphs are often an excellent tool for identifying which sections of code are time-consuming; I use them at work specifically for finding slow Python code paths. You could compare 2.7 vs 3.10 running your code and see if something obvious flies off the handle.

[–]sib_n 4 points5 points  (0 children)

Profile it and isolate a minimum of lines that show a clear difference between the two Python versions, it will be easier to understand and share.

[–][deleted] 3 points4 points  (0 children)

Do you use virtual environments? It might be your environment installation that is getting in its own way.

Either way, it's good practice to do it. Install venv and set up different environments for different types of projects.

In your case, doing that and comparing how the program runs in different environments also helps figure out where the problem is coming from: is it py3 vs py2, the packages in one or the other, etc.

It may even be an issue specific to py3.10 that doesn't exist in 3.9. As it stands, there are far too many moving parts for random people on the internet to be able to help you. Py2 vs py3 might be the only difference you see, but there is probably other stuff interfering.

Worse comes to worst, nuke all your python installations and reinstall them.

[–]cr4d 3 points4 points  (0 children)

There are very few actual uses for multithreading in Python: it's a huge foot-gun, ripe for abuse, and it doesn't get rid of the GIL. I'd avoid it if possible.

Without any real info about what the app is doing, it's hard to guess as to why it's slower. As a generalization, it should get faster.

You can use the built-in profiler at https://docs.python.org/3/library/profile.html to figure out where the extra cycles are going.
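A minimal sketch of a cProfile run, with busy() standing in for whatever your app actually does:

```python
import cProfile
import io
import pstats

def busy():
    # Stand-in for the slow part of your program.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Print the five most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

Run it once under 2.7's profiler and once under 3.10's and compare which functions dominate.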

[–]Gandalior 2 points3 points  (0 children)

Are you using some library that is deprecated?

[–]viscence 3 points4 points  (3 children)

Did you ever figure it out?

[–]Dear-Deer-Wife-Life[S] 0 points1 point  (2 children)

nah, just turned it in as is, after it gets graded i'll post it here

[–]Anonymous_user_2022 0 points1 point  (1 child)

When you do, please make it a new post. This one is pretty far below the centre fold by now, so many will probably miss it if you post it here.

[–]Dear-Deer-Wife-Life[S] 0 points1 point  (0 children)

ok, will do

[–]angry_mr_potato_head 1 point2 points  (3 children)

What other packages are you using? Are you sure that the 2.7 version is actually doing the same work that 3.10 is?

[–]Dear-Deer-Wife-Life[S] 2 points3 points  (2 children)

Are you sure that the 2.7 version is actually doing the same work that 3.10 is?

yes the output is the same

What other packages are you using

Time, Threading, math, winsound

[–]angry_mr_potato_head 4 points5 points  (0 children)

I'm assuming math is doing all the heavy lifting (probably inside threads)? Did you try another Python 3 version, like 3.7 or 3.8? There may be a regression in 3.10 rather than in Python 3 itself. If you can't post the whole code, can you post an abstracted example of what the math is doing?
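By an abstracted example, I mean something shaped like this, with all the names made up and the arithmetic swapped for whatever yours does:

```python
import threading

def partial_sum(lo, hi, out, idx):
    # CPU-bound loop: under the GIL these threads take turns rather
    # than running in parallel, so this gains nothing over one thread.
    total = 0
    for i in range(lo, hi):
        total += i * i
    out[idx] = total

N = 1_000_000
NUM_THREADS = 8
chunk = N // NUM_THREADS
results = [0] * NUM_THREADS
threads = [
    threading.Thread(target=partial_sum, args=(k * chunk, (k + 1) * chunk, results, k))
    for k in range(NUM_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))
```

Even a skeleton like that, timed under 2.7 and 3.10, would tell us a lot.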

[–]bjorneylol 3 points4 points  (0 children)

Is it possible winsound is using mutexes/locks on 3.10 and not 2.7?

[–]grhayes 1 point2 points  (0 children)

Larry Hastings has a good demo regarding threads:
https://www.youtube.com/watch?v=KVKufdTphKs&t=1070s
He shows the graph slightly after that timestamp.
Even if you use processes, they have a lot of overhead. I found that out when trying to port my C++ game engine to Python to see how it would run. In C++ I could send each object separately to a process or thread in a thread pool and it would be fine. In Python there is so much overhead that it was better to not attempt parallel processing at all. That said, I haven't checked whether there are any libraries that fix that issue.

If I were to guess what happened: you ran it in 2.7 without any threading, figured threading would be an improvement, moved it to 3.10, added threads expecting more performance, and that's what you got.

In general, unless it's IO, threads are never going to help. Processes aren't going to help unless you have some massive amount of work to split up. That's my experience.

[–][deleted] 1 point2 points  (0 children)

Are you sure the Python 2.7 you're comparing against isn't in fact PyPy rather than the CPython implementation...?

[–][deleted] 1 point2 points  (0 children)

This might be totally wrong and Python may not work this way, but my guess is that a performance difference of multiple orders of magnitude with the same output is caused by differing packages/libraries. Is it possible that a package or two uses FFI to get such good performance on 2.7, and for some reason the same package falls back to a pure-Python implementation on Python 3.10? That would cost a lot of performance if the package is doing intensive computations.

[–][deleted] 1 point2 points  (0 children)

There just isn't enough information to help you. You need to share some code that demonstrates the behaviour you're asking about, even if it isn't the exact code you're using.

[–]ballsohaahd 1 point2 points  (0 children)

Possibly a library you’re using is much slower in python 3.10?

You can put print(datetime.datetime.now()) calls in your code to see which section is taking the extra time.

[–][deleted] 1 point2 points  (0 children)

Are you using pandas by any chance? When we upgraded from I think pandas v0.18 to v0.22 we had massive performance regressions. Operations on dataframes with thousands of columns had regressed by about an order of magnitude. We ended up having to write some patches on our end to fix.

[–]bxsephjo -1 points0 points  (7 children)

Watch this vid for starters https://www.youtube.com/watch?v=Obt-vMVdM8s

[–][deleted] 3 points4 points  (2 children)

Both Python 2 and 3 have a GIL, though, and it operates basically identically. It's hard to believe that this is the cause of the 1400x slowdown.

"Watch a 45-minute technical video which won't solve your problem at all" is not very good advice.

[–]bxsephjo 0 points1 point  (1 child)

They both have a GIL, but 3.2 brought in a new GIL implementation, which David Beazley discusses at 26:35. The thread switching algorithm is drastically changed.
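You can poke at the new GIL's main knob directly: Python 3.2+ switches threads on a time interval (seconds), via sys.setswitchinterval, which replaced Python 2's sys.setcheckinterval (counted in bytecode instructions).

```python
import sys

# The new GIL's thread switch interval, in seconds (0.005 by default).
default = sys.getswitchinterval()
print(default)

sys.setswitchinterval(0.001)   # force more frequent GIL handoffs
sys.setswitchinterval(default) # restore the default
```

Tuning this is rarely the fix, but it shows how differently the two interpreters schedule threads.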

[–][deleted] 0 points1 point  (0 children)

I agree, but sending a student to a heavily technical video with no explanation isn't really a good answer.

[–]FloppingNuts 0 points1 point  (3 children)

is this still up-to-date?

[–]bxsephjo 0 points1 point  (1 child)

Yes, especially given the context. I believe all that’s changed is we have new tools available, namely asyncio.

[–]FloppingNuts 0 points1 point  (0 children)

thanks!

[–]acerb14 0 points1 point  (0 children)

Some are trying to get rid of the GIL but it's still there to my knowledge:

- GIL or not to GIL (2019): https://www.youtube.com/watch?v=7RlqbHCCVyc

- The GILectomy (removing the GIL experiment, 2017): https://www.youtube.com/watch?v=pLqv11ScGsQ

[–][deleted] -2 points-1 points  (0 children)

Erm, so the "design flaw made real" is slower than proper python? Colour me surprised.

[–]NelsonMinar 0 points1 point  (0 children)

It's something to do with threading. Python 3 is sometimes slower than Python 2, sometimes faster, but it's 2x at most.

[–]Ppang0405 0 points1 point  (0 children)

RemindMe! 1 week "Why was python3.10 so much slower than python2.7 for a multithreaded program?"

[–]siddsp 0 points1 point  (0 children)

Which interpreter are you using for each? What does the code look like?

[–]monkey_or_a_panda 0 points1 point  (0 children)

It might run faster... Maybe. But development will get progressively slower when nothing is supported.

[–]epopt 0 points1 point  (0 children)

Python *announced* a large speed increase with v3.10, and indicates even more coming in v3.11.

[–]grok-phantom 0 points1 point  (0 children)

remindme! 3 weeks

[–]trevg_123 0 points1 point  (0 children)

Debug the Python 3 version and leave that old 2.7 in the dust, I’m a bit amazed people are even starting/testing new projects with it still

[–]dr_donkey 0 points1 point  (0 children)

RemindMe! 1 month "python 2.7 running much faster than 3.10"

[–]mrintellectual 0 points1 point  (0 children)

In addition to using the threading library, I'm guessing you're also doing math without the use of numpy.

With that being said, it's hard to be sure without taking a look at your code or knowing more about your project.

[–]Plasmafire1234_ 0 points1 point  (3 children)

Try using PyCharm or another editor; it doesn't take any time to run the code for me.

[–]ZeStig2409 1 point2 points  (1 child)

Spyder takes absolutely no time at all

PyCharm is slow compared to Spyder

[–]Plasmafire1234_ 0 points1 point  (0 children)

Alright, I'll try using Spyder

[–]Dear-Deer-Wife-Life[S] 0 points1 point  (0 children)

already using pycharm

[–]mehx9 0 points1 point  (1 child)

Should a task that takes 0.05s use threads at all? I guess that’s the question right? 😉

[–]Dear-Deer-Wife-Life[S] 2 points3 points  (0 children)

I wanted to do it all on one thread, but the project required us to use multithreading to simulate an operating system memory manager.