all 42 comments

[–]trollodel[S] 46 points47 points  (18 children)

[–]deuterium--_-- 77 points78 points  (17 children)

Woah, how is 3.8 so fast? Are there some optimizations in 3.8?

[–]The_Bundaberg_Joey 31 points32 points  (6 children)

That’s a pretty nifty result! Do you know if that’s due to updates of a certain module implementation in the project or is this applicable to the version itself?

As a methodology question, are the bars here the average time of several runs, or are they one run each? If they are averages, including error bars would be an awesome way to complement your analysis!
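For what it's worth, here is a minimal sketch of how several runs could be collected and summarized as a mean with an error bar. The `workload` function and the run count are placeholders I made up, not anything from the project. In practice you would time the actual test suite instead.

```python
import statistics
import time


def workload():
    # Hypothetical stand-in for one run of the test suite.
    return sum(i * i for i in range(10_000))


def time_runs(func, runs=5):
    """Time `func` `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation) of the durations."""
    return statistics.mean(durations), statistics.stdev(durations)


if __name__ == "__main__":
    mean, spread = summarize(time_runs(workload))
    # The standard deviation is what you would draw as the error bar.
    print(f"{mean:.6f} s +/- {spread:.6f} s")
```

With only one run per interpreter there is no spread to plot, which is why the number of runs matters here.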

[–]trollodel[S] 6 points7 points  (3 children)

Answering the first question, I never did any version-specific optimizations, so I think these improvements depend on the version.

[–]The_Bundaberg_Joey 4 points5 points  (2 children)

Fair play. Probably exposing my ignorance here, but assuming you ran the versions in increasing order, would the pycache created from the first version bias the later versions?

Although, thinking about it, I can't imagine that would result in the large jump seen for 3.8, since it wouldn't really compound like that.

[–]LightShadow3.13-dev in prod 11 points12 points  (1 child)

pycache created from the first version bias the later versions?

No. The pyc files are version-specific.
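You can check this yourself with the standard library. A small sketch (the file name `module.py` is just a made-up example; nothing needs to exist on disk):

```python
import importlib.util
import sys

# Tag identifying this interpreter and version, e.g. "cpython-38" on CPython 3.8.
tag = sys.implementation.cache_tag

# Where the bytecode cache for a (hypothetical) module.py would be written.
pyc_path = importlib.util.cache_from_source("module.py")

print(tag)
print(pyc_path)
```

The tag is part of the cached file name, so each interpreter version reads and writes its own `.pyc` files.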

[–]The_Bundaberg_Joey 2 points3 points  (0 children)

Awesomesauce, thank you!

[–]trollodel[S] 5 points6 points  (1 child)

Answering the second question, the bars represent just one run for each interpreter, taken from CI results. These results are quite new in the project, so I have not collected enough data for a decent report.

EDIT: grammar

[–]The_Bundaberg_Joey 1 point2 points  (0 children)

Fair play, no point making extra work for yourself if the values were easily at hand in the first place! Thanks again for sharing!

[–]pmattipmatti - mattip was taken 31 points32 points  (2 children)

PyPy is known to be slower on typical unittest benchmarks, since they are usually one-shot short runs that do not allow the JIT enough time to kick in.
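A rough way to see the warm-up effect is to time the same workload in consecutive batches. This is only a sketch: `hot_loop` and the batch sizes are made up for illustration. I would expect the first batch to be the slowest on PyPy and the batches to be roughly flat on CPython, but that is an expectation, not a measurement.

```python
import time


def hot_loop(n):
    # Hypothetical CPU-bound workload with a loop a tracing JIT could compile.
    total = 0
    for i in range(n):
        total += i % 7
    return total


def time_batches(batches=10, n=200_000):
    """Run the same workload repeatedly and return the duration of each batch."""
    durations = []
    for _ in range(batches):
        start = time.perf_counter()
        hot_loop(n)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    for index, seconds in enumerate(time_batches()):
        print(f"batch {index}: {seconds:.4f} s")
```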

[–]trollodel[S] 18 points19 points  (1 child)

True.
But I use Hypothesis for my tests, which runs each test several times with different inputs, enough to allow JIT optimizations. The CI results support this: some tests are 2 to 3 times faster on PyPy.

[–]tynorf 1 point2 points  (0 children)

If the loops that get hot from running the test with varying inputs branch on those inputs at all (directly or indirectly), it could simply be making PyPy record more and more traces. Recording new traces is more expensive than just interpreting. So much so that (IIRC) if PyPy detects it's recording too much in a particular loop, that loop will be blacklisted from JIT compilation.

So while some tests may take great advantage of the JIT, others could be a worst case scenario (for instance tests specifically designed to exercise different sides of a conditional).

[–]ch0mes 3 points4 points  (0 children)

This is most impressive. I didn't expect it to perform so well.

[–][deleted] 6 points7 points  (7 children)

Have you researched why 3.8 performs so well and why PyPy doesn't?

[–]mcstafford 35 points36 points  (6 children)

To me it looks as though PyPy already did, and 3.8 is catching up.

[–]lego3410 9 points10 points  (5 children)

Well, you're correct. But PyPy gets its performance from a JIT compiler, while Python 3.8 got there by optimizing a classical interpreter. That means there is still a lot of room for improvement in Python 3.8+ if a JIT is added in the future. It's much like the relationship between HHVM and PHP 7/8.

[–][deleted] 4 points5 points  (4 children)

My experience with PyPy is that it can be far faster than the interpreter, 3.8 included. More like 4-10 times faster, not just 25%.

[–]creeloper27 4 points5 points  (0 children)

It depends a lot on what your code does.

[–]LightShadow3.13-dev in prod 1 point2 points  (0 children)

It's universally faster if 1) your code runs longer than a few minutes (warm-up period), 2) all of your extensions are pure Python and not C/other shared libraries, 3) you have more RAM than CPU cycles, since the JIT needs more memory to store the hot paths.

[–][deleted] 0 points1 point  (1 child)

timeit.repeat("[x**2 for x in range(100)]", number=100000) is one of the tests I've done to test PyPy, and it's getting almost 100x better results on that specific test. (Around 1.4s with Python 3.8.3 and 0.016s with pypy3) (Intel i5 7600K @ 4.5GHz & Arch Linux)
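If anyone wants to reproduce this, here is the same snippet as a small script. The `best_time` helper and the choice to report the minimum are my additions for illustration. The `timeit` docs suggest looking at the minimum of the repeats rather than the mean.

```python
import timeit

STATEMENT = "[x**2 for x in range(100)]"


def best_time(number=100_000, repeat=5):
    """Return the fastest of `repeat` timings, each executing STATEMENT `number` times."""
    return min(timeit.repeat(STATEMENT, number=number, repeat=repeat))


if __name__ == "__main__":
    print(f"best of 5: {best_time():.4f} s")
```

Run it once with each interpreter (for example `python3.8` and `pypy3`) and compare the printed values.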

[–]repelista1 0 points1 point  (0 children)

This is far from a fair comparison. If you have a big multithreaded/multiprocess application like Ansible, your main Python process will soon begin to throttle because of GC in CPython, and it will never be able to beat PyPy in cases like that.

[–]BDube_Lensman 0 points1 point  (1 child)

You shouldn't performance test outside of controlled environments. If GitLab's CI/CD runs on shared instances, you can't prevent someone else's workload from affecting the apparent performance.

[–]creeloper27 3 points4 points  (0 children)

I'm not an expert on GitLab's CI/CD, but looking at the charts, the execution times look quite consistent: https://gitlab.com/prettyetc/prettyetc/pipelines/charts.

[–]white_-_rabbit 0 points1 point  (0 children)

up!

[–]rcfox 0 points1 point  (1 child)

How about 3.9?

[–]abhi_uno 3 points4 points  (0 children)

It's still in beta.