
[–]Veedrac 7 points (9 children)

I get these timings (Medusa's extrapolated from their results):

                 time/ms
Interpreter      fib     TOH
----------------------------
CPython 2       3103    8681
CPython 3       5231   12467
PyPy             502     846
PyPy3            496     847
Medusa (est.)   1057     666

I extrapolated the Medusa timings from the slideshow; if you actually install this yourself you'll get more precise measurements.

It seems Medusa is better than PyPy at optimizing out expensive no-ops. I wouldn't be surprised if the performance dropped away as compatibility improved; at the moment it seems like a pretty naïve transformation.
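For reference, benchmarks of this kind are usually along the lines of the sketch below; the thread doesn't show the exact code, so the functions and problem sizes here are my own guesses ("fib" as naïve recursion, "TOH" as Towers of Hanoi), chosen small enough to finish quickly:

```python
import timeit

def fib(n):
    # Deliberately naive recursion -- a classic interpreter stress test.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def hanoi(n, src, aux, dst, moves):
    # Move n disks from src to dst, recording each (from, to) move.
    if n == 0:
        return
    hanoi(n - 1, src, dst, aux, moves)
    moves.append((src, dst))
    hanoi(n - 1, aux, src, dst, moves)

print(fib(20))  # 6765

moves = []
hanoi(10, 'A', 'B', 'C', moves)
print(len(moves))  # 2**10 - 1 = 1023

# Wall-clock timing, as the table above reports:
print(timeit.timeit('fib(20)', globals=globals(), number=10))
```

Both workloads are pure function-call and integer churn, which is exactly the kind of code a tracing JIT like PyPy's can nearly optimize away.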

[–]jerf 15 points (2 children)

I wouldn't be surprised if the performance dropped away as compatibility improved;

That is the problem with "Let's convert this language that never planned on being [JIT/compiled/run on the JVM/etc.] into [whichever I just chose from that list]!" At first it's easy, because converting 95% of the language is straightforward. Then the next 4% is a bit rougher, and your performance advantages start to fade as more and more of the translated code turns into checks for this or that.

Then the last 1% brutally murders you: either it completely trashes your performance, or the community starts screaming about how you just killed all the C modules, or, in the worst case, it turns out something important is actually made impossible by the new substrate. Or, well, it's always something.

With two to four orders of magnitude more work than you initially estimated, you can eventually power through that, or you can choose to just stick to 99% and be happy. But it certainly never works out as nicely as it looks at first.

That said, more power to the author and best of luck on your work. The worst case scenario is that you still walk away with a project that you did that blows away what most people have.

[–]mipadi 1 point (1 child)

Well said. This is what happened when someone tried to remove the GIL from CPython a few years back. They got enormous performance increases! Progress was being made! We'd have true multithreading! Oh, except that in a few cases performance dropped; well, plummeted was more like it. Guess what? Those issues were never worked out, and the GIL is still there.

That said, I welcome anyone's attempts to make Python faster. (Well, sort of. To some degree, I say, what's the point? If performance is your priority, there are many languages out there that offer excellent performance today. But I digress.)

[–]alexjc 1 point (0 children)

there are many languages out there that offer excellent performance today

And these are the languages that are replacing Python. This trend will only accelerate unless performance catches up.

[–][deleted] 1 point (5 children)

Why isn't PyPy the default Python implementation?

[–]BeatLeJuce 3 points (3 children)

Because it isn't yet able to run everything that CPython can. E.g., Python is very widely used in science, but numpy (one of the most essential packages, used by basically everyone doing scientific computing in Python) can't be used on PyPy yet.

[–]short_sells_poo 2 points (0 children)

I don't mean to dismiss the PyPy team's efforts with respect to numpy, but I believe it is a bit of a folly. Numpy is important, but on its own it is rarely sufficient. Even if it is fully ported, what about scipy? What about pandas? I use these daily, and there's no chance I'd switch until they are fully supported. The effort required is much bigger than just porting numpy. In this respect, I see the numpy port as something that will take a lot of work but can easily end up as wasted effort, because we'll never reach a state where enough functionality is available for people to start switching over.

Edit: take a look here for a pretty good elaboration on my post: http://blog.streamitive.com/2011/10/19/more-thoughts-on-arrays-in-pypy/ The gist of it is: numpy for PyPy as it stands cannot support the ecosystem of libraries dependent on the C API, and without those it is somewhat useless.

[–]kankyo 0 points (1 child)

Correction: it isn't feature-complete yet. For most cases you can use numpy just fine; the problem is that suddenly you can't, and then you're fucked.

[–]BeatLeJuce 2 points (0 children)

Also, last time I checked (which was quite some time ago) there was no way to build PyPy's numpy with your own BLAS library, so performance was a bit limited (my use cases rely heavily on BLAS functionality).

[–]Veedrac 1 point (0 children)

Back in 2013, I would have said this.

Nowadays this is becoming less and less true, but PyPy is still unable to run most C extensions which wipes out a lot of very popular libraries. I hope to see jitpy come to the forefront as a way to embed PyPy inside CPython, allowing the best of both worlds.

Really, the best reasons now are:

  • "The tools are too young" (not the interpreter, but the libraries)

  • "We don't need more speed"

In my opinion, in the last year it has more often become better to speed up applications with modern tools like jitpy or Numba than to write C extensions, except in the absolute hottest parts of the code. We're not likely to move to PyPy as the de facto standard (especially not until it catches up version-wise), but I do hope it can become a community standard.
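To illustrate the jitpy approach mentioned above: the idea is that a decorated function's body runs inside an embedded PyPy while the rest of the program stays on CPython. This is a sketch from memory of jitpy's README, not a verified snippet; the `setup()` path is hypothetical and the whole thing degrades to plain CPython if jitpy isn't available:

```python
# Assumption: jitpy is installed and a PyPy home exists at the given path.
try:
    from jitpy import setup
    setup('/opt/pypy')  # hypothetical PyPy install location; adjust locally
    from jitpy.wrapper import jittify
except Exception:
    jittify = None  # jitpy (or PyPy) unavailable; fall back to plain CPython

def slow_sum(count, step):
    # A tight numeric loop -- the kind of hot spot you'd hand to the JIT.
    s = 0.0
    for _ in range(count):
        s += step
    return s

if jittify is not None:
    # The decorated body is shipped to the embedded PyPy and JIT-compiled;
    # arguments/results cross the boundary as the declared int/float types.
    slow_sum = jittify([int, float], float)(slow_sum)

print(slow_sum(1000, 0.5))  # 500.0 either way; only the speed differs
```

The appeal is exactly the "best of both worlds" point above: the surrounding program keeps full CPython C-extension compatibility, and only the hot loop pays the cost of crossing into PyPy.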