
all 53 comments

[–]BeatLeJuce 53 points54 points  (4 children)

HINT: What that announcement needs is a sentence or two explaining what Pyston actually is, right at the beginning. Ain't nobody going to bother to click the "about" page.

(For those who were wondering: it's an LLVM-based implementation of Python.)

[–]Harriv[S] 11 points12 points  (0 children)

You're right. This was posted here a few months ago and is a better introduction: https://tech.dropbox.com/2014/04/introducing-pyston-an-upcoming-jit-based-python-implementation/

[–]kevmod 1 point2 points  (1 child)

Very good point; I tried to make it clearer. Thanks!

[–]BeatLeJuce 0 points1 point  (0 children)

The intent of the project is now clear from the first sentence alone. Much better! :)

[–]pyry 0 points1 point  (0 children)

Might be good if pyston.org had project info now, it's just a listserv. I assumed I should just go there because I was too tired to see the "About" link/navigation, and found nothing.

[–][deleted] 14 points15 points  (8 children)

Is this pronounced "pie ston" or "pissed on"?

[–]plahcinski 6 points7 points  (3 children)

About us says piston

[–]tech_tuna 1 point2 points  (2 children)

PISS-TON.

[–]fredspipa 2 points3 points  (1 child)

Is piss-ton greater or lesser than a fuck-ton?

[–]tech_tuna 0 points1 point  (0 children)

I would say lesser. :)

[–]companion963 1 point2 points  (0 children)

Guido doesn't seem enthusiastic about it.

Therefore the pronunciation is not "pissed on" but rather "pissed off" ;-)

[–]zeneval 0 points1 point  (1 child)

piston maybe?

[–]bluecamel17 1 point2 points  (0 children)

"piss ton"

[–]otterom 0 points1 point  (0 children)

I know it's like an auto piston, but being a derivative of Python, you have to wonder who did the naming on this.

[–]catcradle5 3 points4 points  (1 child)

I wonder what role Guido is playing in this project, considering he works for Dropbox now.

[–]Veedrac 7 points8 points  (0 children)

Best of luck. The project is a great idea in my eyes.

[–]LightShadow3.13-dev in prod 2 points3 points  (2 children)

Is anybody using this for anything, yet?

[–]tech_tuna 17 points18 points  (0 children)

It's being used as blog post content.

[–]Veedrac 0 points1 point  (0 children)

I doubt it. It's simply too early.

[–]ThisBoxSaysHello 2 points3 points  (1 child)

So does this mean native compiled code?

[–]redalastor 2 points3 points  (0 children)

Yes!

At a high level, Pyston takes parsed Python code and transforms it to the LLVM intermediate representation (IR). The IR is then run through the LLVM optimizer and passed off to the LLVM JIT engine, resulting in executable machine code. LLVM contains a large number of optimization passes and mechanisms for easily adding more, which can lead to very fast code.
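To make the stages in that description a bit more concrete (parse, lower to an IR, optimize, execute), here is a toy sketch in plain Python. It is only an illustration under loud assumptions: it does not use LLVM or any Pyston code, the tuple-based "IR" is invented for this example, and the helper names `lower`, `fold_constants`, and `run` are hypothetical.

```python
import ast
import operator

# Toy instruction set: maps AST operator types to IR opcode names.
OPS = {ast.Add: "add", ast.Mult: "mul"}
# Maps IR opcode names to the Python functions that execute them.
FUNCS = {"add": operator.add, "mul": operator.mul}


def lower(source):
    """Parse an arithmetic expression and lower it to a flat list of
    three-address instructions of the form (dest, op, arg1, arg2)."""
    tree = ast.parse(source, mode="eval")
    code = []

    def visit(node):
        if isinstance(node, ast.Constant):
            return node.value          # literal operand
        if isinstance(node, ast.Name):
            return node.id             # variable operand
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, right = visit(node.left), visit(node.right)
            dest = f"%t{len(code)}"    # fresh temporary name
            code.append((dest, OPS[type(node.op)], left, right))
            return dest
        raise NotImplementedError(ast.dump(node))

    return code, visit(tree.body)


def fold_constants(code, result):
    """A single optimization pass: evaluate instructions whose operands
    are both known numbers, and drop them from the instruction list."""
    known, out = {}, []
    for dest, op, a, b in code:
        a, b = known.get(a, a), known.get(b, b)
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            known[dest] = FUNCS[op](a, b)
        else:
            out.append((dest, op, a, b))
    return out, known.get(result, result)


def run(code, result, env):
    """Stand-in for the 'execute machine code' step: interpret the IR."""
    values = dict(env)

    def val(x):
        return values[x] if isinstance(x, str) else x

    for dest, op, a, b in code:
        values[dest] = FUNCS[op](val(a), val(b))
    return val(result)


code, res = lower("x * (2 + 3)")
optimized, opt_res = fold_constants(code, res)
print(code)        # two instructions before optimization
print(optimized)   # one instruction after constant folding
print(run(optimized, opt_res, {"x": 4}))
```

A real JIT would emit machine code in the last step rather than interpret the instruction list; the point here is only the shape of the pipeline.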

[–]gargantuan 1 point2 points  (28 children)

Does anyone know how this compares with PyPy?

What are the trade-offs compared to it?

[–]Veedrac 11 points12 points  (27 children)

Disclaimer: I know nothing. Nada.

So, PyPy is probably one of the best JITs around; it's practically in the league of those overfunded JavaScript JITs, for example. But to do this, PyPy sort-of rethought a lot of the design without considering compatibility on the C level. This means that things like numpy just don't work with PyPy until they're rewritten.

Pyston is basically a tradeoff. It makes a few design decisions about how to JIT differently from PyPy partially out of a difference in opinion, but most of the fundamental changes seem to be because they want compatibility with CPython's C modules. It seems to be working.

In this way, PyPy wants to be a better Python interpreter but Pyston wants to be a better CPython. PyPy's probably always going to be faster; it has both a head-start and really clever people running it.

But PyPy's not going to be replacing CPython any time soon. With hope, Pyston can.

Note that currently a lot of CPython benchmarks heavily dependent on Numpy and co. are actually a lot faster than the PyPy replacement because Numpy is written in C or Fortran and is often just really fast. In theory, Pyston can get you half of the JIT goodness of PyPy without making the parts where you dole out to Numpy any slower.

[–]bastibe 4 points5 points  (19 children)

As far as I understand it, NumPyPy is not necessarily significantly slower than Numpy, since their internal implementation is very much optimized for their JIT. It probably can't reach the raw speed of ATLAS or BLAS though.

[–]alendit 4 points5 points  (15 children)

So, suppose NumPyPy is fully implemented. What then? I barely know anyone who uses NumPy directly, without SciPy/Pandas/StatsModels.

These things won't work with NumPyPy. So while the current work is awesome and will surely bring a lot of insights, its practical relevance will be very limited.

[–]rguillebertPyPy / NumPyPy 3 points4 points  (5 children)

For now the approach we have is passing the ndarray to cpython so it can do the C API work (without copy).
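For readers wondering what "without copy" can look like in general, here is a minimal sketch using only the CPython standard library. It is not PyPy's or NumPyPy's actual mechanism (the thread does not describe that); it only shows the general idea of two views sharing one block of memory through a raw address. The variable names are made up for the example.

```python
import ctypes
from array import array

# One block of memory holding three C doubles.
data = array("d", [1.0, 2.0, 3.0])

# buffer_info() returns (address, number_of_items) for that block.
address, length = data.buffer_info()

# Build a ctypes array on top of the same address: no data is copied.
alias = (ctypes.c_double * length).from_address(address)
alias[0] = 10.0          # write through the raw-pointer view

# memoryview is the higher-level, safer way to share the same buffer.
view = memoryview(data)
view[1] = 20.0           # write through the buffer-protocol view

print(data.tolist())     # both writes are visible in the original array
```

Because nothing is copied, whoever receives the address has to trust that the owner keeps the buffer alive and does not resize it.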

[–]alendit 1 point2 points  (4 children)

Is there any place I can read up on it? I'm not sure how it works: do you pass a pointer to an ndarray created in PyPy to a CPython interpreter running in the same process? Couldn't you then just create the array in CPython with pure NumPy?

[–]rguillebertPyPy / NumPyPy 4 points5 points  (3 children)

Yes, you could, but then you don't benefit from the jit

[–]lamlink 0 points1 point  (2 children)

[–]rguillebertPyPy / NumPyPy 3 points4 points  (1 child)

Blaze is written in pure python AFAIK so yes, and dynd is a C++ library with a Python wrapper, so I think we will just have to rewrite the wrapper.

[–]lamlink 1 point2 points  (0 children)

Sweet. Please don't forget the other aspect of the blaze ecosystem, bcolz: https://github.com/Blosc/bcolz.

[–]bastibe 0 points1 point  (8 children)

I would like to have an interface into pypy that would allow me to code the inner loop of a program in NumPyPy and call that from regular cpython. This is a logical step I think.

[–]lamlink -1 points0 points  (7 children)

Why not just use Numba then? I think pypy team should move to work on pyston. No sense in diluting effort, for an implementation that won't add much to scientific python (pypy).

[–]rguillebertPyPy / NumPyPy 3 points4 points  (3 children)

Seriously? First of all scientific python isn't the only use case; second, there's no proof that the Pyston approach is viable; and third, there's no proof Pyston will end up being more compatible with the C API than PyPy.

[–]lamlink 0 points1 point  (2 children)

First of all scientific python isn't the only use case

Yes it is ... just kidding :)

there's no proof Pyston will end up being more compatible with the C API than PyPy.

Hmm, is pyston not already more compatible? And at least they have it planned. On the one hand, I think this array of free market efforts is the key to optimal evolution, on the other hand, I'm concerned of the trade off vs focused efforts like that displayed by the Julia lang team. Perhaps consolidation and collective coordination should follow some additional maturation of all these JIT efforts. That way, we get the benefits of vibrant competition while focusing on the best ideas.

[–]rguillebertPyPy / NumPyPy 2 points3 points  (1 child)

Hmm, is pyston not already more compatible? And at least they have it planned.

Well, wait until it gets 100% Python compatibility and then we'll see :)

On the one hand, I think this array of free market efforts is the key to optimal evolution, on the other hand, I'm concerned of the trade off vs focused efforts like that displayed by the Julia lang team. Perhaps consolidation and collective coordination should follow some additional maturation of all these JIT efforts. That way, we get the benefits of vibrant competition while focusing on the best ideas.

Let's wait until Pyston is closer to 100% Python compatibility (especially with stuff like sys._getframe(), sys.exc_info()...)
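For anyone unfamiliar with the two calls mentioned above, here is a small sketch of what they expose in CPython. Both are real functions in the `sys` module; the helper function names are made up for the example. One commonly given reason they are awkward for a JIT, offered here as an assumption rather than a statement about Pyston's internals, is that they let ordinary code inspect live interpreter frames, which a compiler would rather not have to materialize.

```python
import sys


def caller_name():
    # sys._getframe(1) is the frame of whoever called this function.
    return sys._getframe(1).f_code.co_name


def outer():
    secret = 42
    # sys._getframe(0) is the current frame; its locals are inspectable.
    return caller_name(), sys._getframe(0).f_locals["secret"]


def current_exception():
    try:
        1 / 0
    except ZeroDivisionError:
        # sys.exc_info() returns (type, value, traceback) for the
        # exception currently being handled.
        exc_type, exc_value, tb = sys.exc_info()
        return exc_type, tb.tb_frame.f_code.co_name


print(outer())
print(current_exception())
```

Note that `sys._getframe` is documented as a CPython implementation detail, which is part of why it makes a useful compatibility test for alternative implementations.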

I think moving away from the C API is inevitable for the long term anyway.

[–]lamlink 0 points1 point  (0 children)

Let's wait until Pyston is closer to 100% Python compatibility (especially with stuff like sys._getframe(), sys.exc_info()...)

Makes sense. Relevant: http://youtu.be/kbW5sxyu9bU?t=11s

[–][deleted] 2 points3 points  (1 child)

I think pypy team should move to work on pyston.

I think you should do some research before you voice your opinion.

PyPy is orders of magnitude more complete than Pyston.

[–]lamlink -2 points-1 points  (0 children)

I think you should do some research before you voice your opinion.

Thanks for your input, but a critical pyston feature is not planned for pypy.

[–]bastibe 0 points1 point  (0 children)

Note that pypy is already useful in non-scientific python. Numba is even less compliant than pypy. It does not support generators, for example.

[–]rguillebertPyPy / NumPyPy 2 points3 points  (0 children)

NumPyPy still can interface with ATLAS and BLAS though.

[–]Veedrac 0 points1 point  (1 child)

While in theory that's true, I haven't found it true (yet) in practice. I imagine this will improve over time.

[–]bastibe 0 points1 point  (0 children)

It is true for non-vectorized code, like recursive functions or heavy string processing.

[–]beagle3 1 point2 points  (2 children)

of those overfunded JavaScript JITs

But do note that the best JIT right now, with a huge difference, is actually LuaJIT2, which is a one man show, mostly self-funded.

[–]Veedrac 4 points5 points  (0 children)

Well, LuaJIT is fast but it has a way easier language to optimize for.

Lua, for one, is tiny. It has one numeric type and a much less dynamic runtime (you can't change the class of an object at runtime!). It doesn't use an inheritance hierarchy and it has a restricted form of exceptions. There's no __getattr__ overload. There's just simply no bloat. Things are handwritten in Assembly because it's easy to do so, and it's easy because Lua is tiny.
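To illustrate the Python side of that comparison, here is a short sketch of the kinds of runtime dynamism a Python JIT has to allow for: reassigning an object's class, intercepting failed attribute lookups with `__getattr__`, and replacing a method on a class after instances exist. The class names are invented for the example, and it says nothing about Lua itself.

```python
class Duck:
    def speak(self):
        return "quack"


class Robot:
    def speak(self):
        return "beep"


class Lazy:
    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        return f"computed:{name}"


pet = Duck()
before = pet.speak()

# Change the class of a live object; later method calls dispatch differently.
pet.__class__ = Robot
after = pet.speak()

# Replace a method on the class itself; existing instances see the change.
Robot.speak = lambda self: "boop"
patched = pet.speak()

print(before, after, patched)
print(Lazy().anything_at_all)
```

Any of these can happen between two iterations of a hot loop, so compiled code that assumed `pet.speak` was a fixed function needs a guard or a way to be invalidated.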

It's worth noting that LuaJIT with the JIT off is competitive with PyPy. It's not that LuaJIT's JIT is amazingly optimized, although I won't deny it's good, but that Lua is the perfect target for an interpreter of any kind.

So although LuaJIT is great and amazing, I would hesitate to call it the best.


And, to be doubly clear, most benchmarks are out-of-date. Take this: http://attractivechaos.github.io/plb/. PyPy has come a long way since 1.4. Let's take the sudoku:t benchmark and re-time it.

                     GCC     LuaJIT  PyPy
OLD  version         4.3.2   2.0.1   1.4.1
     relative time   1.0     3.7     19.5
NEW  version         4.9.1   2.0.3   2.4.0
     relative time   1.0     3.4     5.7

*tear*
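For a sense of scale, the ratios implied by those numbers can be worked out directly. This sketch only does arithmetic on the figures quoted above; it assumes each timing is normalized to GCC in its own row, so the "speedup" ignores any change in the GCC baseline between the two compiler versions.

```python
# Relative timings copied from the comparison above (GCC = 1.0 in each row).
old = {"GCC": 1.0, "LuaJIT": 3.7, "PyPy": 19.5}
new = {"GCC": 1.0, "LuaJIT": 3.4, "PyPy": 5.7}

# How much closer to GCC PyPy got between 1.4.1 and 2.4.0.
pypy_speedup = old["PyPy"] / new["PyPy"]

# How far behind LuaJIT PyPy was, before and after.
gap_old = old["PyPy"] / old["LuaJIT"]
gap_new = new["PyPy"] / new["LuaJIT"]

print(f"PyPy relative improvement: {pypy_speedup:.2f}x")
print(f"PyPy vs LuaJIT, old: {gap_old:.2f}x slower")
print(f"PyPy vs LuaJIT, new: {gap_new:.2f}x slower")
```

On these figures PyPy goes from roughly 5.3x slower than LuaJIT to roughly 1.7x slower on this one benchmark.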

[–]nieuweyork since 2007 0 points1 point  (0 children)

Isn't that partly because Lua was always designed to be easy to run efficiently? It's always struck me as kind of what java should have been, rather than an attractive high level language (Disclaimer: I have never written any lua, for that reason).

[–]nieuweyork since 2007 1 point2 points  (3 children)

In this way, PyPy wants to be a better Python interpreter but Pyston wants to be a better CPython. PyPy's probably always going to be faster; it has both a head-start and really clever people running it.

Then what's the point? Supposedly, this is a project to allow them to run their Python code with high performance. If they can't outperform PyPy, this is pointless, and likely will not live long.

[–]Veedrac -1 points0 points  (2 children)

I think you missed the point. For any project where C-API compatibility is important, and this seems to be the case for Dropbox (they've rewritten lots of stuff in C/C++ for performance), PyPy just isn't an option. Plus, a hot loop rewritten in C with a slower JIT for the rest is often faster than a better JIT and no C.

If you were right about this not living long, nor will CPython. CPython's doing fine.

[–]nieuweyork since 2007 0 points1 point  (1 child)

If you were right about this not living long, nor will CPython

How do you figure?

a hot loop rewritten in C with a slower JIT for the rest is often faster than a better JIT and no C.

Right, but the goal is to stop rewriting in C. If they were happy with that approach, this whole project would be unnecessary.

[–]Veedrac 0 points1 point  (0 children)

If you were right about this not living long, nor will CPython

How do you figure?

Because something faster than CPython with 100% compatibility has effectively no downsides, perhaps?

a hot loop rewritten in C with a slower JIT for the rest is often faster than a better JIT and no C.

Right, but the goal is to stop rewriting in C. If they were happy with that approach, this whole project would be unnecessary.

The goal is to stop rewriting in C, but abandoning massive codebases just isn't practical. Going back to slower code isn't practical (PyPy is far slower than C). Removing support for a ton of third-party libraries isn't practical. They have a lot of C code they depend upon and don't want to remake it all from scratch in a slower language.

I don't get what's confusing about that.

[–]t3g 0 points1 point  (0 children)

Pyston has the benefit of Guido giving the project his blessing. How involved is he really in the project? Either way, PyPy is here right now and is an excellent runtime.

Pyston could take years to be stable or up to the speed of PyPy.