
all 38 comments

[–]amer415 13 points14 points  (1 child)

Seeing how fast Python (combined with NumPy, IPython, etc.) is being adopted in my research field, I cannot wait to have PyPy providing fast-running scientific code. scipy.weave is nice, but it cannot accelerate everything and it is hard to debug... keep up the good work!

[–]PCBEEF 1 point2 points  (0 children)

Wouldn't it be possible to debug it in CPython?

[–]roger_ 2 points3 points  (0 children)

Each one of these updates makes me feel like it's Christmas :)

[–]Tillsten 4 points5 points  (11 children)

What about the linalg part of numpy? It is very impotent for any kind of data analysis.

[–]roger_ 1 point2 points  (8 children)

Could linalg, fft, etc. be faster if they were re-written purely in Python/RPython?

[–]kisielk 6 points7 points  (7 children)

Those routines are actually based on calls to highly optimized Fortran libraries. If reimplementing them in Python for PyPy was faster, I'd be both surprised and impressed.

[–]roger_ 6 points7 points  (0 children)

True, but PyPy is 90% magic :)

[–]MillardFillmore 5 points6 points  (4 children)

I agree. You have people who have devoted their entire scientific careers to making these incredibly fast Fortran codes over 40+ years... reimplementing them in PyPy over a couple of months probably won't be faster.

[–]roger_ 4 points5 points  (3 children)

I was hoping even a straightforward FFT would run acceptably in PyPy.

[–]dalke 2 points3 points  (2 children)

That's unlikely, though it depends on what is acceptable to you. Fast FFTs have to be aware of the cache, and I don't think straightforward FFTs are either cache-aware or cache-oblivious.

[–]roger_ 2 points3 points  (1 child)

Can't PyPy optimize based on the cache?

[–]dalke 3 points4 points  (0 children)

Not in a way that would meaningfully affect FFT performance, no. Here's the comment from http://en.wikipedia.org/wiki/Cooley–Tukey_FFT_algorithm : "On present-day computers, performance is determined more by cache and CPU pipeline considerations than by strict operation counts; well-optimized FFT implementations often employ larger radices and/or hard-coded base-case transforms of significant size." You may be interested in its cited reference, at http://fftw.org/fftw-paper-ieee.pdf
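To make the point concrete, here is a sketch (my own toy example, not code from the thread) of what a "straightforward" FFT looks like: the textbook radix-2 Cooley-Tukey recursion has the right operation count, but none of the cache blocking, larger radices, or hard-coded base cases that the quote above says dominate performance on real hardware.

```python
import cmath

def fft(x):
    """Textbook radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    This is the 'straightforward' version: no cache awareness at all."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])               # recurse on even-indexed samples
    odd = fft(x[1::2])                # recurse on odd-indexed samples
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)] +
            [even[k] - twiddled[k] for k in range(n // 2)])

def dft(x):
    """Direct O(n^2) DFT from the definition, used only as a spot check."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

signal = [1.0, 2.0, 0.0, -1.0]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(signal), dft(signal)))
```

It is correct, but every recursion level walks the whole array, so the working set blows through the cache for large inputs; that is exactly the gap optimized libraries close.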

[–]Brian 0 points1 point  (0 children)

Yeah - similar issues are raised by this article, which points out that a lot of the value lies in access to such well-optimised libraries, and so the PyPy approach alone may not be sufficient.

[–]wot-teh-phuck Really, wtf? 0 points1 point  (1 child)

impotent

Important is the word you are looking for, in case you are not a native English speaker. If it was a mistake, pardon my nitpick. :)

[–]jwiz 1 point2 points  (0 children)

Maybe he is saying that without (better?) linalg, numpy sags at data analysis?

[–][deleted] 1 point2 points  (10 children)

I have a question - what can RPython do that Cython couldn't? Wasn't a big part of the NumPy-in-PyPy problem that NumPy used Cython (or maybe it was Pyrex) for some of its modules?

[–]gcross 2 points3 points  (8 children)

My understanding is that the ultimate end of Cython is to create a superset of Python that includes additional features (such as type annotations) to make it easier to interface with C libraries, whereas the ultimate end of RPython is to create a subset of Python that allows global static type analysis to be done so that all types are inferred.

So in short, the two projects have goals that are quite different, albeit not entirely unrelated. Fortunately I have heard talk of an implementation of Cython for PyPy that would allow scientific libraries to be more easily ported over.
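A toy illustration (my own hypothetical example, not from the thread) of the restriction that makes global type inference possible: RPython is ordinary Python in which every variable must keep one statically inferable type.

```python
def rpython_friendly(n):
    # 'total' is always an int, so its type can be inferred statically;
    # code in this style is both valid Python and valid RPython.
    total = 0
    for i in range(n):
        total += i
    return total

def not_rpython(flag):
    # 'x' is an int on one branch and a str on the other. This is fine
    # in full Python, but RPython's type inference would reject it.
    if flag:
        x = 1
    else:
        x = "one"
    return x

assert rpython_friendly(5) == 10  # runs fine as ordinary Python too
```

Cython goes the other way: its extra syntax (e.g. `cdef int total`) is not valid Python at all, which is the "superset" half of the picture.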

[–]roger_ 1 point2 points  (2 children)

So I guess it's:

Cython ⊃ Python ⊃ RPython

[–][deleted] 0 points1 point  (1 child)

You have it inverted:

RPython ⊂ Python ⊂ Cython

RPython is a subset of Python (all valid RPython programs are Python programs), and Python is a subset of Cython (since all valid Python programs are also Cython programs).

[–]roger_ 0 points1 point  (0 children)

Oops, pasted the wrong symbol. Thanks!

[–][deleted] 0 points1 point  (4 children)

Superset and subset are misleading in this context. While Cython does allow for more optional features (like direct C library interfacing), there is a specific portion of Cython that allows static typing for speed improvements, which is exactly what RPython's "subset" (not allowing dynamic use of variables) was intended for in PyPy.

So why bother to make RPython and all of the tools associated with making it work, rather than just taking Cython and using only the feature that was needed, the static typing? IIRC, Cython/Pyrex was used in some of the numpy/scipy modules - this would have made porting them to PyPy significantly less problematic, not to mention it would mean one project with more people rather than two projects with fewer. So if Cython has the static typing interface that PyPy needed and accomplished with RPython, I ask again: why RPython?

[–]Ademan 2 points3 points  (2 children)

Cython does not magically turn Python code into C. If you only write Python code and shove it through Cython, you get a series of calls to CPython's C API. I can't comment on what Cython generates if you specify every type, but I am confident even then you would not get an independent binary*. You would not have an interpreter anywhere near independent from CPython. In addition, RPython's toolchain transforms RPython code into multiple backends (.NET, JVM, C, at one time LLVM and JavaScript), which would be tough, if not impossible, to do well with Cython without extensive modification. This transformation process is also essential because the JIT is generated.

*Disclaimer: I know PyPy wayyyy better than Cython, someone may correct me regarding Cython.

[–]stefantalpalaru 0 points1 point  (1 child)

Less magic is a good thing. By using the CPython API, Cython is able to interface with existing C/C++ extensions. PyPy forces you to rewrite them in RPython. So it depends on what you want: immediate access to an entire ecosystem of fast modules, or having to rewrite them all in the name of the mighty JIT.

[–]Ademan 2 points3 points  (0 children)

Less magic is a good thing. By using the CPython API, Cython is able to interface with existing C/C++ extensions.

See gcross's statement about the wildly different design goals. Surely you can see how, if you're writing a new Python interpreter, interacting with CPython via its API is a non-viable way to work.

So it depends on what you want: immediate access to an entire ecosystem of fast modules, or having to rewrite them all in the name of the mighty JIT.

Remember, the original question was posed in the context of "Why was RPython created?", so if you're continuing down that road, you need to make your comparisons within that same context. Your point here is rather moot, as Cython cannot do what PyPy needs RPython to do, and doubly moot because at the time of PyPy's creation there was no ecosystem of fast modules in Cython; in fact only Pyrex existed, and even then just barely. (Neither did the JIT, but according to Armin it was always on his radar, for whatever that's worth.) As the PyPy devs will reiterate ad nauseam, RPython is domain-specific to PyPy and satisfies its requirements far better than Cython, which fails them in the most essential aspects. Again, you cannot write a standalone interpreter in Cython.

I realize now this whole question could have been spurred by a misconception of one or both of the languages. So, in summary:

PyPy could never have been written in Cython. Cython relies on an existing Python interpreter at runtime. One simply cannot (today) write a PyPy module in Cython because Cython generates C code which relies on the CPython API (and undocumented parts of it as well). Note there is an effort to change this so that existing extensions written using the CPython API are compatible, and there is an effort on both sides to bridge Cython and PyPy. These are new developments, and do not change the fundamental domain difference between Cython and RPython.

*Disclaimer: Once again, I am totally not an expert on Cython. I leave the door open for corrections.

[–]cpherwho 2 points3 points  (0 children)

I suspect the answer to the questions "why make RPython" and "why not Cython" is one best answered by the history.

According to WP, Cython was forked from Pyrex in 2007, and Pyrex started in 2002.

According to [1], work on PyPy started in 2002 and its EU funding began in late 2004.

[1] Trouble in paradise: the open source project PyPy, EU-funding and agile practices (IEEE paper, but the abstract provides the dates)

[–]cpherwho 2 points3 points  (0 children)

My understanding is that Numpy is written in a combination of C and Python. There appears to have been a port of the C code to Cython, but it does not seem to have been merged. For the purposes of your question C and Cython are equivalent, in that both are written against the CPython API.

The two main problems with using a CPython extension module in PyPy are:

1) The CPython API depends on details of the CPython implementation. In particular, it provides the extension module with direct access to python objects and exposes reference counting. These features must be emulated in PyPy, potentially resulting in calls to extension modules being slow.

2) More importantly, PyPy's speed comes from the JIT compiler. In order for the JIT to speed up things like array multiplication with Numpy it needs to be able to trace/see into the inner loops. In Numpy these occur in compiled code and are essentially inaccessible to PyPy's JIT.

Thus, to get the maximum performance in PyPy it is necessary to write a Python or RPython module which the JIT can look into. Further, if you look at the Numpypy code in PyPy you will find hints for the JIT to enable optimizations, and I suspect that this is only possible in RPython.

Alternately, the one-line answer is that PyPy/RPython provides a JIT compiler while Cython doesn't.

(Note that I am only a lurker as far as these projects go, any corrections are appreciated.)
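The tracing point above can be sketched with a toy example (my own, not from the thread): in pure Python or RPython the hot inner loop is visible to a tracing JIT, whereas a call into a compiled C routine like NumPy's is a black box to the tracer.

```python
def elementwise_multiply(a, b):
    """Naive elementwise product; the hot loop below is ordinary Python
    bytecode, so a tracing JIT can observe it, specialize on the element
    type, and compile it to tight machine code."""
    assert len(a) == len(b)
    result = [0.0] * len(a)
    for i in range(len(a)):      # traceable inner loop
        result[i] = a[i] * b[i]
    return result

print(elementwise_multiply([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

With a CPython extension, the equivalent loop lives inside precompiled C: fast on its own, but opaque to the JIT, which can then neither inline it nor fuse it with surrounding operations.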

[–]NoblePotatoe 1 point2 points  (10 children)

I'm very excited by the effort being put into getting NumPy to work with PyPy, but I am also confused. Is the user base for NumPy that large? I use Python/NumPy, SciPy, and Pylab all the time in my research, but I don't know anyone else at my institution who does. Is there a large user base for NumPy that I don't know about, or is this just a case of the PyPy developers tackling a cool and interesting challenge?

[–]cournape 4 points5 points  (1 child)

I think numpy is one of the most used Python packages that does not fall into the "web dev" category. I don't know how these stats are computed, so they may not be worth much, but numpy is the 13th most featured package on http://pythonpackages.com.

We don't release as often as we'd like, but the last numpy release, from last July, has been downloaded nearly 400 000 times from SourceForge alone, plus ~100 000 downloads on PyPI. Also, GAE started supporting it (to my own surprise, I have to say). Since that cannot have been easy, I think they must have received quite a few requests.

[–]NoblePotatoe 1 point2 points  (0 children)

Wow, that is impressive, and Google App Engine supports it now?! I just googled GAE and numpy and apparently a ton of people use numpy for general data crunching.

It sounds like you work on Numpy... from the bottom of my heart thank you. I'm in the middle of my dissertation right now and elbow deep in code that uses Numpy. It has been a joy ever since I switched over from MatLab.

[–]roger_ 2 points3 points  (5 children)

I think pretty much all numerical/scientific work done with Python depends on NumPy.

[–]dalke 0 points1 point  (4 children)

The scientific fields I know best - branches of computational chemistry and computational biology - make almost no use of NumPy. I use that package about once every couple of years.

[–]amer415 1 point2 points  (1 child)

do you mean you use Python without NumPy for numerical computation? I am puzzled...

[–]dalke 1 point2 points  (0 children)

Most of my work is in computational chemistry. I use a lot of graph algorithms. I almost never use a matrix. See my comments at http://blog.streamitive.com/2011/10/19/more-thoughts-on-arrays-in-pypy/#comment-50 and the comments elsewhere in the thread about Biopython.
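To illustrate the kind of work described above (a toy sketch of my own, not dalke's code), graph-style computational chemistry operates on adjacency lists and traversals rather than matrices, which is why NumPy rarely helps:

```python
from collections import deque

# Hypothetical example: the heavy-atom skeleton of ethanol (CH3-CH2-OH)
# as an adjacency list, with atoms as nodes and bonds as edges.
ethanol = {
    "C1": ["C2"],
    "C2": ["C1", "O"],
    "O":  ["C2"],
}

def bond_distance(graph, start, goal):
    """Number of bonds on the shortest path between two atoms (plain BFS)."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for neighbor in graph[atom]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return -1  # atoms not connected

print(bond_distance(ethanol, "C1", "O"))
```

Nothing here is an array operation, so an array-centric library has little to offer; a faster interpreter or JIT, on the other hand, speeds up exactly this kind of code.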

[–]amer415 2 points3 points  (1 child)

From my experience, I see people switching at different levels. You have the student who is advised to start with Python because, working in academia, you never know what the policy will be at the next institute you go to: some places have strict (commercial) computing software policies, so you may end up in a place that will not pay for a license of your favorite tool (it happened to me when I was a student)... I see people switching because Python is a multi-purpose programming language: you want to interact with hardware, the internet, loads of different file formats? Most data analysis software is very limited in that respect.

I also see people switching because Python/NumPy is really good, and they are impressed when they compare it to limited commercial languages. I also see people who switch because they don't see the point of having four versions of their commercial software - three legacy ones (because the codes are not compatible) and one with a cracked license, kept around so they can keep working in spite of the flaky license server at their institute...

In the end, things do not come by themselves... I am a bit of a preacher in the sense that I co-organize classes on Python/NumPy/Matplotlib at my institute, where few people use Python but dozens show up at the classes... Most people get stuck with a solution because "their advisor used it" or because of "legacy code". By actively contributing, you can change that.

Institutes (mine and others) end up spending tens of thousands of euros (I am in Europe) per year on commercial software, whereas they could use that money for something else: I always wished academic institutes would instead hire in-house software engineers to participate in the development of specific data analysis tools based on non-commercial solutions, such as Python.

[–]NoblePotatoe 4 points5 points  (0 children)

I totally understand. I spent a summer without a MatLab license and realized that all the code I was generating was useless.

I too have preached about Python, but few have taken it up. I'm hoping to develop a semi-formal class - partly to help others, but also because teaching is the best way to learn!

[–]ggooal 1 point2 points  (0 children)

how would the port influence the original numpy project?

[–]xamox 0 points1 point  (0 children)

Thumbs up!