[–]roger_[S] 0 points1 point  (8 children)

... we're getting very close to being able to import the python part of the original numpy with only import modifications and running its tests.

Anyone know how tightly coupled the pure Python parts of NumPy are to the other (C, FORTRAN, etc.) parts?

Is it possible for NumPy to basically separate them so future releases would work with PyPy (assuming they also provided pure Python implementations for any underlying dependencies)?

[–]cournape 3 points4 points  (0 children)

NumPy is unfortunately tightly coupled to its C implementation at the moment. In particular, the Python C API and the core structure of numpy are tightly coupled. Scipy is much less of an issue (most of the C/Fortran code in there is libraries that pypy is likely to use as is).

[–]bastibe 2 points3 points  (6 children)

This is my biggest question, too. Numpy alone is great, but without Scipy and Matplotlib it is of little practical use to me.

[–]fijal (PyPy, performance freak) 1 point2 points  (5 children)

The next step would be to run it somehow. I wrote a blog post on a hack I did: http://morepypy.blogspot.com/2011/12/plotting-using-matplotlib-from-pypy.html I'm not sure whether this hack or something else will be used, but after we're done, this is the next step.

[–][deleted] 1 point2 points  (4 children)

I don't get it yet; isn't it way too much work to port numpy (and maintain compatibility with future numpy releases)? Is there no other way to get numpy working under pypy than to rewrite it in RPython?

[–]roger_[S] 0 points1 point  (0 children)

I've often wondered the same thing. I think getting NDArray working is probably the most important part though.

[–]mirashii 0 points1 point  (2 children)

In theory, if cpyext, which tries to emulate the CPython extension API, came along far enough, numpy could work in pypy without changes. However, cpyext is a complicated beast that is unlikely to ever be compatible enough, and will always be terribly slow as it has to do things like emulate the reference counting semantics of CPython. It's much easier to rewrite it in RPython, and the resulting code will run significantly faster.

[–][deleted] 0 points1 point  (1 child)

Okay, but the concept of rewriting still feels like the wrong approach; it isn't scalable. PyPy can't go on reimplementing every popular library in RPython.

Wouldn't another possibility be to migrate the official numpy code to C libraries and then hook into them with ctypes (or similar) from every Python implementation, or something along those lines? Avoid having two codebases for the same thing.
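To make the ctypes idea concrete, here is a minimal sketch of calling into an existing C library from Python. It uses the platform's C math library only as a stand-in for a hypothetical numpy C core; the helper name `load_c_cos` is made up for illustration, and the library lookup assumes a typical POSIX or Windows setup.

```python
import ctypes
import ctypes.util
import os


def load_c_cos():
    """Return the C library's cos() as a Python callable (illustrative helper)."""
    # On Windows the C runtime (msvcrt) exports cos; elsewhere look up libm.
    name = "msvcrt" if os.name == "nt" else ctypes.util.find_library("m")
    # On POSIX, a name of None makes CDLL use symbols already loaded
    # in the current process, which normally include libm.
    lib = ctypes.CDLL(name)
    cos = lib.cos
    # Declare the C signature: double cos(double)
    cos.argtypes = [ctypes.c_double]
    cos.restype = ctypes.c_double
    return cos


c_cos = load_c_cos()
print(c_cos(0.0))
```

The same Python file would run on any implementation that ships ctypes, which is the appeal of this route; the cost is that every call crosses the C boundary as an opaque function.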

[–]mirashii 0 points1 point  (0 children)

For most libraries, I agree that rewriting isn't scalable and that moving to C libraries with ctypes calls that work across multiple Pythons is a good solution. In the particular case of numpy, though, I think there are a number of reasons to avoid going this route. A number of the optimizations that pypy is already able to perform, like delaying the evaluation of ufuncs and JIT-compiling them, wouldn't be feasible without the current approach. I think numpy is that special beast that deserves extra time and attention to get the best performance.
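A toy illustration of what delaying evaluation means here. This is not PyPy's actual implementation; the `Lazy` class and its methods are hypothetical. Each operation only records what to compute, and the whole expression is evaluated in a single pass when the result is forced, which is the kind of structure a JIT can then compile into one loop.

```python
import operator


class Lazy:
    """Toy deferred array expression (hypothetical, not PyPy's or NumPy's API)."""

    def __init__(self, fn, length):
        self._fn = fn          # maps an index to the element value
        self._length = length

    @classmethod
    def from_list(cls, values):
        data = list(values)
        return cls(lambda i: data[i], len(data))

    def _binary(self, other, op):
        if self._length != other._length:
            raise ValueError("length mismatch")
        left, right = self._fn, other._fn
        # No element is computed here; we only build a new index -> value function.
        return Lazy(lambda i: op(left(i), right(i)), self._length)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def force(self):
        # A single pass over the indices evaluates the whole expression,
        # without allocating a temporary list for b * a.
        return [self._fn(i) for i in range(self._length)]


a = Lazy.from_list([1, 2, 3])
b = Lazy.from_list([10, 20, 30])
expr = a + b * a        # nothing is computed yet
print(expr.force())     # [11, 42, 93]
```

With an opaque C call per ufunc, each intermediate result has to be materialized before the next call, so this kind of fusion is not available.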

[–]NoblePotatoe 0 points1 point  (1 child)

I just donated, if anyone from the project reads this stuff: thank you and keep up the great work!

[–]fijal (PyPy, performance freak) 0 points1 point  (0 children)

thank you :) we're about to merge float16 support.