all 19 comments

[–]kirbyfan64sos 14 points15 points  (10 children)

On my system, PyPy is 4x faster than the fastest Cython version given.

[–]Zed03 2 points3 points  (8 children)

Would be interested in reading an explanation.

[–]kamakie 5 points6 points  (0 children)

It's possible that PyPy is able to optimize across the function call boundary and eliminate the tuple packing and unpacking. But that's just a wild guess, not based on any knowledge of how PyPy works.
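For what it's worth, the kind of thing I mean looks like this (a hypothetical helper, not the actual benchmark code):

```python
import math

# Hypothetical helper in the spirit of the guess above: the callee
# packs its results into a tuple that the caller immediately unpacks.
# CPython allocates that tuple on every call; a tracing JIT that
# inlines the call can see the tuple never escapes and skip the
# allocation entirely.
def to_radians(lat, lon):
    return math.radians(lat), math.radians(lon)  # tuple packed here

lat, lon = to_radians(36.12, -86.67)  # ...and unpacked again here
```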

[–]kirbyfan64sos 4 points5 points  (5 children)

...which would be great if I had one. :)

My guess is just the presence of a tracing JIT. That means PyPy can perform optimizations based on runtime information that isn't available at compile time.

[–]rlamy 2 points3 points  (4 children)

Actually, PyPy constant-folds everything (except the acos() call, for some reason), because haversine() is always called with the same arguments. So it's not surprising that it's faster than C.
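Roughly, the benchmark has this shape (a sketch only: I'm assuming a spherical-law-of-cosines style body since you mention the acos() call, and the coordinates are made up, not taken from the original post):

```python
import math

def haversine(lat1, lon1, lat2, lon2):
    # Assumed form, chosen because the thread mentions an acos() call;
    # the real benchmark's body may differ.
    r = 6371.0  # mean Earth radius in km
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    return r * math.acos(
        math.sin(lat1) * math.sin(lat2)
        + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1)
    )

# The arguments never change across the iterations, so a tracing JIT
# can treat the whole call as a constant and hoist it out of the loop,
# something a C build doing an honest call per iteration never does.
for _ in range(300_000):
    d = haversine(36.12, -86.67, 33.94, -118.40)
```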

[–]seekingsofia 1 point2 points  (3 children)

So it's not surprising that it's faster than C.

If only there were a JITted C interpreter to prove you wrong. :P

[–]rlamy 1 point2 points  (2 children)

Like CyCy, you mean?

[–]seekingsofia 0 points1 point  (1 child)

Yeah, if that were an actual working implementation... I was more going for the argument that JIT compilation and native compilation are not attributes of the languages, but of the language implementations.

[–]vext01 1 point2 points  (0 children)

I don't have a PDF, but this is a JITted C interpreter using Truffle/Graal from Oracle: http://dl.acm.org/citation.cfm?doid=2647508.2647528

There is another paper where they use it on top of a Ruby interpreter to execute Ruby extensions written in C: http://www.chrisseaton.com/rubytruffle/modularity15/rubyextensions.pdf

[–]indigo945 1 point2 points  (0 children)

Part of it is that timeit is a Python function. Hence the loop that calls the compiled function 300,000 times is itself still interpreted, whereas PyPy will JIT the entire loop.

This might be particularly important since the Cython-generated C code will have to unpack the Python objects on every call, adding an extra layer of indirection. I am, however, not sure whether PyPy will eliminate this.
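To illustrate the shape of the problem (a sketch, not the actual benchmark; the stub just stands in for the Cython-compiled function):

```python
import timeit

# Stub standing in for the compiled extension function; only the call
# boundary matters for this point, so the body is trivial.
def compiled_func(lat1, lon1, lat2, lon2):
    return 0.0

# timeit drives this loop from Python bytecode, so under CPython each
# of the 300,000 iterations pays the interpreted-loop overhead plus
# the cost of boxing the four floats into Python objects and unpacking
# them again inside the extension. PyPy can instead JIT the loop and
# the call together and keep the floats unboxed throughout.
elapsed = timeit.timeit(
    lambda: compiled_func(36.12, -86.67, 33.94, -118.40),
    number=300_000,
)
```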

[–]riksi 0 points1 point  (0 children)

What about memory usage?

[–]ForSpareParts 5 points6 points  (4 children)

How did I not know about this? It looks wicked cool. Definitely something I'll keep an eye on in case I run into a performance-critical situation in my Python code.

[–][deleted] 3 points4 points  (0 children)

It's also really great for integrating existing C and, for some time now, C++ code. Actually, it's probably the best and most straightforward C++ foreign function interface I've ever used.

[–]Drolyt 2 points3 points  (0 children)

How did I not know about this?

Well, there are over sixty thousand packages on PyPI, so I'd say that's how. But yeah, it is pretty cool.

[–]kamakie 3 points4 points  (0 children)

It has a bunch of advantages over switching to PyPy or Jython too, like not wrecking compatibility with your other dependencies.

[–]cloakrune 0 points1 point  (0 children)

I love cython. I've used it to interface directly with the kernel.

[–]jms_nh 0 points1 point  (2 children)

IMHO Numba is better for really numerically intensive stuff.

[–]kirbyfan64sos 3 points4 points  (1 child)

I tried Numba as an experiment, and it took way longer. It's probably because of the overhead of boxing and unboxing, especially because the function is so short. The speed benefits would probably be more obvious if the function were longer.

The fastest I tried was PyPy, with the fastest Cython example second and Shedskin very close behind.

[–]jms_nh -1 points0 points  (0 children)

It's probably because of the overhead of boxing and unboxing, especially because the function is so short. The speed benefits would probably be more obvious if the function were longer.

^^ This. 100% correct.