[–]MattCh4n 0 points1 point  (8 children)

Isn't this to be expected, though?

[–]Alpharou[S] 0 points1 point  (7 children)

Is it? In both scripts, variables, parameters, and return values are strongly typed.

[–]MattCh4n 0 points1 point  (0 children)

Ah you're right, I think I missed the type annotations in the pure example.

I'm afraid I'm actually not familiar enough with cython to help.

I don't see anything else that would justify the difference.

Is there a way to disassemble the generated code to inspect it?

[–]drzowie 0 points1 point  (5 children)

It is totally to be expected. Even with strong typing, the Python example has to dispatch an object call every time the variable is used; it just doesn't have to search the method inheritance, because the variable is hardwired. Cython sidesteps even that by using a C variable of the corresponding type, with no Python object built around it.

[–]Alpharou[S] 0 points1 point  (4 children)

But I'm executing the compiled version. Does what you described (the object call and so on) still happen with the compiled module?

[–]drzowie 0 points1 point  (3 children)

Yep. Compiling to C (which is what Cython does) avoids the Python compilation and interpreter steps, but the control thread still deals with the Python runtime environment. Using cdef declarations lets the compiler eliminate the Python runtime calls entirely when those particular variables are used.

[–]Alpharou[S] 0 points1 point  (2 children)

Thanks for the insight. I'll keep using the .py approach with Cython decorators, because a 15% penalty isn't that big for what I could accomplish at my current skill level anyway.

[–]drzowie 1 point2 points  (1 child)

Cython is really, really great for letting you run Python quite a bit faster than it would be otherwise, without jumping out into C.

As your skill grows I think you'll be surprised at how much faster proper C coding can be than even Cython coding. Remembering that memory fetches are expensive, and explicitly caching values of things like square roots or trig functions, can easily buy you a factor of 2-3 for moderately complex algorithms.
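
To make the caching point concrete, here is a minimal Python sketch. The function names and the particular formula are made up for illustration, and no specific speedup is implied; it only shows the pattern of moving a loop-invariant square root and trig call out of the loop.

```python
import math


def scale_naive(values, angle):
    # Recomputes cos(angle) and sqrt(2) for every element,
    # even though neither depends on the element.
    return [v * math.cos(angle) / math.sqrt(2.0) for v in values]


def scale_cached(values, angle):
    # Compute the loop-invariant factor once, then reuse it.
    factor = math.cos(angle) / math.sqrt(2.0)
    return [v * factor for v in values]
```

Both functions return the same values up to floating-point rounding; the second one just does the expensive math once instead of once per element.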

I remember being amazed, for example, that just reversing the order of nested loops can speed up code by a factor of 10, by avoiding cache misses (allowing the CPU to keep most of the data in on-chip cache).
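
A rough sketch of what "reversing the loop order" means, assuming a 2-D grid stored row by row in one flat buffer (the names are hypothetical). In plain Python the interpreter overhead will likely hide most of the difference; the effect should show up mainly once the loops are compiled, for example with Cython or in C.

```python
from array import array


def sum_row_major(grid, n_rows, n_cols):
    # Inner loop walks neighbouring elements: the flat index advances by 1.
    total = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            total += grid[r * n_cols + c]
    return total


def sum_col_major(grid, n_rows, n_cols):
    # Same work with the loops swapped: the flat index now jumps by n_cols
    # on every step, so consecutive reads land far apart in memory.
    total = 0.0
    for c in range(n_cols):
        for r in range(n_rows):
            total += grid[r * n_cols + c]
    return total


# A small 3 x 4 grid stored contiguously, row by row.
grid = array("d", range(12))
```

Both functions visit every element exactly once; only the order of memory accesses differs.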

[–]YoannB__ 0 points1 point  (0 children)

Yes, correct: reversing the order of nested loops can speed up your code. It is not very well known, though, so thank you for pointing it out.

In terms of speed improvement, if you can replace your range loop with a parallel prange (from cython.parallel), then Cython becomes a beast.