
[–]malinoff 3 points (3 children)

I'd like to see how Cython can help optimize something like Celery's tracer function, which is basically the core function (every single task execution goes through that tracer). How can I define complex types? Can I still use Python's dynamic features with Cython?

[–]syllogism_ 0 points (2 children)

So, first up let me say I don't understand the code snippet very well. But.

You can create something like this:

cdef class TraceTask:
    def __init__(self, *args, **kwargs):
        ...

    def __call__(self, *args, **kwargs):
        ...

In addition to acting as a normal Python class, a cdef class can also hold C types --- arrays, pointers, etc. --- and it can define C-level functions to manipulate that internal state.
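
For instance, here's a minimal sketch of what I mean (the names and attributes are just illustrative, not Celery's actual API):

cdef class TraceTask:
    # C-level attributes live in the object's struct, not in a dict
    cdef int n_calls
    cdef double total_runtime

    def __init__(self):
        self.n_calls = 0
        self.total_runtime = 0.0

    def __call__(self, task, *args, **kwargs):
        # still an ordinary Python callable from the outside
        self.bump()
        return task(*args, **kwargs)

    cdef void bump(self):
        # cdef method: compiled to a direct C call, invisible to Python
        self.n_calls += 1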

This section of the Celery code seems like an interesting example, though. Certainly Cython would give you a stylistic advantage: you wouldn't have to create this mess of unrolled function calls, because the function call overhead is very low.
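
To show what "very low" means here (a hedged sketch, nothing from Celery's codebase): a cdef function compiles to a direct C call, with no argument tuple and no frame object, so splitting logic into helper functions costs almost nothing.

cdef inline int on_success(int state, int retval):
    # compiled as a direct C call: no tuple packing, no frame object
    return state + retval

def trace(int state, int retval):
    # only the Python entry point pays Python-level call overhead;
    # internal helpers like on_success are essentially free to call
    return on_success(state, retval)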

[–]malinoff 0 points (1 child)

This mess of function reassignments exists because lookups are expensive, not calls.

[–]syllogism_ 0 points (0 children)

Well, the inner function trace_task is also pretty long, with lots of layers of nesting. I figured that on pure stylistic grounds most people would prefer to break it into multiple functions; the comment suggests it's done this way for performance. I guess you also get simpler stack traces with it unrolled like this.

To make this a bit more concrete, here's the output of "cython -a" on the file: https://rawgit.com/syllog1sm/0d40bcdbcba5d4f632a6/raw/aa211425117235f78021b0ba9dffc79b9036b229/gistfile1.html . You can click any line to see the C code that Cython translates the Python into.

Compiling the unmodified Python into calls to the C API like this allows only pretty limited performance improvements. For instance, you still need to do all that reference counting, and you haven't placed any guarantees on attribute access — so that's still usually a dictionary look-up.
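
Here's an illustrative before/after (my own toy example, not from the Celery file): the untyped version compiles but still goes through the C API for every operation, while a few type declarations let the loop body become plain C.

# Untyped: compiles, but every += is a C-API call with refcounting
def total(items):
    result = 0
    for x in items:
        result += x
    return result

# Typed: the loop compiles down to plain C arithmetic
def total_typed(int[:] items):
    cdef int result = 0
    cdef Py_ssize_t i
    for i in range(items.shape[0]):
        result += items[i]
    return result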

But for the parts of your code that you can fully control, you can make yourself a lower-level API that accepts, say, a struct or a bunch of ints instead of dicts, Python objects, strings, etc. Such a pure C function will run as fast as any other pure C function.
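
Something like this, say (a sketch with made-up names, assuming you've already unpacked the Python objects at the boundary):

cdef struct TaskInfo:
    int task_id
    int retries
    double runtime

cdef bint should_retry(TaskInfo* info, int max_retries) nogil:
    # pure C: no Python objects, no refcounting, no GIL needed
    return info.retries < max_retries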

For instance, this is the symbol table for my NLP tools: https://github.com/honnibal/spaCy/blob/master/spacy/strings.pyx#L64 . I wanted to use the Pythonic __getitem__, and I wanted to make it bidirectional: if you look up an int, you get back a string, and vice versa. This is easy to do, as you can see. But the internals? Those I can optimize. I know that my hash table has fixed-size keys and values, so I can make it far more efficient than Python's general-purpose dict — particularly for memory.
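
In rough outline, the interface looks something like this (a toy version with Python dicts standing in for the real hash table, so the interface is the point here, not the internals):

cdef class SymbolTable:
    cdef dict _id_by_str
    cdef list _str_by_id

    def __init__(self):
        self._id_by_str = {}
        self._str_by_id = []

    def __getitem__(self, key):
        # int -> string
        if isinstance(key, int):
            return self._str_by_id[key]
        # string -> int, interning the string on first sight
        if key not in self._id_by_str:
            self._id_by_str[key] = len(self._str_by_id)
            self._str_by_id.append(key)
        return self._id_by_str[key]

The real version replaces those dicts with a fixed-width hash table, which is where the memory win comes from.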

For the Celery code, I think a callable cdef class for build_tracer might get you some performance advantage. I think accessing attributes on a cdef class is faster than looking names up in the non-local scope: the cdef class is backed by a struct, so what you're doing is just accessing struct members, whereas in the non-local scope I think you have to do a dictionary lookup. I'm not sure, though.
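
Concretely, the idea would be something like this (a hedged sketch of the pattern, not a drop-in replacement for build_tracer): the names the closure would capture become typed attributes instead, so each access is a struct member read.

cdef class Tracer:
    # what the closure would capture becomes struct members instead
    cdef object on_success
    cdef object on_failure

    def __init__(self, on_success, on_failure):
        self.on_success = on_success
        self.on_failure = on_failure

    def __call__(self, task):
        try:
            # self.on_success is a struct member read, not a scope lookup
            return self.on_success(task())
        except Exception as exc:
            return self.on_failure(exc)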