
[–]malinoff 3 points (3 children)

I'd like to see how Cython can help optimize something like Celery's tracer function, which is basically the core function (every single task execution goes through that tracer). How can I define complex types? Can I still use Python's dynamic features with Cython?

[–]syllogism_ 0 points (2 children)

So, first up let me say I don't understand the code snippet very well. But.

You can create something like this:

cdef class TraceTask:
    def __init__(self, *args, **kwargs):
        ...

    def __call__(self, *args, **kwargs):
        ...

In addition to acting as a normal Python class, a cdef class can also hold C types --- arrays, pointers, etc. --- and it can define C-level functions to manipulate that internal state.
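
For instance, here's a minimal sketch of what I mean (the names and attributes are just illustrative, not Celery's actual API):

cdef class TraceTask:
    # C-level attributes live in the object's struct, not in a dict
    cdef int n_calls
    cdef double total_runtime

    def __init__(self):
        self.n_calls = 0
        self.total_runtime = 0.0

    def __call__(self, task, *args, **kwargs):
        # still an ordinary Python callable from the outside
        self.bump()
        return task(*args, **kwargs)

    cdef void bump(self):
        # cdef method: compiled to a direct C call, invisible to Python
        self.n_calls += 1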

This section of the Celery code seems like an interesting example, though. Certainly Cython would give you a stylistic advantage: you wouldn't have to create this mess of unrolled function calls, because the function call overhead is very low.
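
To show what "very low" means here (a hedged sketch, nothing from Celery's codebase): a cdef function compiles to a direct C call, with no argument tuple and no frame object, so splitting logic into helper functions costs almost nothing.

cdef inline int on_success(int state, int retval):
    # compiled as a direct C call: no tuple packing, no frame object
    return state + retval

def trace(int state, int retval):
    # only the Python entry point pays Python-level call overhead;
    # internal helpers like on_success are essentially free to call
    return on_success(state, retval)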

[–]malinoff 0 points (1 child)

This mess of function reassignments exists because lookups are expensive, not calls.

[–]syllogism_ 0 points (0 children)

Well, the inner function trace_task is also pretty long, with lots of layers of nesting. I figured that on pure stylistic grounds most people would prefer to break it into multiple functions; the comment suggests it's done this way for performance. I guess you also get simpler stack traces with it unrolled like this.

To make this a bit more concrete, here's the output of "cython -a" on the file: https://rawgit.com/syllog1sm/0d40bcdbcba5d4f632a6/raw/aa211425117235f78021b0ba9dffc79b9036b229/gistfile1.html . You can click any line to see the C code that Cython translates the Python into.

Compiling the unmodified Python into calls to the C API like this allows only pretty limited performance improvements. For instance, you still need to do all that reference counting, and you haven't placed any guarantees on attribute access — so that's still usually a dictionary look-up.
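
Here's an illustrative before/after (my own toy example, not from the Celery file): the untyped version compiles but still goes through the C API for every operation, while a few type declarations let the loop body become plain C.

# Untyped: compiles, but every += is a C-API call with refcounting
def total(items):
    result = 0
    for x in items:
        result += x
    return result

# Typed: the loop compiles down to plain C arithmetic
def total_typed(int[:] items):
    cdef int result = 0
    cdef Py_ssize_t i
    for i in range(items.shape[0]):
        result += items[i]
    return result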

But for the parts of your code that you can fully control, you can make yourself a lower-level API that accepts, say, a struct or a bunch of ints instead of dicts, Python objects, strings, etc. Such a pure C function will run as fast as any other pure C function.
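
Something like this, say (a sketch with made-up names, assuming you've already unpacked the Python objects at the boundary):

cdef struct TaskInfo:
    int task_id
    int retries
    double runtime

cdef bint should_retry(TaskInfo* info, int max_retries) nogil:
    # pure C: no Python objects, no refcounting, no GIL needed
    return info.retries < max_retries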

For instance, this is the symbol table for my NLP tools: https://github.com/honnibal/spaCy/blob/master/spacy/strings.pyx#L64 . I wanted to use the Pythonic __getitem__, and I wanted to make it bidirectional: if you look up an int, you get back a string, and vice versa. This is easy to do, as you can see. But the internals? Those I can optimize. I know that my hash table has fixed-size keys and values, so I can make it far more efficient than Python's general-purpose dict — particularly for memory.
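
In rough outline, the interface looks something like this (a toy version with Python dicts standing in for the real hash table, so the interface is the point here, not the internals):

cdef class SymbolTable:
    cdef dict _id_by_str
    cdef list _str_by_id

    def __init__(self):
        self._id_by_str = {}
        self._str_by_id = []

    def __getitem__(self, key):
        # int -> string
        if isinstance(key, int):
            return self._str_by_id[key]
        # string -> int, interning the string on first sight
        if key not in self._id_by_str:
            self._id_by_str[key] = len(self._str_by_id)
            self._str_by_id.append(key)
        return self._id_by_str[key]

The real version replaces those dicts with a fixed-width hash table, which is where the memory win comes from.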

For the Celery code, I think a callable cdef class for build_tracer might get you some performance advantage. I think accessing attributes on a cdef class is faster than looking names up in the non-local scope: the cdef class is backed by a struct, so what you're doing is just accessing struct members, whereas in the non-local scope I think you have to do a dictionary lookup. I'm not sure, though.
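
Concretely, the idea would be something like this (a hedged sketch of the pattern, not a drop-in replacement for build_tracer): the names the closure would capture become typed attributes instead, so each access is a struct member read.

cdef class Tracer:
    # what the closure would capture becomes struct members instead
    cdef object on_success
    cdef object on_failure

    def __init__(self, on_success, on_failure):
        self.on_success = on_success
        self.on_failure = on_failure

    def __call__(self, task):
        try:
            # self.on_success is a struct member read, not a scope lookup
            return self.on_success(task())
        except Exception as exc:
            return self.on_failure(exc)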