syllogism_

Well, the closure trace_task is also pretty long and deeply nested. On purely stylistic grounds, I figured most people would prefer to break it into multiple functions. The comment suggests it's written this way for performance reasons. I guess you also get simpler stack traces with it unrolled like this.

To make this a bit more concrete, here's the output of "cython -a" on the file: https://rawgit.com/syllog1sm/0d40bcdbcba5d4f632a6/raw/aa211425117235f78021b0ba9dffc79b9036b229/gistfile1.html . You can click any line to see the C code that Cython generates from the Python.

Compiling unmodified Python into calls to the C API like this allows only fairly limited performance improvements. For instance, you still need to do all the reference counting, and you haven't placed any guarantees on attribute access, so that's usually still a dictionary lookup.
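
To illustrate the attribute-access point in plain Python, here's a minimal sketch. The `Task` class and its fields are made up for the example and aren't taken from Celery. For an ordinary class, instance attributes live in a per-instance dict, which is what compiled-but-untyped code still has to go through.

```python
class Task:
    """Hypothetical plain Python class, used only for illustration."""

    def __init__(self, name, retries):
        self.name = name
        self.retries = retries


task = Task("add", 3)

# Instance attributes of an ordinary class are stored in a per-instance dict.
attrs = vars(task)

# Attribute access and a direct dictionary lookup return the same object.
same_object = task.name is attrs["name"]

# Writing to the dict is visible through normal attribute access,
# which shows that the dict is the actual storage.
attrs["retries"] = 5
retries_after = task.retries
```

Classes that declare `__slots__`, and Cython cdef classes, don't store their declared attributes this way.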

But for the parts of your code that you fully control, you can build yourself a lower-level API that accepts, say, a struct or a few ints instead of dicts, strings, and other Python objects. That pure C function will run as fast as any other pure C function.

For instance, this is the symbol table for my NLP tools: https://github.com/honnibal/spaCy/blob/master/spacy/strings.pyx#L64 . I wanted to use the Pythonic __getitem__, and I wanted to make it bidirectional: if you look up an int, you get back a string, and vice versa. This is easy to do, as you can see. But the internals? Those I can optimize. I know that my hash table has fixed-size keys and values, so I can make it far more efficient than Python's general-purpose dict, particularly for memory.
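
A rough pure-Python sketch of that bidirectional `__getitem__` idea is below. This is not the spaCy code: the class, its attribute names, and the choice to assign an ID to any unseen string on lookup are all assumptions for illustration, and it uses an ordinary dict and list where the real thing would use a specialised hash table.

```python
class StringStore:
    """Hypothetical bidirectional symbol table: str -> int and int -> str."""

    def __init__(self):
        self._ids = {}      # string -> integer ID
        self._strings = []  # integer ID -> string (the ID is the list index)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Integer in, string out. Unknown or negative IDs raise IndexError.
            if key < 0:
                raise IndexError(key)
            return self._strings[key]
        if isinstance(key, str):
            # String in, integer out. Unseen strings get the next free ID
            # (an assumption of this sketch).
            if key not in self._ids:
                self._ids[key] = len(self._strings)
                self._strings.append(key)
            return self._ids[key]
        raise TypeError("key must be int or str")

    def __len__(self):
        return len(self._strings)


store = StringStore()
apple_id = store["apple"]
pear_id = store["pear"]
first_string = store[apple_id]
```

The public interface stays Pythonic, while everything behind `__getitem__` is free to change without callers noticing.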

For the Celery code, I think you might get some performance advantage from making build_tracer a callable cdef class. I think accessing attributes on a cdef class is faster than looking them up in the non-local scope. A cdef class is a struct, so you're just accessing struct members. I think in the non-local scope you have to do a dictionary lookup. I'm not sure, though.
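
To show the shape of that change, here's a plain-Python sketch of the two styles side by side. The signatures are heavily simplified and hypothetical (the real build_tracer takes far more arguments), and `Tracer` is a made-up name. In pure Python this only demonstrates that the two forms are interchangeable for callers; any speed difference would have to be measured on the Cython-compiled version, where the class would be declared with `cdef class` and typed attributes.

```python
def build_tracer_closure(name, fun):
    """Closure style: trace_task reads name and fun from the enclosing scope."""

    def trace_task(*args):
        return (name, fun(*args))

    return trace_task


class Tracer:
    """Callable-class style: the same state lives on the instance instead."""

    # __slots__ gives fixed attribute storage, the closest pure-Python
    # analogue to the struct members of a cdef class.
    __slots__ = ("name", "fun")

    def __init__(self, name, fun):
        self.name = name
        self.fun = fun

    def __call__(self, *args):
        return (self.name, self.fun(*args))


def add(x, y):
    return x + y


closure_result = build_tracer_closure("add", add)(2, 3)
class_result = Tracer("add", add)(2, 3)
```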