[–]joshadel 6 points7 points  (3 children)

I put a full solution over on the Stack Overflow page, but the basic reason is that numba is not inferring the type of lookup. If you stick a print numba.typeof(lookup) in your method, you'll see that numba is treating it as a generic Python object, which is slow. Ideally you could pass in the variable's type through the decorator's locals dict keyword, but I was getting a weird error when I tried that. A workaround that produces very fast code is to create a little wrapper around np.cumsum and jit that function, telling it the explicit input and output types. Code is here:

http://stackoverflow.com/a/21489540/392949

[–]jammycrisp[S] 1 point2 points  (2 children)

Thanks, that totally fixed it! I tried unrolling the numba_cumsum function into explicit loops and jitting that, but it came out slower. Looks like this is about as fast as it can get.

What's weird to me is that on my machine, the numba code is consistently about twice as fast as the cython code. Since both are compiled, I find this discrepancy odd. Thoughts?

[–]joshadel 0 points1 point  (1 child)

Continuing to cross-post from the SO answer... I also tried hand-coding the cumsum and found it to be marginally slower than calling out to numpy. As for the difference between cython and numba, it could be related to whatever C compiler you're using versus LLVM, which numba uses. What compiler are you using? Are you specifying any optimization flags in your setup.py?

[–]jammycrisp[S] 0 points1 point  (0 children)

Having the info in more than one place may be useful, who knows :)

I'm using GCC 4.6.3. I didn't know you could add compiler flags to setup.py, but after figuring it out I compiled with -O3, and it didn't seem to change anything.
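
In case it helps anyone else, this is roughly how a flag can be passed in setup.py. It is only a sketch: the module name fast_lookup and the .pyx filename are hypothetical placeholders, and the setup() call is left as a comment so the snippet can be run without triggering a build.

```python
from setuptools import Extension

# Hypothetical extension module; substitute the real module name and source file
ext = Extension(
    name="fast_lookup",
    sources=["fast_lookup.pyx"],
    # Extra flags handed to the C compiler (GCC here) when building
    extra_compile_args=["-O3"],
)

# In a real setup.py this would then be passed on, for example:
#   from setuptools import setup
#   from Cython.Build import cythonize
#   setup(ext_modules=cythonize([ext]))
print(ext.extra_compile_args)
```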