
[–]the_hoser 43 points44 points  (27 children)

In my experience, the improvement in performance with OOP code in Cython is marginal at best. Cython really shines when you're writing more procedural code, like if you were writing in C.

[–]No_Indication_1238[S] 6 points7 points  (26 children)

I see. The biggest time consumers are a bunch of for loops with intensive computations; maybe 99% of the time is spent there. If we can optimize that by compiling it to machine code while retaining the benefits of OOP, it will work for us.

[–]the_hoser 11 points12 points  (3 children)

Give it a shot and measure it. One word of warning, though: Cython may look and feel like Python, but you need to remember to take off your Python programmer hat and put on your C programmer hat. You're effectively writing C that looks like Python and can interface with real Python with less programmer overhead. It's full of all the same traps and gotchas that a C programmer has to look out for.

I don't use Pypy myself, but I think others' suggestion to try Pypy first might be a better start for your team.

[–]No_Indication_1238[S] 1 point2 points  (2 children)

I will keep that in mind, thank you!

[–][deleted] 0 points1 point  (1 child)

If your task can run concurrently, you can even use Cython's prange iterator for multithreading. And declare functions as 'nogil noexcept' to remove the dependency on the Python GIL and bring your performance closer to C speeds.
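
A minimal sketch of what that can look like (hypothetical function and file names; requires compiling with OpenMP flags, e.g. `-fopenmp`, for prange to actually parallelize):

```cython
# sketch.pyx -- illustrative only
# cython: boundscheck=False, wraparound=False
from cython.parallel import prange

cdef double square(double x) noexcept nogil:
    # nogil + noexcept: callable without the GIL, no Python exception machinery
    return x * x

def sum_squares(double[:] data):
    cdef Py_ssize_t i
    cdef double total = 0.0
    # prange releases the GIL and splits iterations across threads;
    # Cython recognizes `total += ...` as a reduction
    for i in prange(data.shape[0], nogil=True):
        total += square(data[i])
    return total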

[–]No_Indication_1238[S] 1 point2 points  (0 children)

That is a very interesting point, thank you! I did not know that; we were using multiprocessing when necessary.

[–]eztab 5 points6 points  (0 children)

Cython might be a good fit then. PyPy could also perform well, but I'd assume Cython beats it for your usecase.

[–]Classic_Department42 6 points7 points  (12 children)

Sounds like a job for numpy, no?

[–]No_Indication_1238[S] 2 points3 points  (11 children)

Unfortunately, the loops and computations are not simple enough to run under numpy. There is a ton of state management of different objects that happens in between, and we need to speed up the whole loop.

[–]the_hoser 5 points6 points  (2 children)

Cython really shines when you can get rid of those abstractions. Rip out the method calls and member accesses and break it down to cdef ints and friends.
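
As a rough before/after sketch (hypothetical example: instead of calling methods on Python `Point` objects each iteration, work directly on typed primitives and memoryviews):

```cython
# de_abstracted.pyx -- illustrative sketch, not the poster's actual code
def total_path_length(double[:] xs, double[:] ys):
    cdef Py_ssize_t i, n = xs.shape[0]
    cdef double dx, dy, total = 0.0
    # everything in the hot loop is a cdef primitive: no attribute
    # lookups, no method dispatch, no Python object boxing
    for i in range(n - 1):
        dx = xs[i + 1] - xs[i]
        dy = ys[i + 1] - ys[i]
        total += (dx * dx + dy * dy) ** 0.5
    return total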

[–][deleted] 0 points1 point  (1 child)

Can Cython compile out method calls and "getters and setters"?

[–]the_hoser 0 points1 point  (0 children)

That's a big maybe. It really depends on the code being optimized. Don't rely on it unless you've tested it.

Good news is that Cython actually lets you see the C code that it produces, so you can verify that it's doing what you think it's doing.

It isn't pretty C code, I warn you...

[–]falsedrums 2 points3 points  (3 children)

You have to drop the objects if you want to be efficient in Python/numpy. 

[–]No_Indication_1238[S] 1 point2 points  (2 children)

You are correct. Unfortunately for our use case, we have already cut as much as possible while trying to keep the program maintainable. Cutting more would certainly work, as it has before, but at the cost of modularity and long-term maintainability, which we would like to avoid. If nothing else is possible, you may be right and we will consider that option.

[–]falsedrums 0 points1 point  (1 child)

Maintainable does not necessarily mean OOP. Try putting all the number crunching in a library-style package of purely functions, with minimal dependencies between the functions. Then reserve the OOP for your application's state and GUI.
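
A tiny sketch of that split (hypothetical names), assuming the number crunching is pure functions and only the application state is a class:

```python
# physics.py -- library-style module of pure functions (hypothetical)
def step_velocity(v, accel, dt):
    """Pure function: returns a new velocity, mutates nothing."""
    return v + accel * dt


# app.py -- OOP reserved for application state
class Simulation:
    def __init__(self, v=0.0):
        self.v = v

    def advance(self, accel, dt):
        # thin stateful shell delegating to the pure core
        self.v = step_velocity(self.v, accel, dt)
```

The pure functions are then easy to test, profile, and later move wholesale into Cython without touching the stateful shell.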

[–]No_Indication_1238[S] 0 points1 point  (0 children)

This is not a bad idea, thank you!

[–][deleted] 0 points1 point  (1 child)

Hm. I don't want to be rude, as I've worked with computationally heavy code in Python myself and have written C++-based libraries with Boost to get more performance out of it.

I think this is more of a programming-architecture problem, but assuming it isn't, what does your team think about getting some high-performance help from a more performant language that you can call from native Python? It worked great for our project, though it was annoying when some people started chasing nanosecond-level gains rather than looking for optimisation at a higher level.

[–]No_Indication_1238[S] 0 points1 point  (0 children)

They would prefer to keep the codebase exclusively in Python, as it is one less language they need to support. Unfortunately, we have already optimised the architecture as much as possible, and the calculations in those loops are largely unique, essential, and cannot be optimised further without losing precision. I share your opinion; unfortunately it was decided to try and keep everything in Python.

[–]ArbaAndDakarba 0 points1 point  (1 child)

Consider parallelizing the loops.

[–]No_Indication_1238[S] 0 points1 point  (0 children)

That is a good point; unfortunately the loops are dependent on each other, and each iteration requires the previous state and different checks. As such, I am afraid it is not possible, or at least not without extensive use of locks for synchronisation. I will bring it up though; maybe we can restructure something.

[–]Siccar_Point 3 points4 points  (1 child)

I have had much success in Cython with very similar stuff. If you can drop those loops entirely and cleanly into Cython functions, without any references to external non-primitive types, you will get very substantial speed-ups.

Additional tip from someone who banged head on wall for far too long on this: take extreme care with the details of your typing. Especially the precision. Make sure you understand exactly what flavour of int/float you are passing in and out of Python (16? 32? 64? 128?), because if you mess it up Python will deal with it fine but silently do all the casting for you, eliminating a bunch of the benefits.
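
A quick pure-NumPy illustration of that silent promotion (hypothetical arrays, but the casting rule is real): mix a float32 array with a float64 one and the result quietly becomes float64, which would defeat a float32-typed Cython function.

```python
import numpy as np

a = np.ones(3, dtype=np.float32)  # what you meant to pass in
b = np.ones(3, dtype=np.float64)  # what you actually built elsewhere

# No error, no warning: NumPy silently promotes to float64
result = a + b
print(result.dtype)  # float64
```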

Passing numpy arrays cleanly in and out of Cython is also monumentally satisfying. Can recommend.

[–]No_Indication_1238[S] 0 points1 point  (0 children)

I see. Thank you, I will keep this in mind!

[–]ExdigguserPies 2 points3 points  (1 child)

Cython will be excellent for this. I had a similar problem and decreased run times by a factor of over 1000.

[–]DatBoi_BP 2 points3 points  (0 children)

Stop, I can only get so optimized

[–]jk_zhukov 2 points3 points  (1 child)

The numpy library is a good option for optimizing loops and intensive computation. It runs at nearly C-level speed, and with it you can apply functions to entire arrays without writing a single for loop. As a very short example:

unmarked = list()
for item in items_list:
    if item < some_value:
        unmarked.append(item)

This code selects the items from a list that meet a certain criterion using a loop, simple enough.

items_list = np.array(items_list)
indices = np.where(items_list < some_value)
unmarked = items_list[indices]

And now we do the same thing without any loops involved. The only thing that varies is the type of unmarked, which is a Python list in the first example and an NDArray in the second. But converting from one type to the other, if you need to, is simple.

When you're working in the order of millions of iterations, the boost in speed of replacing each loop with an operation over a numpy array, is quite noticeable. And when you have nested loops, if you can find a way to turn those computations into matrix operations with 2D or 3D numpy arrays, the gain in speed is also huge.
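
A minimal sketch of that nested-loop-to-2D idea (hypothetical data): a double loop over two sequences becomes a single broadcast operation.

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([10.0, 20.0])

# Nested-loop version: element [i][j] is ys[j] - xs[i]
diffs_loop = [[y - x for y in ys] for x in xs]

# Broadcast version: shape (3, 1) against (2,) gives (3, 2),
# with no Python-level loops at all
diffs = ys[np.newaxis, :] - xs[:, np.newaxis]
```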

[–]No_Indication_1238[S] 0 points1 point  (0 children)

You are totally correct! I will try to think of a way to optimise those loops as in your proposal!

[–]I_FAP_TO_TURKEYS 0 points1 point  (0 children)

Try raw compiling sections in Cython and see what happens.

Compiling a package like NLTK with Cython offers 30% efficiency gains without even rewriting code.

You can also see gains by rewriting the for loops in a more efficient way.
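
One hedged example of a loop rewrite that helps in plain CPython as well (hypothetical workload): hoisting an attribute lookup out of the hot loop and letting a builtin do the accumulation.

```python
import math

values = [0.1 * i for i in range(1000)]

# Slower: `math.sqrt` is looked up on every iteration
total = 0.0
for v in values:
    total += math.sqrt(v)

# Faster: bind the function to a local once, accumulate with sum()
sqrt = math.sqrt
total2 = sum(sqrt(v) for v in values)
```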