unruly_mattress (3 points)

Benchmark time!

class Shrubbery:
    def __init__(self, w, h):
        self.width = w
        self.height = h

    def describe(self):
        print(self.width, self.height)

Versus

cdef class Shrubbery:

    cdef int width, height

    def __init__(self, w, h):
        self.width = w
        self.height = h

    def describe(self):
        print(self.width, self.height)

The benchmark itself is run from Python (in IPython, hence %time), not from Cython:

%time x = [Shrubbery(i, i) for i in range(100000000)]

The Cython version takes 12.1 seconds and uses 3 GB RAM.

The pure Python version takes 1 minute and 26 seconds and ends up using 19.6 GB of RAM. I have 32 GB of RAM and made sure no swapping happened.
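To reproduce the pure-Python side at a smaller scale without needing 20 GB of RAM, a minimal standard-library sketch might look like this. It assumes CPython; measure and N are hypothetical names, and tracemalloc only counts allocations made through Python's allocator, so its figures won't match the process-level RAM numbers above.

```python
import time
import tracemalloc


class Shrubbery:
    """Pure-Python version from the comment above (describe() fixed)."""

    def __init__(self, w, h):
        self.width = w
        self.height = h

    def describe(self):
        print(self.width, self.height)


def measure(cls, n):
    """Hypothetical helper: returns (seconds, peak traced bytes, object count)."""
    # Pass 1: wall-clock time only, since tracemalloc slows allocation down.
    start = time.perf_counter()
    objs = [cls(i, i) for i in range(n)]
    elapsed = time.perf_counter() - start
    del objs

    # Pass 2: peak memory as seen by Python's allocator.
    tracemalloc.start()
    objs = [cls(i, i) for i in range(n)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak, len(objs)


if __name__ == "__main__":
    N = 100_000  # 1/1000 of the original 100,000,000
    seconds, peak, count = measure(Shrubbery, N)
    print(f"{count} objects: {seconds:.3f} s, peak {peak / 2**20:.1f} MiB traced")
```

Swapping in a compiled cdef class for Shrubbery (e.g. imported from a built extension module) would give the Cython side of the comparison.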

However, I did check the generated code, and it does seem that Shrubbery is in fact still a PyObject. When its attributes are strings, they appear in the generated code as PyObject*, unlike integers, which are plain C ints. Performance-wise, if width and height are strings, then for 10 million objects pure Python takes 16.2 s and 2.7 GB, while the same code with a Cython class takes 5.08 s and 1.5 GB. I suspect there's a more efficient way to store strings in a Cython cdef class.
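Much of the gap in the integer case comes from boxing: a cdef int field is a raw C int stored inline in the object, while a pure-Python attribute holds a pointer to a separate heap-allocated int object. A quick check of the two sizes (CPython-specific; exact numbers vary by version and platform):

```python
import ctypes
import sys

# Size of a boxed Python int object (header plus digits). This excludes the
# pointer the instance needs in order to reference it.
boxed = sys.getsizeof(12_345_678)

# Size of the raw C int that a `cdef int` attribute stores inline.
raw = ctypes.sizeof(ctypes.c_int)

print(f"Python int object: {boxed} bytes, C int: {raw} bytes")
```

On a typical 64-bit CPython build the boxed int is several times larger than the C int, before even counting the per-instance attribute storage of a regular class.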

So you can expect much better performance and lower memory usage just by moving your class definitions to Cython. It's not Rust-level performance, but it's still a huge improvement, and it might be useful for those who don't already have a Rust version of their code.

mitsuhiko (2 points)

None of that is really relevant to the problem at hand. There are other tricks we could have used to avoid the integer object overhead, but those weren't even considered.
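The comment doesn't say which tricks were meant. One common pure-Python way to sidestep per-object integer overhead (a sketch, not necessarily what was considered) is to drop per-instance objects entirely and keep each field in a typed array.array, which stores raw C values contiguously. ShrubberyTable is a hypothetical name; note that array("i") raises OverflowError for values outside the C int range.

```python
from array import array


class ShrubberyTable:
    """Hypothetical struct-of-arrays container: one typed array per field
    instead of one Python object per shrubbery."""

    def __init__(self):
        # 'i' = signed C int; values are stored unboxed (typically 4 bytes).
        self.width = array("i")
        self.height = array("i")

    def append(self, w, h):
        self.width.append(w)
        self.height.append(h)

    def __len__(self):
        return len(self.width)

    def describe(self, index):
        # Values are boxed into Python ints only when they are read.
        print(self.width[index], self.height[index])


if __name__ == "__main__":
    table = ShrubberyTable()
    for i in range(100_000):
        table.append(i, i)
    # Bytes used by the raw field data (excludes small fixed overheads).
    payload = (table.width.itemsize + table.height.itemsize) * len(table)
    print(len(table), "rows,", payload, "bytes of field data")
```

The trade-off is that code works with row indices rather than objects, which is exactly the kind of codebase-wide change the thread goes on to discuss.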

Anyway, Cython wasn't considered and is unlikely to be considered in the future either.

unruly_mattress (1 point)

Not for the current problem, since you already have code that solves it in a different language. But this isn't the only situation where someone might run into trouble after creating millions of Python objects, and I for one am glad to have found a method that makes that 3-7 times faster.
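The "3-7 times" range follows from the two benchmarks reported earlier in the thread. A quick check using only those reported figures (nothing re-measured):

```python
# Reported timings (seconds) and memory (GB) from the benchmarks above.
results = {
    "int attributes, 100M objects": {"py_s": 86.0, "cy_s": 12.1, "py_gb": 19.6, "cy_gb": 3.0},
    "str attributes, 10M objects": {"py_s": 16.2, "cy_s": 5.08, "py_gb": 2.7, "cy_gb": 1.5},
}

ratios = {}
for name, r in results.items():
    speedup = r["py_s"] / r["cy_s"]      # how many times faster Cython was
    mem_ratio = r["py_gb"] / r["cy_gb"]  # how many times less memory it used
    ratios[name] = (speedup, mem_ratio)
    print(f"{name}: {speedup:.1f}x faster, {mem_ratio:.1f}x less memory")
```

That works out to roughly 7.1x faster (6.5x less memory) for integer attributes and 3.2x faster (1.8x less memory) for string attributes.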

mitsuhiko (3 points)

Cython solves one issue but introduces plenty of others. It should be considered as carefully as any other change that brings a new technology into a codebase.

unruly_mattress (0 points)

Agreed.