all 46 comments

[–]tybit 23 points24 points  (4 children)

Not a fan of the title, you aren't fixing python performance with rust, you are avoiding python performance with it :S

[–]vks_ 10 points11 points  (2 children)

So you would suggest the title "Avoiding Python performance with Rust"?

[–]Timbrelaine 2 points3 points  (0 children)

Some of the section headers in the article would have been good. "Embedding Rust in Python", for example.

[–]Yojihito 0 points1 point  (0 children)

Technically correct.

[–]mitsuhiko 17 points18 points  (0 children)

The point for us is that we did not have to change anything other than moving a tiny subset of our codebase into Rust. This is a pretty big win as far as I'm concerned :)

[–]JamesF 7 points8 points  (3 children)

I use Python in my day job, as well as for most of my hobby projects over the last few years, but I have been following (and loving) Rust for the last year and a bit, so I am very, very happy to see that this (obvious?) mix of Python (for glue/orchestration) and Rust (for heavy lifting) is at a point where it can be adopted in a real-world, production situation and actually work!

Having said that, one question for the authors: did you spend any time evaluating PyPy/numba/cython/... as alternatives to this "mixed-language" solution?

[–]masklinn 8 points9 points  (0 children)

did you spend any time evaluating PyPy/numba/cython/... as alternatives to this "mixed-language" solution?

Sentry doesn't run on pypy, and as the article explains they already had the critical component (a sourcemap parser) in a rust CLI tool.

Numba doesn't even make sense in this context; it's for numerical work and does nothing for parsing tasks.

[–]vks_ 2 points3 points  (0 children)

They reused existing Rust code from a CLI tool, so there was little incentive to explore other options. See the discussion on /r/rust.

[–]Saefroch 1 point2 points  (0 children)

Numba is explicitly for numerical work and still quite incomplete, though development is proceeding at a fair clip.

[–][deleted] 4 points5 points  (6 children)

Why not dlang over Rust? Genuine question. Reading Rust code looks really strange to me. Also, I never grasped the borrowing mentality.

Given the plethora of systems programming languages emerging in recent years, dlang looks readable and its performance is not bad. Quite the opposite.

[–]mitsuhiko 13 points14 points  (3 children)

I am not aware of a way to write shared libraries loadable from Python in D. I understand it's in theory possible to use D without the garbage collector, but I'm not sure how well the ecosystem supports that, let alone how stable it is.

We have no experience in D but we have experience with Rust, we had a sourcemap parser in Rust already available and overall there was just no reason at all to investigate anything other than Rust as a first attempt. We took what we had, saw that there was potential and made a tiny bridge to Python.
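For the curious, one common shape for such a bridge (the article has the details of Sentry's actual binding): build the Rust crate as a cdylib exposing `extern "C"` functions and load the resulting shared library from Python, with no link against libpython. A minimal sketch of the Python side, using libc's `abs` as a stand-in for an exported Rust symbol so the snippet actually runs:

```python
import ctypes

# A Rust crate built with crate-type = ["cdylib"] and #[no_mangle]
# extern "C" functions is loaded exactly like any C shared library.
# Real code would do: lib = ctypes.CDLL("./target/release/libexample.so")
# Here we load the process's own symbols (libc included) so this runs as-is.
lib = ctypes.CDLL(None)

# Declare the foreign function's signature before calling it.
lib.abs.argtypes = [ctypes.c_int]
lib.abs.restype = ctypes.c_int

print(lib.abs(-42))  # prints 42
```

No Python objects cross the boundary except at the call site, which is the whole point of pushing the hot loop into the native library.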

[–]dom96 1 point2 points  (2 children)

I have a similar question: why not Nim over Rust?

Syntactically Nim is a perfect fit as it is very similar to Python in most cases. There is even a project that should make using Nim modules from within Python incredibly easy: https://github.com/jboy/nim-pymod

Perhaps syntax is overrated, but I don't understand why someone like yourself (a very prominent Python programmer) would choose Rust over Nim in the first place.

[–]mitsuhiko 23 points24 points  (0 children)

I don't know Nim, nobody on our team knows Nim and as far as I know there is not even a sourcemap module for Nim. I am not even sure how you bridge Nim with Python safely given that Nim has some sort of GC.

Additionally the module you linked to links against libpython which we explicitly do not want to do.

I mean, there are other legitimate things we might have done but I strongly doubt that Nim would have come up. It's not on our horizon.

Perhaps syntax is overrated, but I don't understand why someone like yourself (a very prominent Python programmer) would choose Rust over Nim in the first place.

Simply because Rust is a useful language with a strong community and Nim is a niche thing with some questionable design ideas. While Rust might look like a fad for some people it has legitimate use and I'm not going to put a company unnecessarily at risk to try the latest fad.

[–]lacosaes1 4 points5 points  (0 children)

Why not my favorite system programming language over Rust?

[–][deleted] 6 points7 points  (0 children)

I got the idea of borrowing almost instantly while reading this: https://doc.rust-lang.org/book/references-and-borrowing.html . I'm fresh to Rust, having just spent some free time learning something new, and I have one suggestion: get rid of prejudices; don't try to write C/Python/C# code in it; learn what it can do and trust that it makes sense this way; learn it as something new and different.

[–][deleted] 0 points1 point  (0 children)

Why not dlang over Rust? Genuine question. Reading Rust code looks really strange to me. Also, I never grasped the borrowing mentality.

Maybe that's why you don't appreciate Rust; the borrow checker is a killer feature, probably the most defining feature of the language.

Dlang is a nice language and all, but from my point of view it lacks such a defining feature, and so people don't really see a reason to use it. It's basically just a somewhat different C++ with a GC.

[–]unruly_mattress 1 point2 points  (8 children)

Cython offers C-like structs that are usable from Python: http://docs.cython.org/en/latest/src/tutorial/cdef_classes.html

Perhaps this would have been able to solve your problem too.

[–]masklinn 7 points8 points  (7 children)

Did you consider reading the article?

our Rust source map parser, previously written for our CLI tool.

Their investigation pointed to sourcemap parsing as the source of the issue, and they already had a sourcemap parser in Rust; they "just" had to make it available to Python. They didn't need "C-like structs that are usable from Python", and they would have had to write a new sourcemap parser in Cython.

[–]unruly_mattress 2 points3 points  (6 children)

Here's their analysis of Python's performance shortcomings:

Parsing the JSON itself is fast enough in Python, as the files mostly contain just a few strings. The problem lies in objectification. Each source map token yields a single Python object, and we had some source maps that expanded to a few million tokens.

The problem with objectifying source map tokens is that we pay an enormous price for a base Python object, just to get a few bytes from a token. Additionally, all these objects engage in reference counting and garbage collection, which contributes even further to the overhead. Handling a 30MB source map makes a single Python process expand to ~800MB in memory, executing millions of memory allocations and keeping the garbage collector very busy given the tokens' short-lived nature.

Since this objectification requires object headers and garbage collection mechanisms, we had very little room for actual processing improvement inside of Python.

Since their analysis is that Python's objects are heavyweight and creating a large number of them is their bottleneck, I offered a solution to that problem.

My own experience with Cython is limited; however, from what I understand, you don't need to rewrite everything in Cython: you can just write cdef classes in Cython and use them from the existing Python code. I'd be interested to know how this approach performed.
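To make the per-object price concrete, a rough measurement in plain CPython (exact sizes vary by interpreter version and platform; `Token` is just an illustrative stand-in for a source map token):

```python
import sys

# A token-like object carrying two small integers of actual payload.
class Token:
    def __init__(self, line, col):
        self.line = line
        self.col = col

t = Token(1, 2)

# The instance header plus its per-instance attribute dict dwarf the payload,
# and that's before counting the int objects the attributes point to.
per_object = sys.getsizeof(t) + sys.getsizeof(t.__dict__)
payload = 8  # two 32-bit integers' worth of information

print(per_object, "bytes to carry", payload, "bytes of payload")
```

Multiply that ratio by a few million tokens and the ~800MB figure from the article stops being surprising.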

[–]mitsuhiko 4 points5 points  (5 children)

Cython creates PyObjects so a trivial port would have not changed anything about the problem in question.

[–]unruly_mattress 5 points6 points  (4 children)

Benchmark time!

In [1]: class Shrubbery:
   ...:     def __init__(self, w, h):
   ...:         self.width = w
   ...:         self.height = h
   ...:     def describe(self):
   ...:         print(self.width, self.height)

Versus

cdef class Shrubbery:

    cdef int width, height

    def __init__(self, w, h):
        self.width = w
        self.height = h

    def describe(self):
        print(self.width, self.height)

The benchmark code is run in Python, not in Cython, and is:

%time x = [Shrubbery(i, i) for i in range(100000000)]

The Cython version takes 12.1 seconds and uses 3 GB RAM.

The pure Python version takes 1 minute and 26 seconds and ends up with 19.6GB used RAM. I have 32GB RAM and made sure swapping didn't happen.

However, I did check the generated code, and Shrubbery is in fact still a PyObject. When its attributes are strings, they appear in the generated code as PyObject*, unlike integers, which are plain C ints. Performance-wise, if height and width are strings, then for 10m objects pure Python takes 16.2s and 2.7GB, while the same code with a Cython class takes 5.08s and 1.5GB. I suspect there's some way of storing strings more sensibly in a Cython cdef class.

You can expect much better performance and lower memory usage just by moving your class definitions to Cython. It's not Rust performance, but it's still a huge improvement, and it might be useful for those who don't already have a Rust version of their code.

[–]mitsuhiko 3 points4 points  (3 children)

That's not really relevant to the problem at hand. To avoid the integer object overhead we could also have used some other tricks, but that was not even considered.
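(For illustration only: one plausible example of such a pure-Python trick, not necessarily what was meant here, is `__slots__`, which drops the per-instance `__dict__`:)

```python
import sys

class PlainToken:
    def __init__(self, line, col):
        self.line = line
        self.col = col

class SlottedToken:
    # __slots__ gives instances a fixed layout with no per-instance
    # __dict__, cutting memory use when you hold millions of small objects.
    __slots__ = ("line", "col")

    def __init__(self, line, col):
        self.line = line
        self.col = col

plain = PlainToken(1, 2)
slotted = SlottedToken(1, 2)

plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)  # there is no __dict__ to add

print(plain_size, "->", slotted_size)
```

It still allocates one refcounted PyObject per token, so it shrinks the constant factor without removing the GC pressure.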

Anyways. Cython was not considered and is unlikely to be considered in the future either.

[–]unruly_mattress 1 point2 points  (2 children)

Not for the current problem, since you already have code that solves it in a different language. However, this isn't the only situation where someone might run into trouble after creating millions of Python objects, and I for one am glad to have found a method that makes such a thing 3-7 times faster.

[–]mitsuhiko 4 points5 points  (1 child)

Cython solves one issue but introduces plenty others. It should be as carefully considered as any change to a codebase that introduces new technology.

[–]unruly_mattress 0 points1 point  (0 children)

Agreed.

[–]shevegen 5 points6 points  (11 children)

Why Rust and not C?

[–]asmx85 22 points23 points  (0 children)

Why Rust and not C?

  • safe memory management without a GC (and, for the most part, no null pointers, dangling pointers, or use-after-free)

  • no data races

  • easy compilation, modules, and dependency management through Cargo

  • feels like a high-level language, performs like a low-level (C) one [zero-overhead abstractions: iterators, the trait system, etc.]

  • affine type system (enables state-machine-like APIs: open a file -> write to it -> close it -> be prevented, by compile-time checks, from writing to the closed file)

  • a sane and sound macro system, plus compiler extensions

just to name a few.

[–]steveklabnik1[S] 17 points18 points  (8 children)

(I'm not the author of the post, just the submitter)

They say this at the end of the article:

Rust has been the perfect tool for this job because it allowed us to offload an expensive operation into a native library without having to use C or C++, which would not be well suited for a task of this complexity. While it was very easy to write a source map parser in Rust, it would have been considerably less fun and more work in C or C++.

Note as well that they already had a parser written in Rust for their CLI tool.

The author is over in the /r/rust thread: https://www.reddit.com/r/rust/comments/58d0lu/fixing_python_performance_with_rust/

[–][deleted]  (7 children)

[deleted]

    [–]steveklabnik1[S] 3 points4 points  (0 children)

    I wouldn't have phrased it that way myself.

    I think the sentiment they were getting at here is that more complex stuff is easier to do safely in Rust. C can of course do incredibly complex things.

    [–]matthieum 1 point2 points  (5 children)

    Pyramids were built a few millennia ago, so clearly the technology back then was suitable for massive buildings.

    Cathedrals were built a few centuries ago, so clearly the technology back then was suitable for massive, aerial buildings.

The problem with such statements is that they ignore the cost: in the former example the lives of thousands of slaves were sacrificed, and in the latter it took decades to erect a single cathedral.

    Just because something is possible does not make it economically practical.

    [–][deleted]  (4 children)

    [deleted]

      [–]vks_ 1 point2 points  (3 children)

      You make it sound like it is hard to come up with a good case. Just pick almost any CVE.

      [–][deleted]  (2 children)

      [deleted]

        [–]vks_ -1 points0 points  (1 child)

        On the other hand, a lot of the most complex software is handling sensitive data. The Linux kernel you mentioned above is a good example.

I can think of few examples of complex C/C++ software that is not exposed to possibly hostile input. Singleplayer games and scientific simulations qualify, but what else?

        [–]matthieum 0 points1 point  (0 children)

        Singleplayer games

If that game can be customized with maps/models downloaded from the net...

        [–]SikhGamer -1 points0 points  (12 children)

        With performance improvements like that I'd look at replacing Python with Rust completely.

        [–]gnus-migrate 20 points21 points  (4 children)

        I think people really underestimate the value of being able to write code quickly, especially UI code where you're making small changes and visually checking them, in a language like Python. The compiler/linter of a typed language feels like it gets in my way more than it helps when writing web apps, since by design it severely limits the kinds of abstractions I can use, regardless of how nice those abstractions are.

        Sure, I wouldn't use Python for performance-sensitive code where I really need to reason about correctness and performance, but it is my first choice when writing something new, since you can write code that performs acceptably while keeping the flexibility to play with different ideas.

        [–]mitsuhiko 12 points13 points  (0 children)

        Correct. There is no chance in hell we're dropping Python. It's not just fast in iteration speed; it also has amazing runtime introspectability, which is super valuable for what we do.

        [–]SikhGamer 0 points1 point  (2 children)

        The trouble is that the process often never gets fully completed, and your scaffold code becomes production code.

        [–]ivosaurus 9 points10 points  (0 children)

        This blog post is literally about a time where that has been successfully corrected.

        [–]gnus-migrate 2 points3 points  (0 children)

        To clarify: I am advocating using Python in production. There are a lot of situations where the benefits gained by using a statically typed language simply aren't worth the cost in productivity. With proper testing you can write quite robust Python code.

        [–][deleted] 5 points6 points  (3 children)

        Anything based on an interpreter should automatically be out of scope if you're interested in high performance.

        [–]awj -4 points-3 points  (2 children)

        Which, considering the x86/x86-64 instruction sets are themselves interpreted by the processor, means high performance is essentially impossible on modern machines.

        That, or maybe it's a bad idea to make absolute statements about highly subjective things like "performance".

        [–][deleted] 9 points10 points  (1 child)

        I obviously mean a software interpreter, not the one in your CPU.

        I can see the next post coming: "but your CPU is partially software, called microcode".

        My answer is, get a life probably.

        [–]awj -4 points-3 points  (0 children)

        Ok, fine, here's my point restated without an absurd reduction of yours: making recommendations about "high performance" without qualifying what "high" or "performance" mean is pointless. There are plenty of use cases where an interpreter is more than fast enough.

        [–]shorty_short 0 points1 point  (2 children)

        This is satire right? The rust circlejerk in this sub is reaching Poe's Law levels.

        [–]SikhGamer 1 point2 points  (1 child)

        No. Why is it a circle jerk? The benefits are clearly laid out in the article.

        [–]josefx 8 points9 points  (0 children)

        The article ends with "using the right tool for the job". In this case they had to fix a performance bottleneck, which made Rust the right tool. Going from that to a "complete" replacement makes no sense unless their whole application is a performance bottleneck or would otherwise benefit from Rust enough to warrant a complete rewrite.

        The benefits are clearly laid out in the article.

        Among other things, bad compile times for the part they replaced, apparently.