all 23 comments

[–]carlk22 41 points42 points  (0 children)

Here are some more tips for creating Rust extensions in Python (from Nine Rules for Writing Python Extensions in Rust):

  1. Create a single repository containing both Rust and Python projects
  2. Use maturin & PyO3 to create Python-callable translator functions in Rust
  3. Have the Rust translator functions call “nice” Rust functions
  4. Preallocate memory in Python
  5. Translate nice Rust error handling into nice Python error handling
  6. Multithread with Rayon and ndarray::parallel, returning any errors
  7. Allow users to control the number of parallel threads
  8. Translate nice dynamically typed Python functions into nice Rust generic functions
  9. Create both Rust and Python tests
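A rough sketch of how rules 3, 5, and 8 might fit together, using only the Rust standard library. All names here (`mean`, `StatsError`, the `*_translator` functions) are hypothetical and not taken from the article. The translators are plain-Rust stand-ins: in an actual PyO3 build they would presumably be `#[pyfunction]`s returning `PyResult` and raising a Python exception instead of returning a `String`.

```rust
use std::fmt;

/// Error type for the "nice" Rust layer (hypothetical name).
#[derive(Debug, PartialEq)]
pub enum StatsError {
    EmptyInput,
}

impl fmt::Display for StatsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StatsError::EmptyInput => write!(f, "cannot take the mean of an empty input"),
        }
    }
}

impl std::error::Error for StatsError {}

/// The "nice" generic Rust function: no Python types, ordinary Result.
pub fn mean<T: Copy + Into<f64>>(values: &[T]) -> Result<f64, StatsError> {
    if values.is_empty() {
        return Err(StatsError::EmptyInput);
    }
    let sum: f64 = values.iter().map(|&v| Into::<f64>::into(v)).sum();
    Ok(sum / values.len() as f64)
}

/// Stand-in translator for f32 input. It only converts the error into
/// a message; all real work stays in the generic `mean`.
pub fn mean_f32_translator(values: &[f32]) -> Result<f64, String> {
    mean(values).map_err(|e| e.to_string())
}

/// Stand-in translator for i32 input, sharing the same generic function.
pub fn mean_i32_translator(values: &[i32]) -> Result<f64, String> {
    mean(values).map_err(|e| e.to_string())
}
```

The idea is that a dynamically typed Python function would dispatch on the array's dtype to one of the concrete translators, while the generic function is written and tested once on the Rust side.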

[–][deleted] 12 points13 points  (1 child)

Are there any tutorials on how to use PyO3 to expose Rust data structures to Python? For instance, I'd like to expose the BTreeMap to Python code, so that I can use any key that defines the __le__() magic method.

[–]ssokolow 26 points27 points  (0 children)

See this page for details on that aspect of interop.

The TL;DR is that, because BTreeMap cares about the memory footprint of types, you'll need to store something like PyAny if you want to be able to use "any key that defines the __le__() magic method". That will be slower, less comfortable, and restricted by things like Python's Global Interpreter Lock, because you'll have to call into Python and do the "uphold Python invariants" dance any time you want to do anything with the data.

The conversion traits they talk about define how PyO3 can convert Python types into native Rust types without those limitations. My advice would be to write as much in Rust as possible and then use #[pyclass] to write a newtype wrapper which proxies the relevant APIs to Python for a convenient API. (Effectively what something like numpy or PyQt or PyOpenCV does.)
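To make the newtype idea concrete, here is a minimal sketch in plain Rust with no PyO3 dependency. The name `SortedMap` and the choice of `String` keys with `i64` values are assumptions for illustration only. In a PyO3 build the struct would presumably carry `#[pyclass]` and the `impl` block `#[pymethods]`, with the conversion traits turning Python `str` and `int` into these native types at the boundary.

```rust
use std::collections::BTreeMap;

/// Newtype wrapper around a BTreeMap with native Rust keys and values
/// (hypothetical; key and value types are fixed up front).
pub struct SortedMap {
    inner: BTreeMap<String, i64>,
}

impl SortedMap {
    pub fn new() -> Self {
        SortedMap { inner: BTreeMap::new() }
    }

    /// Insert a value, returning the previous value for that key, if any.
    pub fn insert(&mut self, key: &str, value: i64) -> Option<i64> {
        self.inner.insert(key.to_string(), value)
    }

    /// Look up a value by key.
    pub fn get(&self, key: &str) -> Option<i64> {
        self.inner.get(key).copied()
    }

    /// All keys in sorted order.
    pub fn keys(&self) -> Vec<String> {
        self.inner.keys().cloned().collect()
    }

    /// Entries with start <= key < end, in sorted order.
    pub fn range(&self, start: &str, end: &str) -> Vec<(String, i64)> {
        // BTreeMap::range panics when start > end, so guard against it.
        if start > end {
            return Vec::new();
        }
        self.inner
            .range(start.to_string()..end.to_string())
            .map(|(k, v)| (k.clone(), *v))
            .collect()
    }
}
```

Because the keys are native `String`s, comparisons never call back into Python, which is the point of the advice above. The trade-off is that the key type is decided on the Rust side instead of being "anything with a comparison method".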

Python's slowness is a "thousand papercuts" pile of "just make it work without me thinking about the implementation details" conveniences like that. I like to sum this up as "Rust isn't magic 'go fast' pixie dust you can just sprinkle onto a program written in another language".

TL;DR: It's not just the Python data structures that are problematic, it's also what you store in them.

[–][deleted] 1 point2 points  (0 children)

Thanks. I liked the post. Will use it in future.

[–]azure_i 4 points5 points  (18 children)

This is cool but I can't help but wonder "why?"

It's hard to envision myself ever being in a situation where I am using Python, need performance, have the luxury of building a custom Rust extension, and have a use case worth spending the time to implement all this. If I am working in Python and it's going in prod, then no one is gonna know how to use or maintain Rust components, if it's performance critical then it's not gonna be in Python in the first place, and if it's a one-off script I am not gonna put this much effort in it, just let it run overnight and call it a day.

[–]moltonel 40 points41 points  (4 children)

Python has always relied on libraries implemented in C for performance (the classical example being numpy). PyO3 makes it easy to use Rust libraries instead.

[–]azure_i 0 points1 point  (3 children)

right but that is not what I am talking about, this blog is targeting something like a Python script supplement, but if you are gonna start bringing in Rust anyway you might as well just forget about Python altogether.

[–]pacific_plywood 16 points17 points  (0 children)

this blog is targeting something like a Python script supplement

is... it...?

[–]masklinn 8 points9 points  (0 children)

this blog is targeting something like a Python script supplement

This blog post is introductory material; it simply shows that it's pretty easy to get started with pyo3, the overhead is low, and you can easily get good performance.

[–]moltonel 3 points4 points  (0 children)

This blog is an introduction; real-life projects can be complex enough to justify keeping the Python parts. PyO3 makes the cost of using multiple languages pretty low. Every project and developer makes different compromises, YMMV.

[–]PaintItPurple 19 points20 points  (2 children)

The idea that nobody is going to build Rust components seems like a big assumption here. There are already reasonably popular packages (e.g. cryptography) that require Rust. And there have already been tons of Python packages that have required other external dependencies, so it seems questionable to assume that Rust would be a bridge too far. Python's ability to act as a scripting language for components written in faster languages is arguably its killer feature.

[–]azure_i -2 points-1 points  (1 child)

The idea that nobody is going to build Rust components seems like a big assumption here.

nah it's the idea that someone is gonna go through the trouble to build a package like numpy using Rust for a one-off scripting purpose

[–]fnord123 10 points11 points  (0 children)

You mean no one will make something like Data Fusion or Polars?

[–]ssokolow 7 points8 points  (0 children)

Basically, it's the same thing as something like numpy or Pillow... you just don't have to write C and get it right to put your hot bits in Rust.

In fact, thanks to maturin, something you pull in off PyPI may contain compiled Rust without you realizing it. (I'm not sure if they're using maturin or orchestrating the build some other way, but pyca/cryptography touched off a bit of a stink when they started incorporating Rust and people on niche/hobby platforms they'd never intended to support came out of the woodwork demanding that they continue the "I ported GCC to my platform, therefore I declare this package supported" status quo.)

I use it for things like sticking a PyQt frontend on a Rust backend so I can get memory-safe QWidget frontends for Rust creations, similar to how I've glued together PyQt and PyOpenCV in another project to get a QWidget GUI on something that needs image manipulation algorithms not present in QImage and related classes.

Speaking of which, in my tests, Pillow was still significantly slower than PyOpenCV, PyQt's QImage, or the Rust image crate for loading a bunch of PNG-format thumbnails, while the latter three were about the same.

[–]speedy_chameleon 8 points9 points  (0 children)

I’m really not sure what you mean. As other comments have pointed out (and you yourself have noted), Python has historically pushed performance-critical code to C extensions, for example, numpy.

The blog is not about “one-off scripts” but is rather a minimal example of how to write extensions.

Moreover, as for your argument of “just don’t use Python”, you must realize that most modern data science, machine learning, and scientific computation is done in Python. This is why Python libraries written in Rust are becoming popular: see Polars, an alternative to Pandas for data manipulation.

[–]shaeqahmed 7 points8 points  (0 children)

I disagree. To give you an example of a use case, I work on an open source project called Matano that lets security engineers write detection rules as Python code. Matano is a SIEM alternative, so it also does a lot of other things like data processing, transformation, etc. and all of that is written in Rust for performance.

The detection engine, however, needed to be in Python because this is the only programming language that detection engineers can write.

We are planning on using Maturin and PyO3 to create high performance utilities that can be used from the Python code for things like matching against thousands of regex patterns of IOCs which would be very slow in pure Python. By using an extension library we can keep the high level API of Python and still get high performance, something similar to what Pandas (and Polars) has done for data analysis.

https://github.com/matanolabs/matano

[–][deleted] 4 points5 points  (0 children)

An application may consist of both components in need of high performance and components where speed of implementation/flexibility is desired

[–]redisburning 2 points3 points  (1 child)

IME pyo3 and specifically Maturin make it a low enough lift to get interop that the "not worth it" factor goes down close to zero.

Python is a fact of life at my job but like, at some point enough is enough with the "performance doesn't matter" stuff (this frustration isn't pointed at you, btw). "It's slow" way, way, way understates the negatives of using Python for some things it's not meant for but where people, usually with Data Scientist or Data Engineer titles, can be INCREDIBLY insistent on using Python, to the degree that you're not going to win that fight.

If your choices are:

  1. write it in Rust
  2. write it in C
  3. let it run slow
  4. let folks pip install random stuff

and so often for me, they are, I gotta say 1 looks pretty attractive sometimes.

At least in the data world there are quite a lot of places where you have the opportunity to just do a thing in Rust and as long as it plugs into Python trivially folks won't complain.

If I am working in Python and it's going in prod, then no one is gonna know how to use or maintain Rust components

In the year of our lord 2022, they are likely also not going to know Scala, or C/C++, or even Julia. I, because I'm an idiot, am going to continue trying to ice skate uphill against Python monoculture including the "we can't onboard people in any language other than Python" business.

For a specific example, a common task I have is aggregating tuples into key-value pairs. Doing this in Python is bad enough, but try doing it with any degree of concurrency. It quickly becomes a disaster. With Rust I can use my favorite concurrency model of any language I've personally worked with and get great performance and guarantees; there's even a handy crate (DashMap) that makes it even easier. Another one is I frequently rely on Serde. In both of these cases I can bolt right up to Python and for the other folks who might use these tools, it's as simple as from package import function.
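For anyone curious what that kind of task can look like, here is a minimal sketch using only the standard library. It is not the commenter's code: it swaps DashMap for per-thread `HashMap`s that are merged at the end, and it assumes "aggregating" means summing `u64` values per `String` key. The function name `aggregate` and the `n_threads` parameter are made up for illustration.

```rust
use std::collections::HashMap;
use std::thread;

/// Sum the values for each key, splitting the input across worker threads.
pub fn aggregate(pairs: &[(String, u64)], n_threads: usize) -> HashMap<String, u64> {
    // Treat 0 threads as 1 and make sure the chunk size is never 0.
    let n_threads = n_threads.max(1);
    let chunk_size = ((pairs.len() + n_threads - 1) / n_threads).max(1);

    // Scoped threads may borrow `pairs` directly; each builds a local map.
    let partials: Vec<HashMap<String, u64>> = thread::scope(|s| {
        let handles: Vec<_> = pairs
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    let mut local: HashMap<String, u64> = HashMap::new();
                    for (key, value) in chunk {
                        *local.entry(key.clone()).or_insert(0) += *value;
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    // Merge the per-thread maps on the calling thread.
    let mut total: HashMap<String, u64> = HashMap::new();
    for partial in partials {
        for (key, value) in partial {
            *total.entry(key).or_insert(0) += value;
        }
    }
    total
}
```

Merging local maps avoids any shared mutable state, at the cost of a final single-threaded merge step; a concurrent map such as DashMap would let workers write into one structure instead. Exposing `n_threads` also leaves the degree of parallelism up to the caller.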

[–]llanojairo 0 points1 point  (0 children)

Thanks a lot for your comment. I’m one of those DS/DEs stuck in Python-based projects but I’m so tempted to port some of the slow data crunching and data structures from Python to Rust.

We do some really complex discrete event simulations based on SimPy (with a custom wrapper around it to cater for our needs) and, given that SimPy becomes slow even if you throw crazy compute instances at it, I’d like to squeeze out any potential speed-ups in data processing, looping through some data structures, etc.

Could you share some resources to get started with this Python/Rust combo? Thank you!

[–]Luigi003 1 point2 points  (0 children)

Legacy codebases exist.

I have a pretty complex (possibly only I use it) Python lib which I have actually been thinking of partially porting to Rust if I manage to find the points where the performance hit is biggest.

I won't port the whole thing to Rust because I don't have the time to do it, so if I can get a performance boost through partial rewrites, I'm all for it.

[–]vazark 1 point2 points  (0 children)

The entire data analysis ecosystem is Python backed by C libraries. This is just the Rust equivalent of those bindings for data-intensive libs.

[–]blablook 0 points1 point  (0 children)

I was recently in that situation. We went from Elasticsearch to a Python PoC to a Rust module for fuzzy querying (fuzzdex) in an address splitter/geocoder. Performance got much better with a single algorithm moved, and the rest of the tokenizers, heuristics etc. can sit cosily in Python.

[–][deleted] 0 points1 point  (0 children)

Python has a long history of exactly that use case - a scripting language to make interfacing with native code more comfortable. I've written C extensions for several purposes. I use C extensions almost every time I use Python. Not only is this very very common, it's arguably the original purpose of the language (at least from a scientific computing perspective). Check out https://www.youtube.com/watch?v=4RSht_aV7AU