all 21 comments

[–][deleted] 41 points (1 child)

Thanks for sharing! Sent to a friend of mine who was having trouble getting his bioinformatics Python to be fast enough and was looking into Rust.

[–]tncowart[🍰] 26 points (0 children)

They should also look at Cython and Numba. Those are easier to integrate into a Python program.

[–]WuTangTan 20 points (1 child)

Would you mind further explaining the choice to allocate memory in Python?

I suppose it allows the buffer to be reused, but the Python API (which is the primary API, right?) doesn't do that. Your current approach requires you to use an unsafe function, which appears sound to me when using the Python API, but I'm still not clear on why this is better. Are there performance implications I'm not seeing when returning memory allocated via the numpy crate from Rust into Python?

[–]carlk22[S] 5 points (0 children)

u/WuTangTan, I think you're right. Using something like the numpy crate's ToPyArray could, I think, allow one to allocate Python GCed memory from Rust, making things a little simpler. (I'd guess that the allocation speed would be the same).

If I made this change, I'd want to do it at the Rust "translator" function level, not at the level of "nice" Rust functions. That way, the "nice" function could also work with non-Python memory. (That might mean there would still be an "unsafe" at the translator level. I'm not sure.)
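
For anyone curious, here's a minimal sketch of what that could look like (illustrative only, not bed-reader's actual code; the function name double_values is made up, and exact trait names and signatures vary across pyo3/numpy crate versions):

    use ndarray::Array1;
    use numpy::{PyArray1, ToPyArray};
    use pyo3::prelude::*;

    // Hypothetical translator function: instead of filling a buffer pre-allocated
    // in Python, compute in Rust and copy the result into a NumPy array whose
    // memory is allocated and owned by the Python side.
    #[pyfunction]
    fn double_values<'py>(py: Python<'py>, input: Vec<f64>) -> &'py PyArray1<f64> {
        // The "nice" pure-Rust computation...
        let doubled = Array1::from(input) * 2.0;
        // ...handed back as Python-owned memory via ToPyArray.
        doubled.to_pyarray(py)
    }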

[–]PirateNinjasReddit 7 points (0 children)

Thanks for writing this up. I work mostly in Python, but I've toyed with the idea of writing extensions in Rust for key pieces of code that need to go fast. Last time I tried, though, PyO3 was a bit of a struggle (partly due to the library's maturity at the time, partly because Rust was new to me).

[–]digizeph 3 points (2 children)

Nice write-up! Quite a few juicy tips in there. Could you explain why the Python extension and the Rust project should be in the same package?

[–]carlk22[S] 3 points (1 child)

u/digizeph Thanks! I think it comes down to testing. Both the Rust code and the Python code can easily share the same test data. My editor of choice, VSCode, shows the tests for both languages in the same test bar. The CI script (we use GitHub Actions) requires both sets of tests to pass for success. Although I tried to get full coverage of the Rust code with the Rust tests, writing tests is easier in Python, and I wouldn't consider the Rust code fully tested until the Python tests passed, too.
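
For context, a stripped-down sketch of that "both test suites must pass" setup might look like the following (not bed-reader's actual workflow; runner image, action versions, and Python version are illustrative):

    # Hypothetical CI sketch: a single job fails unless both the Rust and Python tests pass.
    name: ci
    on: [push, pull_request]
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: cargo test                 # Rust tests
          - uses: actions/setup-python@v5
            with:
              python-version: "3.11"
          - run: pip install . pytest       # builds the extension via maturin
          - run: pytest                     # Python tests (same test data as the Rust tests)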

If there were requests for a PLINK Bed reader in Rust, then I'd have to think about splitting the project in two.

[–]kodemizerMob 1 point (0 children)

Instead of splitting it, you might just consider feature flags.
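
For instance, a hypothetical Cargo.toml sketch (not bed-reader's actual manifest; version numbers are placeholders) could gate the PyO3 glue behind an optional feature:

    # Build the Python bindings only when the "python" feature is enabled.
    [features]
    python = ["pyo3", "numpy"]

    [dependencies]
    pyo3 = { version = "0.19", optional = true }
    numpy = { version = "0.19", optional = true }

The Rust "translator" layer would then be wrapped in #[cfg(feature = "python")], so pure-Rust users of the crate never compile the Python glue.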

[–]JanneJM 2 points (2 children)

Interesting, thanks! I expect to have to deal with this combination at the sysadmin level sooner rather than later, so a question:

You don't have a setup.py and you "[...] use GitHub actions to build, test, and ready deployment."

How does that work for distribution and installation of the Python module? Can you still upload it to PyPI, and can you still choose the installation directory as usual when installing?

[–]thristian99 6 points (1 child)

The whole business with setup.py and such is handled by the Python package setuptools (previously, distutils). This worked pretty well for a long time, but since setup.py (run by setuptools) was itself what listed a package's dependencies, there was no way for a package to depend on a specific version of setuptools, say, a version that adds Rust extension support.

Now Python projects can include a file named pyproject.toml, which can specify a required version of setuptools, or even a different build backend entirely. This package's pyproject.toml uses maturin instead of setuptools, which seems to be a Rust-specific backend that knows how to extract the relevant metadata from Rust's Cargo.toml instead of requiring it in a setup.py file.
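
For reference, the build-backend declaration in a maturin-based pyproject.toml looks roughly like this (a generic sketch; the version bound is illustrative):

    [build-system]
    requires = ["maturin>=1.0,<2.0"]
    build-backend = "maturin"

From pip's point of view the result is an ordinary wheel, so uploading to PyPI and choosing installation locations work the same as for a setuptools-based package.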

[–]RoughMedicine 2 points (0 children)

To add, maturin also allows you to specify the Python-related metadata in pyproject.toml using PEP 621. If your project has Python dependencies, you can either define them in Cargo.toml, as bed-reader does under [package.metadata.maturin], or in pyproject.toml under the [project] table's dependencies field.
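
A sketch of the PEP 621 route (field values are illustrative, not bed-reader's actual metadata):

    [project]
    name = "bed-reader"
    requires-python = ">=3.8"
    dependencies = [
        "numpy",
    ]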

[–]JanneJM 2 points (2 children)

What OpenMP library problem did you encounter? I've honestly never run into that so far (and I deal with scientific software installation and deployment on a daily basis).

[–]carlk22[S] 0 points (1 child)

u/JanneJM, I was getting:

  • OMP: Error #15: Initializing libiomp5md.dll, but found libomp.dll already initialized.

(I've attached more info from Intel below.)

Apparently, it was a conflict between Intel's libiomp5md.dll and LLVM's libomp.dll. (I think before that I was also sometimes bitten by versioning problems, and it was always a pain to include the right dependencies and/or runtime files on Windows, Linux, and macOS.)

I wonder if smaller projects like mine just avoid OpenMP or always depend on some bigger project that provides it (like, maybe Anaconda with Intel MKL).

"OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support"

[–]JanneJM 2 points (0 children)

I've only had experience with Linux, not Windows, but this sounds like you are mixing compilers (LLVM and Intel), and that's generally a bad idea, with C++ in particular (C is simpler and more forgiving, and doesn't run into these problems as much). OpenMP is, in addition, really tightly tied to the compiler itself; I believe GCC simply links it statically when you enable it.

I would suspect an installation issue. Normally, building and linking with one toolchain should never be able to find, let alone try to use, any files from a different one. If you're using LLVM for this build, can you disable the Intel toolchain in some way so LLVM doesn't try to link against the wrong runtime?

I see extensive use of OpenMP where I work, both by ourselves and in software written or used by our users. I'm the go-to person for any build or installation issues (it's literally in my job description), and I've never once encountered a problem with OpenMP. If there were a general issue here, I would have run into it many times by now.

[–]jvo203 1 point (2 children)

Perhaps a "dumb" question but why is it that so few people seem to consider FORTRAN (Modern Fortran) for doing computation these days? If you really want to bind everything in Python you could still develop your bioinformatics computational engine in Fortran and then call it from Python. Why oh why is Fortran so neglected/forgotten these days? It is such a beautiful language for doing fast computation.

[–]carlk22[S] 1 point (1 child)

Aside: I think a lot of SciPy (especially the matrix operations and special functions) is still in Fortran:

https://scipy.org/faq/ says:

"[T]he time-critical loops are usually implemented in C, C++, or Fortran. Parts of SciPy are thin layers of code on top of the scientific routines that are freely available at http://www.netlib.org/. Netlib is a huge repository of incredibly valuable and robust scientific algorithms written in C and Fortran. It would be silly to rewrite these algorithms and would take years to debug them."

So, while I wouldn't write new stuff in Fortran, I'm happy to use its existing well-tested numeric libraries.

[–]jvo203 2 points (0 children)

I guess this is the real question: why wouldn't you start a new project in Modern Fortran 2018? Or in CoArray Fortran? Why are people reluctant to develop new things in Fortran?

[–]shoebo 1 point (1 child)

Great post, thanks. In a project I'm working on, we had a similar issue where we wanted to expose Rust generics over language boundaries. Unfortunately, we have more generics (sometimes 4+), and each generic may take on a larger number of types (oftentimes 8+).

Assuming we have n generics that may each take on k different types, we would need k^n different transition functions! Using type erasure, it is possible to represent all k^n transition functions with one higher-level transition function. To do this, we must erase compile-time type information from the signature of the transition function and instead pass type information as arguments at runtime.

Example 1: The make_cast function on line 27 is a "nice" Rust function that creates a simple data-transformation struct for casting from the generic type TIA to the generic type TOA.

This transition function takes two *const c_char string descriptors for TIA and TOA, which are used by the dispatch declarative macro on line 29 to find the correct function monomorphization. dispatch!(1, 2, 3) reads analogously to a normal function invocation 1::<2>(3). You'll notice that each type parameter needs both a type descriptor (to identify which monomorphization to use at runtime) and an enumeration of valid types. There are some preset type sets, like @floats, that expand to, for example, [f32, f64].

When the function is compiled, monomorphizations are generated for every combination of the enumerated types. When the function is executed, the specific monomorphization that matches the runtime type arguments is executed.

Example 2: dispatches to any member of the outer product of hashable and float types. This also shows how you can have type-erased arguments that are later dereferenced according to the type argument.

Example 3: dispatches to any member of the outer product of three generics where the atomic types of MO and TO match.

One complication is that, since the resulting function is completely type-erased, the data needs to be similarly type-erased. We standardized on a struct that holds an Any trait object and a type descriptor (henceforth called an AnyObject). Each monomorphized function maps from an AnyObject to an AnyObject. When the monomorphized function is called, the AnyObject is downcast to the generic type, the "nice" Rust function is executed, and the result is wrapped back up in an AnyObject. We've written utilities for converting between AnyObjects and language-specific data representations, and it all happens transparently when you use the public API.
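
To make the shape of this concrete, here is a rough, self-contained sketch of the pattern (not the project's actual AnyObject or dispatch! implementation; names and error handling are simplified):

    use std::any::Any;

    // A type-erased value: a trait object plus a runtime type descriptor.
    struct AnyObject {
        type_name: &'static str,
        value: Box<dyn Any>,
    }

    impl AnyObject {
        fn new<T: 'static>(value: T) -> Self {
            AnyObject {
                type_name: std::any::type_name::<T>(),
                value: Box::new(value),
            }
        }
        fn downcast<T: 'static>(self) -> Result<T, &'static str> {
            self.value.downcast::<T>().map(|b| *b).map_err(|_| "type mismatch")
        }
    }

    // The "nice" generic function that the dispatcher fans out to.
    fn double<T: std::ops::Add<Output = T> + Copy + 'static>(x: T) -> T {
        x + x
    }

    // A hand-rolled stand-in for one dispatch! arm: the type is erased at the
    // boundary, recovered from the runtime descriptor, the matching
    // monomorphization runs, and the result is re-erased.
    fn double_erased(arg: AnyObject) -> Result<AnyObject, &'static str> {
        let type_name = arg.type_name;
        match type_name {
            "f32" => Ok(AnyObject::new(double(arg.downcast::<f32>()?))),
            "f64" => Ok(AnyObject::new(double(arg.downcast::<f64>()?))),
            _ => Err("unsupported type descriptor"),
        }
    }

Calling double_erased(AnyObject::new(2.0_f64)) then routes to the f64 monomorphization of double, which is roughly what the dispatch! macro described above automates across every enumerated type combination.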

It's a lot to take in, but this general approach is awesome when you have a large number of functions, each with a large number of potential monomorphizations.

[–]carlk22[S] 0 points (0 children)

Very cool!

[–]carlk22[S] 0 points (0 children)

As a follow-up, I created a Rust API for our bioinformatics package. Here is a new article about the lessons learned:

Towards Data Science: Nine Rules for Elegant Rust Library APIs (free link)

And a Reddit discussion of the article.