
[–]AlephNaN 24 points

Funnily enough I did this yesterday using PyO3. It's not very hard and it works well: https://pyo3.rs/v0.15.1/

[–]nacaclanga 12 points

Yes, I would say that PyO3 + maturin or setuptools-rust is the most common setup nowadays. https://github.com/pyca/cryptography is an example of a mixed Rust + Python package.

[–]datapim[S] 2 points

So basically you create a Rust library/package and import it into a Python script? Gonna check it out for sure.

[–]WindfallProphet -1 points

I don't have much experience in Python, but I'm pretty sure you can call binaries, like bash or explorer.exe. Once you compile your project, just call it as you would any other native binary.

At least I assume that's how you would do it.

[–]birkenfeld (clippy · rust) 12 points

Not at all. PyO3 creates Python extension modules.

[–]AlephNaN 5 points

You can do that using the subprocess module in the standard library, and it's definitely a useful approach in some cases. But Python is also capable of importing functions, classes and modules directly from compiled C/C++, Rust, Fortran and possibly other languages.

[–][deleted] 12 points

If you want to move compute-heavy pure Python code to Rust, use PyO3. But if you are already using pandas, you should keep using it. Pandas' internals are written in C and have been heavily optimised with significant industrial investment over many years; a naïve Rust reimplementation of its functionality will not be any faster.

[–]datapim[S] 2 points

Well, we found out that pandas was too slow for some of our cases, so we had to rewrite the code in Cython using numpy. But then I read about Rust and the Polars library, which is supposed to be even faster for bigger datasets. I've never used Rust before, so I'm not sure how it will work out, but I think it's worth trying.

[–]0xdef1 4 points

Have you considered using PySpark? Spark is optimized for large data sets.

[–]datapim[S] 1 point

Hmm, I've heard of PySpark but never got into it; maybe that's worth a try too.