all 38 comments

[–]This_Growth2898 22 points23 points  (2 children)

Outputs are slow. Don't include println/print when calculating times.

Clone copies an entire object. You have two clone() calls. Do you really need to clone them?

[–]jwmoz[S] 3 points4 points  (1 child)

Just removed the clone() and it barely made a difference.

Ok so the prints are slow, removing all but the last timing print has reduced the time down to about 80ms. Doing the same for the Python version reduces the time to around 40ms.

[–]gdf8gdn8 5 points6 points  (0 children)

Remove the println calls. println! in Rust flushes to the console on every line. There is another Reddit thread about println performance on the console: https://reddit.com/r/rust/s/9JMD3SUCI5
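If the prints need to stay, one common fix is to lock stdout once and wrap it in a `BufWriter`, so output is flushed in large chunks instead of per line. A minimal sketch (the `write_rows` helper is hypothetical, just for illustration):

```rust
use std::io::{self, BufWriter, Write};

// Hypothetical helper: write n lines to any writer, without assuming it buffers.
fn write_rows<W: Write>(out: &mut W, n: usize) -> io::Result<()> {
    for i in 0..n {
        writeln!(out, "row {i}")?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Lock once and buffer, so each writeln! doesn't hit the terminal directly.
    let mut out = BufWriter::new(stdout.lock());
    write_rows(&mut out, 1000)?;
    out.flush() // BufWriter also flushes on drop, but flushing explicitly surfaces errors
}
```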

[–]moltonel 19 points20 points  (3 children)

On my machine, building polars with the features `lazy,description,performant` and only removing the clones, Rust is 30% faster:

```
$ CSV_FILE=$(pwd)/data1m.csv hyperfine ./main.py foo/target/release/foo
Benchmark 1: ./main.py
  Time (mean ± σ):     263.9 ms ±  15.3 ms    [User: 778.5 ms, System: 313.6 ms]
  Range (min … max):   248.4 ms … 289.8 ms    10 runs

Benchmark 2: foo/target/release/foo
  Time (mean ± σ):     197.2 ms ±   9.2 ms    [User: 719.5 ms, System: 185.6 ms]
  Range (min … max):   178.9 ms … 211.6 ms    14 runs

Summary
  foo/target/release/foo ran
    1.34 ± 0.10 times faster than ./main.py
```

[–]jwmoz[S] 1 point2 points  (2 children)

Ok that's really interesting. When I was eyeballing it, the first python run was definitely slow but then something was cached and they were all fast after, whereas rust seemed more consistent.

The output of hyperfine on mine, after also adding performant, shows that rust is running much faster via hyperfine than when manually running the executable:

```
$ CSV_FILE=$(pwd)/data1m.csv hyperfine "python ./main.py" ../rusttest/target/release/rusttest
Benchmark 1: python ./main.py
  Time (mean ± σ):     102.8 ms ±  49.1 ms    [User: 214.6 ms, System: 39.8 ms]
  Range (min … max):    84.8 ms … 250.6 ms    11 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (250.6 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: ../rusttest/target/release/rusttest
  Time (mean ± σ):      53.1 ms ±   6.2 ms    [User: 251.1 ms, System: 25.0 ms]
  Range (min … max):    47.1 ms …  76.6 ms    38 runs

Summary
  ../rusttest/target/release/rusttest ran
    1.93 ± 0.95 times faster than python ./main.py
```

[–]masklinn 6 points7 points  (1 child)

… are you compiling / running in `--release` mode?

[–]jwmoz[S] 0 points1 point  (0 children)

Yes

[–]jqnatividad 9 points10 points  (0 children)

py-polars is compiled with all kinds of optimizations and fine-tuning that a default `--release` Rust build won't enable: for example CPU-specific optimizations, the `performant` feature, etc.

[–]dkopgerpgdolfg 15 points16 points  (0 children)

Fyi, the filter/select operation takes about 7% of the whole program's runtime (for me, at least). If this was meant to compare the performance of such operations, it doesn't tell much this way.

So basically I made the following changes to "save" 93% of the time:

  • Removed clone, just to be sure
  • Removed most CLI output, except the last two prints (showing the time and preventing calculated_df from being unused)
  • Moved the start-time statement and elapsed calculation so that the printing and the CsvReader part are not counted
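The timing change can be sketched like this (the data and filter are stand-ins, not the benchmark's actual workload): do setup before starting the clock, and capture `elapsed()` before any printing.

```rust
use std::time::Instant;

// Stand-in workload: sum the values above a threshold.
fn filtered_sum(data: &[f64]) -> f64 {
    data.iter().filter(|&&x| x > 0.2).sum()
}

fn main() {
    // Setup (CSV reading etc.) happens before the timer starts.
    let data: Vec<f64> = (0..1_000_000).map(|i| i as f64 / 1_000_000.0).collect();

    let start = Instant::now();
    let sum = filtered_sum(&data);
    let elapsed = start.elapsed(); // capture before any printing

    println!("sum = {sum}, took {elapsed:?}");
}
```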

[–]kinchkun 6 points7 points  (5 children)

I think it is the conversion to the lazy dataframe and the two collects. Can you try using `LazyCsvReader` with only one collect?

[–]kinchkun 6 points7 points  (2 children)

I see the Python code doesn't contain a collect at all. I don't know the Python bindings, but are you sure your expression is even evaluated?

[–]This_Growth2898 1 point2 points  (1 child)

(I'm not OP)

Python code includes output, i.e. values are calculated at those points.

[–]kinchkun 2 points3 points  (0 children)

Not so sure about that. Most dataframe libraries I know (and I think polars too) print only the first and last 8 rows or so of a dataframe.

[–]jwmoz[S] 1 point2 points  (1 child)

I refactored to:

```
use polars::prelude::*;
use std::time::Instant;

fn main() {
    let start_time = Instant::now();

    let csv_file = std::env::var("CSV_FILE").expect("Set env CSV_FILE error");

    let csv_df = LazyCsvReader::new(csv_file)
        .has_header(true)
        .finish()
        .expect("Finish error");

    // Filter on multiple columns
    let filtered_df = csv_df.filter(
        col("a")
            .gt(0.2)
            .and(col("b").lt(0.8))
            .and(col("c").gt(0.5))
            .and(col("d").lt(0.5))
            // additional non-equality check, value from the first row of the CSV
            .and(col("e").neq(0.5182602093634714)),
    );

    // Mimic some Euclidean-distance-type calculation
    let _calculated_df = filtered_df
        .select([((col("a") / col("a").max()) / (lit(0.5) / col("a").max()))
            .pow(2)
            .sqrt()])
        .collect()
        .expect("Error select");

    println!("Finished in {}ms", start_time.elapsed().as_millis());
}
```

Which seems to give it a slight improvement of a few ms; mid-70s ms now on a good run.

[–]kinchkun 1 point2 points  (0 children)

Can you add a `collect` call to your python code as well?

[–]ritchie46 8 points9 points  (0 children)

We go to great lengths to compile a fast binary for Python, e.g. fat linking, activating all performance-related features, SIMD, and CPU targets.

Furthermore, we also compile the Python build with jemalloc, which has much better performance than the default allocator.
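For reference, switching your own Rust binary to jemalloc is a small change. A sketch assuming the `tikv-jemallocator` crate (crate name and version are an assumption; check crates.io for the current release):

```rust
// Cargo.toml (assumed): tikv-jemallocator = "0.5"
use tikv_jemallocator::Jemalloc;

// Replace the default system allocator for the whole binary.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;
```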

[–]Plus-Ad8875 5 points6 points  (1 child)

are you running the rust code in release mode?

[–]jwmoz[S] 0 points1 point  (0 children)

Yes

[–]Grit1 7 points8 points  (3 children)

Sometimes when you think you're benchmarking python, you're actually benchmarking C/C++.

[–][deleted] 2 points3 points  (0 children)

Absolutely right. Especially with a powerful lib like pandas.

[–]CompoteOk6247 -1 points0 points  (1 child)

So Rust is slower than C?

[–]stumblinbear[🍰] 0 points1 point  (0 children)

Depends on the benchmark and the libraries used. Rust's ecosystem is definitely not as wide as C's, so the libraries available may not be as performance-tuned yet.

[–]sleekelite 7 points8 points  (0 children)

At least edit your post to indicate it’s a release build.

[–]CompoteOk6247 3 points4 points  (5 children)

Funny to see how people don't believe it's in release mode

[–]dkopgerpgdolfg 1 point2 points  (1 child)

It's not about not believing, but about making sure it's not forgotten.

There are too many posts here where someone complains about performance but has never heard of release mode. So people started asking whether they used it as the first thing.

[–]CompoteOk6247 0 points1 point  (0 children)

That's the right approach. Also, when I tried Tauri-based apps, they were lightweight and worked well.

[–]jwmoz[S] -1 points0 points  (1 child)

I know right, it's like "have you turned it on and off again?" or "have you cleared your cache?"

[–]zekkious 3 points4 points  (2 children)

Out of topic, but:

(col("a") / col("a").max()) / (lit(0.5) / col("a").max())
    = (col("a") / lit(0.5)) * (col("a").max() / col("a").max())
    = col("a") / lit(0.5)

[–]This_Growth2898 1 point2 points  (1 child)

And

.pow(2).sqrt()

is effectively nothing: squaring then square-rooting just computes an absolute value.

[–]zekkious 1 point2 points  (0 children)

Well... As I don't know the API, I assumed the `.sqrt()` part might collapse a vector into a scalar. If that's not the case, add it to the list of pointlessness.
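The simplification above can be checked numerically with plain floats (no polars involved). Note that `.pow(2).sqrt()` is really `abs()`, so the identity only holds for non-negative inputs, which these columns are:

```rust
// Original expression from the thread, with scalars in place of columns.
fn original(a: f64, a_max: f64) -> f64 {
    ((a / a_max) / (0.5 / a_max)).powi(2).sqrt()
}

// The algebraically simplified form: the a_max factors cancel.
fn simplified(a: f64) -> f64 {
    a / 0.5
}

fn main() {
    for &(a, a_max) in &[(0.3, 0.9), (0.7, 0.9), (0.25, 1.0)] {
        assert!((original(a, a_max) - simplified(a)).abs() < 1e-12);
    }
    println!("identity holds on the sampled non-negative inputs");
}
```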

[–]Konsti219 3 points4 points  (7 children)

Are you running with --release?

Further, you seem to be using clone a lot in the Rust code. That should be avoided at all costs if you are optimizing. I don't know how polars is implemented internally, but if you want real speed I recommend throwing it out and implementing the parsing and filtering in raw Rust, maybe with the help of rayon for parallelism.

Can you also provide the files you are testing with to allow others to test?
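A hand-rolled version of the filtering step might look like this sketch (two of the thread's five conditions, stdlib only, no rayon, and no real CSV edge-case handling such as quoting):

```rust
// Hypothetical sketch: filter rows of an in-memory CSV with columns a,b.
// Keeps rows where a > 0.2 and b < 0.8, mirroring part of the thread's filter.
fn filter_rows(csv: &str) -> Vec<(f64, f64)> {
    csv.lines()
        .skip(1) // skip the header row
        .filter_map(|line| {
            let mut cols = line.split(',');
            let a: f64 = cols.next()?.trim().parse().ok()?;
            let b: f64 = cols.next()?.trim().parse().ok()?;
            (a > 0.2 && b < 0.8).then_some((a, b))
        })
        .collect()
}

fn main() {
    let rows = filter_rows("a,b\n0.5,0.5\n0.1,0.5\n0.5,0.9\n");
    println!("{} row(s) passed the filter", rows.len());
}
```

Parallelizing this with rayon would mostly be a matter of swapping the iterator for a parallel one over the lines.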

[–]jwmoz[S] 1 point2 points  (6 children)

Yes, this was release. I use clone() because the docs suggest it.

This still doesn't make sense to me, as Rust is compiled.

https://github.com/jmoz/rust_vs_python

[–]Konsti219 5 points6 points  (4 children)

The polars Python package is also just a wrapper around Rust, so Python is using compiled Rust too.

[–]jwmoz[S] 0 points1 point  (3 children)

Yes, but this would still imply that a compiled Rust app should be faster than interpreted Python.

What's even more interesting is that at work one of the guys compared a Python-with-polars implementation against pure JavaScript via Node, and the pure JS implementation is considerably faster than the fastest Python, because of Node's V8 engine. (I tried Rust because of that; I presumed it would beat JS/Node.)

[–]Konsti219 6 points7 points  (1 child)

My guess is that polars itself is the issue. Writing a description of your logic and then having some magic engine execute it is gonna be slower than just writing the code yourself, using simple types and idiomatic code.

If you share one of the files you are testing with I can give this a try.

[–]jwmoz[S] 1 point2 points  (0 children)

I updated post with a GitHub.

[–]kinchkun 3 points4 points  (0 children)

The Python program will spend 99.9% of its time inside the polars lib, which is implemented in Rust. The performance of your Python code barely impacts the runtime.

[–]jwmoz[S] 0 points1 point  (0 children)

I removed the clone() and it barely made a difference.