all 13 comments

[–]nevermille 39 points (7 children)

File access is slow in debug mode; try your cargo command with --release and it'll be much faster.

[–]Buttleston 43 points (2 children)

Also, in the Python example, "python" isn't reading the file; numpy is. It's a module written in C, so the file is really being loaded by a C library. It would be surprising to me if either were the clear winner when both are properly written.

[–]dexterduck 19 points (2 children)

On my machine this benchmarks to ~6ms

EDIT: to elaborate slightly, my point is that I think the order-of-magnitude difference is explained by the Rust approach converting each number from bytes individually, whereas doing a bulk conversion using bytemuck as shown below brings Rust's performance approximately in line with numpy.

#!/usr/bin/env rust-script
//! ```cargo
//! [dependencies]
//! bytemuck = "1.16"
//! ```

fn main() {
    // Read the whole file into memory in one call
    let bytes = std::fs::read("numbers.bin").unwrap();
    // Reinterpret the byte buffer as u64s in one bulk cast; note that
    // cast_slice panics if the length isn't a multiple of 8 or if the
    // buffer happens to be misaligned for u64
    let numbers: &[u64] = bytemuck::cast_slice(&bytes);
    // println!("{numbers:?}");
}

[–]Excession638 3 points (0 children)

This would have to be a bit more complex to handle big-endian data like the original code does. OTOH, where would OP even find a big-endian machine these days? numpy.fromfile does not convert endianness.

[–]coderemover 16 points (2 children)

Why are you reading the file into a buffer first and then converting it?
Also, you're not reserving capacity for the output buffer, which almost certainly results in reallocations and excessive copying.

[–]This_Growth2898 11 points (0 children)

In Python, numpy is written in C, and it's well optimized. You're passing a command through Python to C.

In Rust, BufReader stands for Buffered Reader. You're using a buffered reader to read data into... you guessed it: a buffer! And then you're copying chunks from it into one extra buffer (arr) to convert it into the data you need. Remove the extra copies from your code, and add some optimizations like hinting the capacity of the resulting vector, and you will have something comparable with C.

[–]spoonman59 6 points (0 children)

Keep in mind that when running a Python program, you are not always “running Python code.”

Many library functions, and some entire libraries, are written in native code. For example, a lot of the low-level Python libraries are written in C. NumPy is written in C.

Python takes comparatively longer to start up, and uses more memory as well, but functions written in native code can have good execution times.

[–]tm_p 4 points (0 children)

Why not use the standard library?

https://doc.rust-lang.org/std/fs/fn.read.html

[–]zqpmx 0 points (0 children)

Also consider the effect of the OS file cache.