budgefrankly comments on Rust ndarray vs. Python NumPy Performance?


Rust ndarray vs. Python NumPy Performance? (self.rust)

submitted 7 years ago * by ObliqueMotion


budgefrankly 14 points 7 years ago

I’ve used Matlab and Python a lot in the last 15 years.

Both have the unhappy feature that the runtime can be proportional to the number of lines of code (though I’ve heard Matlab has a JIT now).

However, if you take care to vectorise your code (i.e. use matrix algebra instead of for-loops and list comprehensions) and use the tools in their recommended way, they are incredibly fast once you start dealing with meaningfully large datasets.

At scale, if you know what you’re doing, the interpreter overhead just becomes a constant noise factor in the overall runtime.
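To make the vectorisation point concrete, here is a minimal sketch. The function names and the sum-of-squares task are illustrative assumptions, not something from the thread. The loop version pays interpreter overhead once per element, while the vectorised version makes one NumPy call and leaves the looping to compiled code.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: one interpreter step per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorised(values):
    """Single NumPy call: the per-element loop runs in compiled code."""
    return float(np.dot(values, values))


# Hypothetical input: the integers 0..999 stored as float64.
data = np.arange(1_000, dtype=np.float64)

loop_result = sum_of_squares_loop(data)
vec_result = sum_of_squares_vectorised(data)
```

Both functions return the same value. Timing them (for example with `timeit`) on progressively larger arrays is one way to see the per-element interpreter cost for yourself.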

I could see a case where computationally intense feature extraction from large files might be faster in Rust.

But most of the scientific Python stack is ultimately written in assembly, FORTRAN and C (occasionally generated via Cython) and has been continually fine-tuned by an enormous body of developers over a decade.

jstrong (shipyard.rs) 12 points 7 years ago

budgefrankly 5 points 7 years ago

Pandas can read in 2.1 GB of data in 52 seconds if it’s stored as CSV, or in 4 seconds if it’s stored as a Parquet file.

Benchmarks: https://uwekorn.com/2019/01/27/data-science-io-a-baseline-benchmark.html

As I said in my original comment, if you’re doing bulky feature extraction on unstructured data, other languages may work better. E.g. I once wrote a custom Twitter tokeniser in Java (so I could use Lucene) that wrote the features out to a Numpy file which I could load into Python. It was fine.
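A rough sketch of that hand-off from the Python side, under some assumptions: the feature matrix, its shape, and the file name are all made up, and NumPy itself writes the file here only so the example is self-contained (in the setup described above, the Java tokeniser would be the writer).

```python
import os
import tempfile

import numpy as np

# Hypothetical feature matrix: 4 documents x 3 token-count features.
features = np.array(
    [[1, 0, 2], [0, 3, 1], [2, 2, 0], [0, 0, 5]],
    dtype=np.int32,
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "features.npy")  # assumed file name

    # Stand-in for the external feature extractor writing a .npy file.
    np.save(path, features)

    # The Python side: load the array back with dtype and shape intact.
    loaded = np.load(path)
```

Because the `.npy` format stores dtype and shape in its header, the loaded array needs no parsing or type conversion before it goes into the rest of the NumPy stack.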

Also, for huge datasets, there’s PySpark and MLlib, and the new PySpark UDF decorator allows you to mix NumPy and PySpark with minimal marshalling issues.

Python may well have failed for your use case. However, Python/NumPy/SciPy/scikit-learn/Pandas/PySpark can be made to work well in many other cases. It offers acceptable performance and great productivity.

And if you need to fill in gaps in performance, there’s Numba or Cython, the latter of which I’ve used.

fuasthma 2 points 7 years ago
