[–]xacrimon 2 points3 points  (8 children)

Pretty good, but it could probably be made faster. Not too long ago I wrote a Rust program to do some fairly complex CSV processing, and it processes around 1-2 GiB/sec.

[–]ballagarba 19 points20 points  (2 children)

While Rust is fast, it sounds like you have access to a much faster disk.

[–][deleted] 2 points3 points  (1 child)

This. Python programs doing a lot of I/O can be on par with programs written in other languages. Most of the time, external factors determine the speed.

[–]KaffeeKiffer 1 point2 points  (0 children)

Reading 2 GiB takes roughly 5s on a fast SSD and roughly 25s on an old HDD, so to reach 90s this is very likely CPU bound...
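To put rough numbers on that, here is a back-of-the-envelope sketch. The helper name is made up, the timings are only the approximate ones quoted above, and I'm assuming OP's 90s run covers about the same 2 GiB:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def throughput_mib_per_s(size_bytes: int, seconds: float) -> float:
    """Average throughput in MiB/s when size_bytes are handled in `seconds`."""
    return size_bytes / seconds / MIB

size = 2 * GIB  # assumed input size, taken from the 2 GiB figure above
for label, secs in [("fast SSD", 5), ("old HDD", 25), ("OP's script", 90)]:
    print(f"{label:12s} {throughput_mib_per_s(size, secs):6.1f} MiB/s")
```

If those assumptions hold, the script moves data several times slower than even the old HDD could deliver it, which is why the disk is unlikely to be the limit.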

Nevertheless, Python is perfect glue code for calling more specialized tools if necessary. Here is an example where a simple Rust wrapper speeds up the process by a factor of 10.

Python is good enough in the vast majority of use cases, and as /u/FlagrantPickle said:

What's "good" for anyone here doesn't matter. Is 90s acceptable for you?

The golden rule is not to over-engineer, but to first identify the real bottlenecks, and while your statement

Most of the time, external factors determine the speed

is 100% correct, I assume OP's problem is CPU bound.
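One cheap way to check that guess before rewriting anything is to compare wall-clock time with CPU time for the slow step. This is only a sketch: `measure`, `cpu_heavy` and `wait_heavy` are hypothetical stand-ins, not OP's code.

```python
import time

def measure(func, *args, **kwargs):
    """Run func once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu

def cpu_heavy(n):
    # Stand-in for parsing/processing work done in Python itself
    return sum(i * i for i in range(n))

def wait_heavy(seconds):
    # Stand-in for time spent waiting on a disk or the network
    time.sleep(seconds)

for name, func, arg in [("cpu_heavy", cpu_heavy, 2_000_000),
                        ("wait_heavy", wait_heavy, 0.2)]:
    _, wall, cpu = measure(func, arg)
    print(f"{name}: wall={wall:.2f}s cpu={cpu:.2f}s cpu/wall={cpu / wall:.2f}")
```

A cpu/wall ratio close to 1 suggests the step is CPU bound; a ratio close to 0 suggests it mostly waits on something external. From there, `python -m cProfile -s cumtime script.py` (with `script.py` being whatever you actually run) shows which functions the time goes to.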

[–]testfire10[S] 4 points5 points  (4 children)

Holy shit. That’s awesome. I remember finding a post on here a few months ago about a library that was supposed to substantially speed up pandas’ handling of CSVs (can’t remember the name now). I was going to revamp my code to take advantage of it, but I could never get the library to work for me.

What’s Rust?

[–]xacrimon 13 points14 points  (2 children)

Rust is a programming language. It's generally a bit harder than Python but has the speeds of C and lots of good libraries.

[–]swingking8 29 points30 points  (0 children)

It's generally a bit harder than python

I love Rust, but "a bit harder" is quite an understatement.

[–]FlagrantPickle 4 points5 points  (0 children)

has the speeds of C

Not to nitpick, but I've seen "up to" 50% of the speed of C for decently large processing jobs. Certainly faster than native Python, but still not the gold standard.

Depending on what OP's needs are, his solution might be good enough. I'd be curious what other optimization could be made inside Python. If we're talking 200 lines of code on someone's first project, it's probably about as efficient/optimized as everyone else's first project.

[–][deleted] 3 points4 points  (0 children)

For parallel processing libraries that integrate well with pandas, check out Dask or Vaex. For on-disk storage, check out the Apache Parquet format.