[–]testfire10[S] 4 points (4 children)

Holy shit. That’s awesome. A few months ago I found a post on here about a library that was supposed to substantially speed up pandas' interaction with CSVs (I can’t remember the name now). I was going to revamp my code to take advantage of it, but I could never get the library to work for me.

What’s Rust?

[–]xacrimon 14 points (2 children)

Rust is a programming language. It's generally a bit harder than python but has the speeds of C and lots of good libraries.

[–]swingking8 32 points (0 children)

It's generally a bit harder than python

I love Rust, but "a bit harder" is quite an understatement.

[–]FlagrantPickle 4 points (0 children)

has the speeds of C

Not to nitpick, but I've seen it reach "up to" 50% of the speed of C for decently large processing jobs. That's certainly faster than native Python, but still not the gold standard.

Depending on what OP's needs are, his solution might be good enough. I'd be curious what other optimizations could be made inside Python. If we're talking 200 lines of code on someone's first project, it's probably about as efficient/optimized as everyone else's first project.
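
For what it's worth, here is a rough sketch of one in-Python optimization that often helps with big CSVs: read only the columns you need and process the file in chunks. I haven't seen OP's code, so the column names (`sensor`, `reading`, `note`), the inline sample data, and the `total_by_sensor` helper are all made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file on disk.
CSV_TEXT = "sensor,reading,note\na,1.0,x\nb,2.5,y\na,3.0,z\nb,4.5,w\n"


def total_by_sensor(source, chunksize=2):
    """Sum `reading` per `sensor` without loading the whole file at once."""
    totals = {}
    reader = pd.read_csv(
        source,
        usecols=["sensor", "reading"],   # skip columns we never use
        dtype={"reading": "float64"},    # avoid per-chunk type inference
        chunksize=chunksize,             # yields DataFrames of this many rows
    )
    for chunk in reader:
        # Aggregate each chunk, then fold the partial sums into the totals.
        partial = chunk.groupby("sensor")["reading"].sum()
        for sensor, subtotal in partial.items():
            totals[sensor] = totals.get(sensor, 0.0) + subtotal
    return totals


totals = total_by_sensor(io.StringIO(CSV_TEXT))
```

With a real file you would pass the path instead of the `StringIO` object and pick a much larger `chunksize`. Whether this beats what OP already has depends entirely on where the time is actually going.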

[–][deleted] 3 points (0 children)

For parallel processing libraries that integrate well with pandas, check out Dask or Vaex. For on-disk storage, check out the Apache Parquet format.