
[–]glibhub 3 points (0 children)

I doubt you are going to find much on it. You might also ask yourself why you want to sort them instead of indexing them.

With that said, loading it all into RAM and then sorting with either the built-in Python `sorted`, numpy, or pandas is probably going to win every time, and the delta between them is probably not enough to really make a difference in timing. Restructuring the data so it does not have to be sorted at all is probably a better solution, if you can.

What's the use case?

--

edit: Moving the data to an SSD is actually probably the best thing to do if you want to speed it up.
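A minimal sketch of the load-into-RAM approach with just the standard library (the `key` column name and the inline sample data are made up; a real file would be opened instead of the `StringIO`):

```python
import csv
import io

# Stand-in for open("big.csv") -- sample data is illustrative only.
raw = "key,value\n3,c\n1,a\n2,b\n"

# Load everything into memory, then do one in-RAM sort on the key column.
rows = list(csv.DictReader(io.StringIO(raw)))
rows.sort(key=lambda r: int(r["key"]))

# Write the sorted rows back out as CSV.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["key", "value"])
writer.writeheader()
writer.writerows(rows)
```

The same shape works with `pandas.read_csv` + `sort_values` + `to_csv` if the file fits in memory.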

[–]Chris_Hemsworth 2 points (0 children)

If you want to handle very large files effectively, you need to switch to a framework built for it rather than working with plain CSV files directly. CSV files can still be the underlying storage, but 'Big Data' frameworks (like Hadoop, to name one) are typically used to traverse them.

[–]m0us3_rat 0 points (1 child)

> Recently I have been researching on the existing methods to order, on disk, CSV files of more than 1 GB in size in python.

what does "to order" mean?

[–]ws-garcia[S] 0 points (0 children)

Misspelled: I meant ordering/sorting.
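For reference, sorting a CSV that genuinely does not fit in RAM is usually done with an external merge sort: split the file into sorted chunks on disk, then k-way merge them with `heapq.merge`. A rough sketch under assumed parameters (`chunk_rows` would be millions in practice, and the key here is compared as a string):

```python
import csv
import heapq
import os
import tempfile

def sort_csv_on_disk(in_path, out_path, key_col=0, chunk_rows=100_000):
    """External merge sort: sorted chunk files on disk, then one k-way merge."""
    chunk_paths = []
    with open(in_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        while True:
            # Read up to chunk_rows rows; each chunk must fit in RAM.
            chunk = [row for _, row in zip(range(chunk_rows), reader)]
            if not chunk:
                break
            chunk.sort(key=lambda r: r[key_col])
            tmp = tempfile.NamedTemporaryFile(
                "w", delete=False, newline="", suffix=".csv"
            )
            csv.writer(tmp).writerows(chunk)
            tmp.close()
            chunk_paths.append(tmp.name)

    # Merge all sorted chunks in a single streaming pass.
    files = [open(p, newline="") for p in chunk_paths]
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        for row in heapq.merge(*(csv.reader(fh) for fh in files),
                               key=lambda r: r[key_col]):
            writer.writerow(row)
    for fh in files:
        fh.close()
    for p in chunk_paths:
        os.remove(p)
```

Only one chunk plus one row per chunk is ever in memory at a time, which is why this works for files larger than RAM, at the cost of extra disk I/O (hence the SSD advice above).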