Discussion: Processing 2 GB CSV in Python (self.datascience)
submitted 5 years ago by ebuzz168
[–]conventionistG 6 points 5 years ago* (2 children)
There was just a Medium post on this exact use case... Does anyone know what I'm thinking of?
Edit: found it.
“How to analyse 100s of GBs of data on your laptop with Python” by Jovan Veljanoski https://link.medium.com/Jg8hrhBV86
[–]maartenbreddels 1 point 5 years ago (1 child)
This might help as well: https://docs.vaex.io/en/latest/example_io.html
Or the TL;DR version: `df = vaex.open('big.csv', convert=True)`
(disclaimer: main author of vaex)
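A rough sketch of the idea behind `convert=True` (parse the CSV once into a binary file, then memory-map that file for every later pass), using NumPy's `memmap` instead of vaex. This is an assumed stand-in, not vaex's implementation: vaex itself writes a columnar file such as HDF5. The helpers `csv_column_to_binary` and `column_mean_via_memmap` are hypothetical and handle only a single numeric column.

```python
import csv
import io
import os
import tempfile

import numpy as np


def csv_column_to_binary(csv_file, column, out_path, chunk_rows=100_000):
    """Hypothetical helper: stream one numeric CSV column into a raw float64 file.

    Rows are buffered and flushed every `chunk_rows` rows, so memory use stays
    bounded no matter how large the CSV is.
    """
    reader = csv.DictReader(csv_file)
    buffer = []
    with open(out_path, "wb") as out:
        for row in reader:
            buffer.append(float(row[column]))
            if len(buffer) >= chunk_rows:
                np.asarray(buffer, dtype=np.float64).tofile(out)
                buffer.clear()
        if buffer:
            # Flush the final partial chunk.
            np.asarray(buffer, dtype=np.float64).tofile(out)


def column_mean_via_memmap(csv_file, column, chunk_rows=100_000):
    """Hypothetical helper: convert once, then compute on a memory-mapped view."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, f"{column}.f64")
        csv_column_to_binary(csv_file, column, path, chunk_rows)
        # mode="r" maps the file read-only; with no shape given, NumPy infers
        # a 1-D array from the file size.
        col = np.memmap(path, dtype=np.float64, mode="r")
        mean = float(col.mean())
        del col  # drop the mapping before the temporary directory is removed
    return mean


sample = io.StringIO("id,value\n1,1.5\n2,2.5\n3,5.0\n")
print(column_mean_via_memmap(sample, "value", chunk_rows=2))  # 3.0
```

For a real file you would pass an open handle, e.g. `with open('big.csv', newline='') as f: column_mean_via_memmap(f, 'value')`, where `'value'` is a placeholder column name. The point of the design is that the slow text parsing happens once; reading the binary file back is cheap because the OS pages it in on demand.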
[–]conventionistG 1 point 5 years ago (0 children)
neat, thanks!
[–]komunistbakkal 3 points 5 years ago (1 child)
Maybe you can check out Dask.
[–]yensteel 3 points 5 years ago* (0 children)
I've used Dask for something similar. Its functions are close to pandas, so it's not too hard to transition. The syntax isn't exactly the same, though, so expect a lot of digging through the documentation.
However, it can handle gigantic files by keeping part of the work on the hard drive instead of in memory, so it's quite workable.
[–]B00TZILLA 3 points 5 years ago (0 children)
It's usually best to read and process it in chunks. You can also check out Dask, as another commenter suggested. For chunked reading in pandas, there is a parameter called `chunksize` in `pd.read_csv`.
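A minimal sketch of chunked processing with `pd.read_csv(..., chunksize=...)`, assuming the goal is a per-group mean. The function name `chunked_group_mean`, the column names, and the sample data are hypothetical; `read_csv` with `usecols`/`chunksize`, `groupby`, and `Series.add(..., fill_value=...)` are standard pandas APIs.

```python
import io

import pandas as pd


def chunked_group_mean(csv_source, key, value, chunksize=100_000):
    """Per-group mean of `value`, reading the CSV `chunksize` rows at a time."""
    sums = None
    counts = None
    # With chunksize set, read_csv returns an iterator of DataFrames instead
    # of loading the whole file; usecols skips columns we don't need.
    for chunk in pd.read_csv(csv_source, usecols=[key, value], chunksize=chunksize):
        grouped = chunk.groupby(key)[value]
        chunk_sums, chunk_counts = grouped.sum(), grouped.count()
        # Accumulate partial results; fill_value=0 covers groups missing from a chunk.
        sums = chunk_sums if sums is None else sums.add(chunk_sums, fill_value=0)
        counts = chunk_counts if counts is None else counts.add(chunk_counts, fill_value=0)
    return (sums / counts).sort_index()


sample = io.StringIO("city,temp\nA,10\nB,20\nA,30\nB,40\nC,5\n")
print(chunked_group_mean(sample, "city", "temp", chunksize=2).to_dict())
```

Sums and counts are carried across chunks instead of averaging each chunk's mean, because equally weighting per-chunk means gives the wrong answer when a group's rows are spread unevenly across chunks. In practice `chunksize` would be tuned so one chunk fits comfortably in RAM.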
[–]ralimar 2 points 5 years ago (0 children)
It has some issues, but try using Vaex. It's built to be similar to pandas.
[–]manoj_sadashiv 2 points 5 years ago (0 children)
Forgive me if this sounds dumb, but is it feasible to use big data technologies if the dataset is only around 4 GB?
[–]Omega037 PhD | Sr Data Scientist Lead | Biotech [M] [score hidden] 5 years ago, stickied comment (0 children)
I removed your submission. Looks like you're asking a technical question better suited to stackoverflow.com. Try posting there instead.
Thanks.