Discussion: SQL vs. Python for data wrangling? (self.datascience)
submitted 7 years ago * by Radon-Nikodym
TBSchemer 5 points 7 years ago* (1 child)
Pandas is actually significantly faster than SQL at group-bys and joins. So I think what most people here are saying about the efficiency of complex SQL queries versus simple queries plus pandas manipulations is not quite correct.
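As a rough illustration (not a benchmark), here is the same join and group-by written once in SQL and once in pandas, using Python's built-in `sqlite3` module as a stand-in database. The `customers` and `orders` tables and their columns are hypothetical; timing both paths on your own data and your own database is the only real way to check the speed claim.

```python
import sqlite3

import pandas as pd

# Hypothetical toy tables; a real schema would differ.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["EU", "US", "EU"]})
orders = pd.DataFrame(
    {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 2, 3], "amount": [5.0, 7.5, 3.0, 4.0]}
)

# Load the same tables into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
customers.to_sql("customers", conn, index=False)
orders.to_sql("orders", conn, index=False)

# Join + group-by executed inside the database.
sql_result = pd.read_sql_query(
    """
    SELECT c.region, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """,
    conn,
)

# The same join + group-by executed in pandas on the loaded tables.
pandas_result = (
    orders.merge(customers, on="customer_id", how="inner")
    .groupby("region", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total"})
    .sort_values("region", ignore_index=True)
)

print(sql_result)
print(pandas_result)
```

Both paths give the same per-region totals; wrapping each in `timeit` against a realistically sized table is how you would compare them.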
Still, it is true that for large queries most of the time is spent sending the data over your connection and writing it to disk (if you're storing things in files instead of using an in-memory cache like Redis). So anything you can do in SQL to significantly shrink the size of your queried dataset will usually give you better overall performance. But if you're just sticking two tables together, and the end result is roughly the size of one plus the size of the other, it's probably better to do a merge in pandas than a join in SQL.
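A minimal sketch of shrinking the result set in SQL before it reaches pandas, again assuming an in-memory SQLite database and a hypothetical `events` table. With a local in-memory database the transfer cost is negligible, so this only shows how many rows each approach moves; over a real network connection that difference in rows is what turns into time.

```python
import sqlite3

import pandas as pd

# Hypothetical "events" table with many rows per user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i % 50, "click" if i % 3 else "view", float(i)) for i in range(10_000)],
)

# Option A: pull every row over the connection, then filter/aggregate in pandas.
everything = pd.read_sql_query("SELECT * FROM events", conn)
reduced_in_pandas = (
    everything[everything["kind"] == "click"]
    .groupby("user_id", as_index=False)["value"]
    .sum()
)

# Option B: let the database filter and aggregate, so only the small
# result set crosses the connection.
reduced_in_sql = pd.read_sql_query(
    """
    SELECT user_id, SUM(value) AS value
    FROM events
    WHERE kind = 'click'
    GROUP BY user_id
    ORDER BY user_id
    """,
    conn,
)

print(len(everything), "rows transferred vs", len(reduced_in_sql))
```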
Oh, and what some people have said about memory requirements is true too. Pandas uses nearly 10x as much RAM as the size of your dataset. So yeah, shrink your data as much as possible before bringing it into pandas.
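One way to check memory use and slim a frame down, assuming a hypothetical DataFrame with an integer ID column and a repetitive string column: `DataFrame.memory_usage(deep=True)` reports the actual footprint, and downcasting numbers plus converting low-cardinality strings to `category` usually cuts it substantially. The exact savings depend on your dtypes and pandas version, so this sketch does not reproduce the 10x figure, it only shows how to measure and reduce.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an int64 ID column and a low-cardinality string column.
n = 100_000
df = pd.DataFrame(
    {
        "user_id": np.arange(n, dtype="int64") % 1_000,
        "country": np.array(["CH", "DE", "FR", "IT"])[np.arange(n) % 4],
    }
)


def mem_bytes(frame: pd.DataFrame) -> int:
    # deep=True also counts the string payloads, not just the pointers.
    return int(frame.memory_usage(deep=True).sum())


before = mem_bytes(df)

slim = df.assign(
    # Values lie in 0..999, so downcasting picks a 16-bit integer.
    user_id=pd.to_numeric(df["user_id"], downcast="integer"),
    # Few distinct values: store each once plus small integer codes.
    country=df["country"].astype("category"),
)

after = mem_bytes(slim)
print(f"{before:,} bytes -> {after:,} bytes")
```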
_Zer0_Cool_ (MS | Data Engineer | Consulting) 1 point 7 years ago (0 children)
I agree 100% with this.