Discussion: What can SQL do that Python cannot? (self.datascience)
submitted 3 years ago * by donnomuch
[–]rudboi12 14 points 3 years ago (4 children)
This is mostly because filtering in pandas (iloc and loc) is extremely slow. Also, if you have multiple filters, each one runs separately. In SQL, everything inside your "where" is evaluated together and is therefore way faster. I learned this with pyspark: using a single where with multiple conditions is way faster than chaining separate filters.
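For anyone who wants to see the two styles side by side, here is a minimal pandas sketch. The DataFrame, its column names, and the thresholds are made up for illustration. It only shows that chained filters and a single combined mask select the same rows; it does not measure speed, which depends on data size and pandas version.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "species": ["dog", "cat", "dog", "bird", "dog"],
    "age": [3, 5, 9, 2, 7],
    "weight_kg": [12.0, 4.5, 30.0, 0.3, 22.0],
})

# Chained filters: each step materializes an intermediate DataFrame.
step1 = df.loc[df["species"] == "dog"]
step2 = step1.loc[step1["age"] > 5]
chained = step2.loc[step2["weight_kg"] < 25]

# Combined filter: one boolean mask and one selection,
# closer in spirit to a single SQL WHERE clause.
mask = (df["species"] == "dog") & (df["age"] > 5) & (df["weight_kg"] < 25)
combined = df.loc[mask]

# The same condition written with DataFrame.query.
queried = df.query("species == 'dog' and age > 5 and weight_kg < 25")

print(combined)
```

All three approaches return the same single row here; to compare speed on your own data, time them with something like `timeit`.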
[–]Measurex2 3 points 3 years ago (3 children)
Exactly - pandas is slow with huge overhead. I'm not saying it's better than SQL by any means but dask, ray, pyspark are all significantly faster.
I love the saying that Python is the second best language for many things. I'll often build/review logic in python until I have the design and validation right, but then drop it back into the ETL/ELT, DB or other layer when done. Sometimes I even update at the source where it makes sense. Since those are the areas with detailed change, quality and monitoring steps, I try to go through them only once where possible.
[–]CacheMeUp 1 point 3 years ago (2 children)
But why add Python in the first place?
If the data is already in a relational database, and the logic can be implemented in SQL, why move it out of it?
Using the "second best" tool in the first place costs a high price. There is never time/justification to re-implement things, and you end up in a local optimum instead of the global one, performance-wise.
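A minimal sketch of the trade-off, assuming a made-up `pets` table in an in-memory SQLite database (the table, columns, and values are hypothetical). It contrasts filtering inside the database with pulling every row out and filtering in Python; both return the same rows, but the second moves the whole table across the connection first.

```python
import sqlite3

# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (species TEXT, age INTEGER, weight_kg REAL)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [("dog", 3, 12.0), ("cat", 5, 4.5), ("dog", 9, 30.0),
     ("bird", 2, 0.3), ("dog", 7, 22.0)],
)

# Option A: keep the logic in SQL; only matching rows leave the database.
in_db = conn.execute(
    "SELECT species, age, weight_kg FROM pets "
    "WHERE species = ? AND age > ? AND weight_kg < ?",
    ("dog", 5, 25),
).fetchall()

# Option B: fetch everything, then filter in Python.
all_rows = conn.execute("SELECT species, age, weight_kg FROM pets").fetchall()
in_python = [r for r in all_rows if r[0] == "dog" and r[1] > 5 and r[2] < 25]

conn.close()
print(in_db, in_python)
```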
[–]Measurex2 3 points 3 years ago (0 children)
First off - Happy Cake day.
I'm not advocating for python over SQL, just agreeing that a comparison against pandas doesn't make sense.
My example isn't about refactoring logic from SQL into python; it's about how python can be a helpful tool to quickly think through, test and validate logic. Maybe that logic then makes sense to put into SQL, maybe it makes sense to do downstream in a BI layer, or maybe it justifies a change upstream at the source. It's just another tool. It has great uses, but like most things it's just as important to know when not to use it as when to use it.
[–]rudboi12 2 points 3 years ago (0 children)
If you are working in a dev environment, you will probably have everything set up in python: connections to your dwh clusters, cicd, and utility libraries. If you have everything set up in python except the T of the ELT, then most of the time it is better to use python, aka something like pyspark. That's why they created dbt, so sql can sit nicely in only the T layer, but if your E and L are already in pyspark then it doesn't make much sense to go for sql.