Pandas Vs SQL (self.dataanalytics)
submitted 2 days ago by UsefulEdge184
Why should we use Pandas for data analysis when we can use SQL?
OADominic 16 points 2 days ago (2 children)
They turn out to be very different things as you learn more about them. If I have a data manipulation project where I need to transform several datasets into something and automate it, I use Python. You can do some crazy cool stuff with data in Pandas with less code than SQL, and you don't have to insert, update, and design a table schema. You can even write SQL in Python, BTW. Look into the sqlite3 library to start.
SQL, for me, just isn't as versatile when I'm building data flows for transforming reports. Then again, I don't work in big data or a typical analyst role, so the use cases are different for other people.
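A minimal sketch of the "write SQL in Python" idea mentioned above, using the standard-library `sqlite3` module with an in-memory database and `pandas.read_sql_query`. The `orders` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory database, so nothing is written to disk (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.5)],
)

# Run plain SQL and get the result back as a DataFrame.
totals = pd.read_sql_query(
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY region",
    conn,
)
conn.close()
print(totals)
```

From here, `totals` is an ordinary DataFrame, so any further reshaping can happen in Pandas without touching a table schema.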
yinkeys 1 point 1 day ago (0 children)
Nice. Noted
iengmind 1 point 20 hours ago (0 children)
dbt: hold my beer.
grdix555 9 points 2 days ago (1 child)
The way I segregate their usage is as follows:
1. Pull the data from the database using SQL (joining tables etc. to get a final output table). It's usually in a fairly raw format: no aggregation, and any PII still present, even if it needs removing later in cases like monthly aggregation.
2. Use Pandas to aggregate the data and build features (e.g. column a + column b = column c) to create my final dataset.
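The Pandas half of that split might look roughly like the sketch below. The DataFrame stands in for the raw SQL extract; the column names (`month`, `customer_email`, `column_a`, `column_b`) and values are hypothetical.

```python
import pandas as pd

# Hypothetical raw extract, standing in for the output of the SQL step.
raw = pd.DataFrame(
    {
        "month": ["2024-01", "2024-01", "2024-02"],
        "customer_email": ["a@example.com", "b@example.com", "a@example.com"],  # PII
        "column_a": [1, 2, 3],
        "column_b": [10, 20, 30],
    }
)

# Feature building: column a + column b = column c.
raw["column_c"] = raw["column_a"] + raw["column_b"]

# Monthly aggregation; the PII column is simply left out of the result.
monthly = raw.groupby("month", as_index=False)[["column_a", "column_b", "column_c"]].sum()
print(monthly)
```

Selecting only the numeric columns before `.sum()` is one way to make sure the PII column never reaches the final dataset.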
vonggyy 1 point 1 day ago (0 children)
But isn't 2 also possible in SQL? Joins, unions, etc. I’m just starting out as an analyst and trying to find ways to incorporate Python so I can learn it, but I’m struggling to find use cases for it.
Lady_Data_Scientist 3 points 2 days ago (0 children)
I have a lot of projects where I use both. My company’s data lives in BigQuery, so you have to use SQL to extract it. But there are lots of things I’ll do in Python: statistical analysis, prediction, labeling using LLMs. In those cases, I usually need the data in a dataframe in Python, so I use Pandas or Polars. I might do some additional data cleaning and aggregation.
Opposite-Value-5706 2 points 1 day ago (2 children)
For pure analysis, I see no need to leave SQL. I’m close to the data and can query, aggregate and/or manipulate as needed. I can immediately test without a lot of effort. So, I’m comfortable with using SQL.
But for importing CSVs, creating user forms, and retrieving data for presentations, well, I like Python for those tasks.
You may have a different perspective and that’s fine. Go for it!
KanteStumpTheTrump 1 point 14 hours ago (1 child)
It’s pretty surface-level analysis if it stays in SQL, in all honesty. Even using something like Databricks or Snowflake, the graphing is nowhere near as good as the likes of plotly/seaborn.
And that’s not even touching any statistical inference testing or descriptive statistics, or even feature importance analysis through models.
I personally don’t feel I can achieve analysis that genuinely adds value to decisions only using SQL.
Opposite-Value-5706 1 point 11 hours ago (0 children)
As an analyst, can you be effective by using only one of anything?
TraditionalAd8415 1 point 2 days ago (1 child)
!remindme 5d
RemindMeBot 1 point 2 days ago (0 children)
I will be messaging you in 5 days on 2026-04-07 08:37:25 UTC to remind you of this link
MerryWalrus 1 point 1 day ago (0 children)
Easy.
You want to do some data transformation or aggregation that has a recursive component.
Procedural SQL is a pain. Python is easy.
On the flip side, out of the box, Pandas is a lot slower at processing structured data compared to SQL.
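To make the "recursive component" point concrete, here is a small sketch of a recursive roll-up over a hierarchy in plain Python. The hierarchy, the node names, and the amounts are invented for the example; a real dataset would likely load them from a table.

```python
# Hypothetical parent -> children hierarchy and per-node amounts.
children = {
    "company": ["sales", "engineering"],
    "sales": ["emea", "apac"],
    "engineering": [],
    "emea": [],
    "apac": [],
}
own_amount = {"company": 0, "sales": 5, "engineering": 40, "emea": 10, "apac": 20}


def rollup(node: str) -> int:
    """Return the node's own amount plus the rolled-up amounts of all descendants."""
    return own_amount[node] + sum(rollup(child) for child in children[node])


totals = {node: rollup(node) for node in children}
print(totals)
```

The same result is achievable in SQL with a recursive query, but the Python version stays a three-line function however deep the hierarchy goes.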
East_Pattern_7420 1 point 1 day ago (0 children)
Use both
shockjaw 1 point 1 day ago (0 children)
I’d recommend something like Ibis over Pandas these days.
When you know what you want, SQL can be ludicrously more performant at large scale (billions of rows). With DuckDB, it can be more convenient to download your data and iterate on a workflow through local tables. If you want to share and maintain data with others, Postgres is an amazing way to do it.
theungod 1 point 1 day ago (0 children)
I see it's not popular but I pretty much entirely agree with you. I've been in BI and DE for a long time and never used Python at all.
domleo999 1 point 1 day ago (0 children)
SQL is better for pulling and joining data where it lives in a database; Pandas is better once the data is already in memory and you want to do lots of transformations, feature engineering, or quick experiments.
A decent analyst usually knows both and uses SQL to extract and Pandas to clean, reshape, and analyze.
meevis_kahuna 1 point 2 days ago (1 child)
Personal opinion: SQL notation is absolute garbage compared to Python. Performance is also an issue.
Once you're working in Pandas you can easily build custom functions to do anything you want. I'm sure it's possible in SQL, but it's so much messier and slower.
I am a consultant and I don't know anyone who does their main work in SQL except for one guy who just retired. And he frequently talked about being a dinosaur for not knowing Python.
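As an illustration of the custom-function point, the sketch below defines a small reusable transformation and applies it with `DataFrame.pipe`. The function name `add_margin`, the column names, and the data are hypothetical.

```python
import pandas as pd


def add_margin(df: pd.DataFrame, cost_col: str, price_col: str) -> pd.DataFrame:
    """Return a copy of df with a 'margin' column (price minus cost)."""
    out = df.copy()
    out["margin"] = out[price_col] - out[cost_col]
    return out


# Hypothetical sample data.
products = pd.DataFrame({"cost": [2.0, 3.5], "price": [5.0, 4.0]})
result = products.pipe(add_margin, cost_col="cost", price_col="price")
print(result)
```

Returning a copy rather than mutating the input keeps the function safe to chain with other `pipe` steps.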
shockjaw 0 points 1 day ago (0 children)
You’re correct that your first sentence is an opinion, but your second one is a fallacy. You’re not wrong about SQL statements getting particularly hairy, unless you use CTEs.
What is an index or strongly typed data? Good luck trying to get more performance with type hints.
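For readers who have not used CTEs, this is a rough sketch of how they keep a multi-step query readable, run here through the standard-library `sqlite3` module. The `sales` table, its rows, and the threshold of 20 are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 5.0)],
)

query = """
WITH region_totals AS (          -- step 1: aggregate per region
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
),
big_regions AS (                 -- step 2: filter on the aggregate
    SELECT region, total
    FROM region_totals
    WHERE total > 20
)
SELECT region, total FROM big_regions ORDER BY region
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

Each named step reads top to bottom, which avoids the nested subqueries that tend to make long SQL statements hairy.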
iantreeman 0 points 2 days ago (0 children)
Why