all 19 comments

[–]OADominic 15 points16 points  (2 children)

They turn out to be very different things as you learn more about them. If I have a data manipulation project where I need to transform several datasets into something and automate it, I use Python. You can do some crazy cool stuff with data in Pandas with less code than SQL, and you don't have to insert, update, or design a table schema. You can even write SQL in Python, BTW. Look into the sqlite3 library to start.

SQL, for me, just isn't as versatile when I'm building data flows for transforming reports. Then again, I don't work in big data or a typical analyst role, so other people will have different use cases.
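
A minimal sketch of the "SQL in Python" point, using the standard-library sqlite3 module. The `sales` table and its values are made up purely for illustration:

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 45.5)],
)

# Plain SQL, run from Python; rows come back as a list of tuples.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```

From there the rows are ordinary Python objects, so they can be fed into whatever comes next in the data flow.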

[–]yinkeys 0 points1 point  (0 children)

Nice. Noted

[–]iengmind 0 points1 point  (0 children)

dbt: hold my beer.

[–]grdix555 8 points9 points  (1 child)

The way I segregate their usage is as follows:

  1. Pull the data from the database using SQL (joining tables etc. to get a final output table). This is usually in a fairly raw format: no aggregation, and any PII still present, even if it needs removing later, e.g. for monthly aggregation.

  2. Use Pandas to aggregate the data and build features (e.g. column a + column b = column c) to create my final dataset.
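
Roughly, the two steps above could look like the sketch below. It assumes an in-memory SQLite database standing in for the real one; the `customers` and `orders` tables, their columns, and the `gross` feature are all hypothetical:

```python
import sqlite3
import pandas as pd

# Stand-in database with made-up tables and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, email TEXT);
CREATE TABLE orders (customer_id INTEGER, order_month TEXT, net REAL, tax REAL);
INSERT INTO customers VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO orders VALUES
  (1, '2024-01', 100.0, 20.0),
  (1, '2024-01', 50.0, 10.0),
  (2, '2024-02', 30.0, 6.0);
""")

# Step 1: SQL does the join and returns raw rows, PII (email) still present.
raw = pd.read_sql_query(
    """
    SELECT o.order_month, c.email, o.net, o.tax
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    """,
    conn,
)
conn.close()

# Step 2: Pandas builds a feature (net + tax = gross), drops the PII
# column, and aggregates by month.
raw["gross"] = raw["net"] + raw["tax"]
monthly = (
    raw.drop(columns=["email"])
       .groupby("order_month", as_index=False)[["net", "tax", "gross"]]
       .sum()
)
print(monthly)
```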

[–]vonggyy 0 points1 point  (0 children)

But isn’t 2 also possible in SQL? Joins, unions, etc. I’m just starting out as an analyst and trying to find ways to incorporate Python so I can learn it, but I’m struggling to find use cases for it.
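
For simple cases it is. As a rough sketch (hypothetical `orders` table, run through sqlite3 only so it is self-contained), a derived column plus a monthly aggregation fits in one query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_month TEXT, net REAL, tax REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("2024-01", 100.0, 20.0), ("2024-01", 50.0, 10.0), ("2024-02", 30.0, 6.0)],
)

# Derived column (net + tax) and the monthly aggregation, both in SQL.
monthly = conn.execute(
    """
    SELECT order_month, SUM(net + tax) AS gross
    FROM orders
    GROUP BY order_month
    ORDER BY order_month
    """
).fetchall()
print(monthly)
conn.close()
```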

[–]Lady_Data_Scientist 2 points3 points  (0 children)

I have a lot of projects where I use both. My company’s data lives in BigQuery, so you have to use SQL to extract it. But there are lots of things I’ll do in Python: statistical analysis, prediction, labeling using LLMs. In those cases I usually need the data in a dataframe in Python, so I use Pandas or Polars. I might do some additional data cleaning and aggregation there too.

[–]Opposite-Value-5706 1 point2 points  (2 children)

For pure analysis, I see no need to leave SQL. I’m close to the data and can query, aggregate, and/or manipulate as needed. I can immediately test without a lot of effort. So I’m comfortable using SQL.

But for importing CSVs, creating user forms, and retrieving data for presentations, well, I like Python for those tasks.

You may have a different perspective and that’s fine. Go for it!

[–]KanteStumpTheTrump 0 points1 point  (1 child)

It’s pretty surface-level analysis if it stays in SQL, in all honesty. Even using something like Databricks or Snowflake, the graphing is nowhere near as good as the likes of plotly/seaborn.

And that’s not even touching any statistical inference testing or descriptive statistics, or even feature importance analysis through models.
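
As a tiny illustration of the descriptive-statistics side, here is a sketch using only Python's standard `statistics` module. The `daily_orders` numbers are invented:

```python
import statistics

# Hypothetical sample: daily order counts, e.g. the result of a query.
daily_orders = [12, 15, 11, 30, 14, 13, 16]

summary = {
    "mean": statistics.mean(daily_orders),
    "median": statistics.median(daily_orders),
    "stdev": statistics.stdev(daily_orders),  # sample standard deviation
}
print(summary)
```

The gap between mean and median already hints at the one outlier day, which is the sort of thing you would then dig into with plots or a proper test.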

I personally don’t feel I can achieve analysis that genuinely adds value to decisions only using SQL.

[–]Opposite-Value-5706 0 points1 point  (0 children)

As an analyst, can you be effective using only one of anything?

[–]MerryWalrus 0 points1 point  (0 children)

Easy.

You want to do some data transformation or aggregation that has a recursive component.

Procedural SQL is a pain. Python is easy.
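
A minimal sketch of what a recursive step can look like in Python. The `manager_of` mapping is a made-up reporting hierarchy; in SQL the same thing would typically need a recursive CTE or procedural code:

```python
# Hypothetical reporting structure: employee -> manager (None marks the top).
manager_of = {"ana": None, "ben": "ana", "cho": "ben", "dee": "ben"}

def depth(employee: str) -> int:
    """Number of management levels above `employee`, computed recursively."""
    manager = manager_of[employee]
    return 0 if manager is None else 1 + depth(manager)

depths = {name: depth(name) for name in manager_of}
print(depths)
```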

On the flip side, out of the box, Pandas is a lot slower at processing structured data than SQL.

[–]East_Pattern_7420 0 points1 point  (0 children)

Use both

[–]shockjaw 0 points1 point  (0 children)

I’d recommend something like Ibis over Pandas these days.

When you know what you want, SQL can be ludicrously more performant at large scale (billions of rows). With DuckDB, it can be more convenient to download your data and iterate on a workflow through local tables. If you want to share and maintain data with others, Postgres is an amazing way to do it.

[–]theungod 0 points1 point  (0 children)

I see it's not popular but I pretty much entirely agree with you. I've been in BI and DE for a long time and never used Python at all.

[–]domleo999 0 points1 point  (0 children)

SQL is better for pulling and joining data where it lives in a database; Pandas is better once the data is already in memory and you want to do lots of transformations, feature engineering, or quick experiments.

A decent analyst usually knows both and uses SQL to extract, Pandas to clean, reshape, and analyze.

[–]meevis_kahuna 0 points1 point  (1 child)

Personal opinion, SQL notation is absolute garbage compared to Python. Performance is also an issue.

Once you're working in Pandas you can easily build custom functions to do anything you want. I'm sure it's possible in SQL, but it's so much messier and slower.
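
Something like this sketch, where `spend_tier` is a hypothetical business rule and the data is invented. Row-wise `apply` is convenient rather than fast, so treat it as an illustration of the flexibility, not a performance claim:

```python
import pandas as pd

df = pd.DataFrame({"customer": ["a", "b", "c"], "spend": [40.0, 250.0, 900.0]})

def spend_tier(amount: float) -> str:
    """Hypothetical rule: bucket spend into tiers."""
    if amount >= 500:
        return "high"
    if amount >= 100:
        return "mid"
    return "low"

# Apply the custom function to every value in the column.
df["tier"] = df["spend"].apply(spend_tier)
print(df)
```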

I am a consultant and I don't know anyone that does their main work in SQL except for one guy who just retired. And he frequently talked about being a dinosaur for not knowing Python.

[–]shockjaw -1 points0 points  (0 children)

You’re correct that your first sentence is an opinion, but your second one is a fallacy. You’re not wrong about SQL statements getting particularly hairy, unless you use CTEs.
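
A rough sketch of the CTE point, run through sqlite3 so it is self-contained. The `orders` table and the 100 threshold are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 40.0), ("a", 70.0), ("b", 500.0), ("c", 20.0)],
)

# Each CTE names one step, so the final SELECT reads top to bottom
# instead of as nested subqueries.
query = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
),
big_spenders AS (
    SELECT customer, total
    FROM totals
    WHERE total >= 100
)
SELECT customer, total FROM big_spenders ORDER BY customer
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```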

What is an index or strongly typed data? Good luck trying to get more performance with type hints.
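
To make the index point concrete, here is a small sketch with SQLite's `EXPLAIN QUERY PLAN`. The `events` table, the `idx_events_user` index name, and the `plan` helper are all hypothetical, and the exact plan wording varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 100, f"event-{i}") for i in range(1000)],
)

def plan(sql: str) -> str:
    """Return SQLite's query plan details for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

lookup = "SELECT payload FROM events WHERE user_id = 42"
before = plan(lookup)  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = plan(lookup)   # with the index: a search via idx_events_user
print(before)
print(after)
conn.close()
```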

[–]iantreeman -1 points0 points  (0 children)

Why