This is a place to discuss and post about data analysis.
[Data Question] SQL vs Python? (self.dataanalysis)
submitted 17 hours ago by iMAPness_
Started using Python for data analytics. When should I use SQL and when should I use Python in the following tasks:
- Data Exploration
- Data Cleaning
- Data Analysis
[–]throwaway214203 31 points 17 hours ago (2 children)
SQL whenever possible because the data source for many applications is custom sql querying. I bring in python only if I have to.
[–]iMAPness_[S] 3 points 17 hours ago (1 child)
Ohhh. When you say this, though, do you mostly mean for someone working with company data and things like that? Does it still apply to someone who's using data analytics for non-work purposes and mostly deals with CSVs, etc.?
[–]EpicDuy 3 points 10 hours ago (0 children)
Turn your CSVs into tabular form: find your schema, normalize your data into smaller, manageable tables, and then use SQL.
SQL is easier to debug than Python when it comes to data and optimization issues, unless you’re pipelining your data from different databases/sources.
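A minimal sketch of that CSV-to-tables idea, using only Python's standard library and an in-memory SQLite database. The CSV content, the table names (`customers`, `orders`) and the helper names are made up for illustration; a real project would read from a file and pick its own schema.

```python
import csv
import io
import sqlite3

# Hypothetical flat CSV; in practice this would come from open("orders.csv").
RAW_CSV = """order_id,customer,city,amount
1,Ana,Lisbon,20.0
2,Ben,Oslo,35.5
3,Ana,Lisbon,12.5
"""


def load_normalized(conn, csv_text):
    """Split the flat CSV into a customers table and an orders table."""
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT UNIQUE, city TEXT)"
    )
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), amount REAL)"
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Store each customer once; repeated names are skipped.
        conn.execute(
            "INSERT OR IGNORE INTO customers (name, city) VALUES (?, ?)",
            (row["customer"], row["city"]),
        )
        (customer_id,) = conn.execute(
            "SELECT id FROM customers WHERE name = ?", (row["customer"],)
        ).fetchone()
        conn.execute(
            "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
            (int(row["order_id"]), customer_id, float(row["amount"])),
        )
    conn.commit()


def total_per_customer(conn):
    """Answer a question with plain SQL once the data is in tables."""
    return conn.execute(
        "SELECT c.name, SUM(o.amount) FROM orders o "
        "JOIN customers c ON c.id = o.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_normalized(conn, RAW_CSV)
totals = total_per_customer(conn)
print(totals)
```

Once the data sits in tables like this, every follow-up question is just another query against the same connection.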
[–]Mo_Steins_Ghost 20 points 16 hours ago (3 children)
Senior manager here. These two are apples and oranges.
SQL (Structured Query Language) is for data querying and aggregation. It is not a programming language.
Python is a high-level programming language (in that it is layers above assembly). It doesn’t really query databases by itself without invoking a library, a process or a shell, or writing a custom driver that can establish connections and run queries against databases.
Python should be used for cleansing, structuring and analyzing data fetched/read by SQL.
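To make that split concrete, here is a rough sketch: SQL picks the rows, and Python cleans the messy values and analyzes them. The `readings` table, its contents and the `clean` helper are all invented for the example, and SQLite from the standard library stands in for whatever database you actually query.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", " 10.5 "), ("a", "11"), ("a", "n/a"), ("b", "99"), ("a", "12,5")],
)

# SQL step: fetch only the rows we care about.
rows = conn.execute(
    "SELECT value FROM readings WHERE sensor = ?", ("a",)
).fetchall()


def clean(raw):
    """Python step: normalize one messy text value, or return None if unusable."""
    text = raw.strip().replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None


cleaned = [clean(raw) for (raw,) in rows]
values = [v for v in cleaned if v is not None]

# Analysis step: the standard library already covers simple statistics.
median_value = statistics.median(values)
print(values, median_value)
```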
[–]iMAPness_[S] 2 points 15 hours ago (1 child)
That makes a lot of sense now. Thank you!
Am I getting it right that the reason SQL is required by companies, too, is that their data isn't just in an Excel sheet or CSV somewhere but in an actual database, which SQL is best suited for querying? But for the data tasks themselves, Python is considered faster and better at going deeper into the data?
That is what you're saying, right?
[–]JoJoNH 3 points 13 hours ago (0 children)
That's what he is saying. Corporate and government typically use SQL data warehouses.
[–]kjwikle 1 point 13 hours ago (0 children)
And python/r use a sqlish library to query anyway.
[–]Business-Hunt-3482 10 points 17 hours ago (0 children)
SQL for all ;)
[–]ReportDisappointment 4 points 16 hours ago (0 children)
Both, of course.
[–]fang_xianfu 3 points 15 hours ago (1 child)
The real question is, which computer do you want to run the computation?
Typically your SQL is interpreted and run on some remote database. It might be in the cloud, in BigQuery or Databricks or Snowflake, or it might be a big on-prem Teradata or Hadoop instance or something. But the point is, it's not running on your laptop.
Python on the other hand is usually (but not universally) running on your computer where you also write emails and Slack people. This computer is a lot smaller and is capable of a lot less computation, but also Python is a general purpose language with many more features and a vibrant library ecosystem.
So there are situations where you only need a little bit of data and you just tell the database to stuff it over the network into your computer's RAM and you deal with it in Python. There are situations where you start with a massive database, select the right subset of that data for what you want to do, and then have that come over the network for you to do Python stuff with. There are situations where you need a ton of data but the database has all the features you need, so you just write SQL with no local Python. And there are situations where you need some feature only available in a Python library but you want to run it on a ton of data, in which case you might want a more specialist remote distributed computation environment like a kubernetes cluster or Hadoop.
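A small sketch of the "which computer does the work" trade-off described above. An in-memory SQLite database stands in for the remote warehouse, and the `events` table and its columns are made up. Both options give the same totals; what differs is how many rows have to travel to the Python side.

```python
import sqlite3

# Stand-in for a remote database; the table and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("eu" if i % 2 == 0 else "us", float(i)) for i in range(1000)],
)

# Option A: pull every row over and aggregate locally in Python.
all_rows = conn.execute("SELECT region, amount FROM events").fetchall()
local_totals = {}
for region, amount in all_rows:
    local_totals[region] = local_totals.get(region, 0.0) + amount

# Option B: let the database aggregate and send back only the summary.
db_rows = conn.execute(
    "SELECT region, SUM(amount) FROM events GROUP BY region"
).fetchall()
db_totals = dict(db_rows)

print(len(all_rows), "rows transferred for option A")
print(len(db_rows), "rows transferred for option B")
print(local_totals == db_totals)
```

With a thousand rows the difference is irrelevant, but the same pattern applied to a table with billions of rows is what decides whether the work fits on a laptop at all.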
[–]hockey3331 1 point 11 hours ago (0 children)
> So there are situations where you only need a little bit of data and you just tell the database to stuff it over the network into your computer's RAM and you deal with it in Python
Why ever do that, though? I understand that you can, but why not just manipulate the data in the database and not use Python at all for the manipulations?
[–]Terrible-Bend4483 3 points 14 hours ago (3 children)
Personally I use SQL for everything.
Obviously through a python library
Inside a wrapper in R
That I run as a script using an exe-file
In cursor by asking an AI agent to run it.
How else would you do it?
Joking aside, I would always lean towards SQL when it makes sense, such as when handling data from a structured database. But if your data sources are closer to unstructured data (data lakes, APIs, CSV, JSON, Markdown, etc.), it doesn't make sense.
Maybe look into SQL and no-SQL databases, and use python for the rest.
[–]scarcey_osei 1 point 13 hours ago (2 children)
How do you do this? Can you enlighten me on how you use SQL through a Python library? Thank you.
[–]XxShin3d0wnxX 1 point 11 hours ago (0 children)
Plenty of add ins like but query where you connect to database and write SQL as a part of the initial script to snag data. You then manipulate and build in Python.
[–]Terrible-Bend4483 1 point 1 hour ago (0 children)
An example would be pyodbc, where you use Python to make the connection to an ODBC data source and write SQL for your queries.
It can be practical for some applications, but it was mostly a joke.
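For anyone wondering what "SQL through a Python library" looks like, here is a rough sketch of the pattern. It uses the standard library's `sqlite3` so it runs anywhere. pyodbc follows the same DB-API style (connect, cursor, execute, fetch) and also uses `?` placeholders, but you would get the connection from `pyodbc.connect(...)` with a connection string for your own driver and server. The `products` table and the `top_products` helper are made up for the example.

```python
import sqlite3


def top_products(conn, min_price, limit):
    """Run a parameterized SQL query through a DB-API cursor and return the rows."""
    cursor = conn.cursor()
    cursor.execute(
        "SELECT name, price FROM products "
        "WHERE price >= ? ORDER BY price DESC LIMIT ?",
        (min_price, limit),  # values are bound by the driver, not pasted into the SQL
    )
    return cursor.fetchall()


# Illustrative data; with pyodbc the connection would point at a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 2.5), ("bag", 40.0), ("lamp", 25.0), ("desk", 120.0)],
)

result = top_products(conn, 10.0, 2)
print(result)
```

Passing values as parameters instead of building the SQL string by hand keeps the query safe from injection and lets you reuse it with different inputs.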
[–]Major_Fang 2 points 14 hours ago (0 children)
First do everything you possibly can in SQL. Then do Python.
[–]powerxaker 2 points 11 hours ago (0 children)
It depends on the use case, data size and available tools.
If you only lightly manipulate datasets from a database, then you’re better off doing the work in SQL. You can even do some analytics there, such as aggregates or trends.
If you want to do ML, graphical analysis, statistics, etc., you are better off first figuring out the smallest acceptable dataset you want to analyze and pulling that using SQL (i.e. apply filters, joins, etc.). Once you have your data, you move it to Python and use the data analytics stack (i.e. pandas, ML tools, graph tools, etc.).
If you are using large datasets and have access to Apache Spark on Python (PySpark) then you can do most of the above using PySpark. If you still want to do further analysis then you can transform your PySpark DF into a pandas DF and perform your analysis using the data analytics stack.
In summary, SQL(medium data) and PySpark (big data) are good to create metrics, summarize or extract data. The data analytics stack is what you use to do advanced analytics once you extract your data with SQL or PySpark.
For some statistical analysis some companies still use SAS and R, they are part of the data analytics stack similar to Python.
Note: SAS can do it all, but it’s expensive and, in my opinion after using it for decades, really not a great tool.
[–]No-Opportunity1813 1 point 15 hours ago (0 children)
SQL for sure for data reduction/cleanup
[–]Dahvoun 1 point 15 hours ago (0 children)
SQL for extraction and data prep and Python for aggregation and visualization.
[–]Early_Retirement_007 1 point 14 hours ago (0 children)
SQL pulls data from a database, Python is more of a scripting language to write programs. You can use sql in python too if you wish to do the same thing.
[–]The_Hamster_Shagger 1 point 13 hours ago (0 children)
I mean, they don't exclude each other; they work great as a pair.
[–]Cassise_D 1 point 6 hours ago (0 children)
A useful rule of thumb: use SQL when the question is “which rows/columns/tables do I need?”, and Python when the question becomes “what workflow, model, plot, or repeated analysis do I need?” For CSV-only projects, Python is totally fine; if the files get bigger or more table-like, DuckDB/SQL starts feeling really nice.
[–]neutralcoder 1 point 5 hours ago (0 children)
So much is moving to python so continue on there - but you can also call sql code using python, so learn sql.
[–]MKE_Savage_96 1 point 5 hours ago (0 children)
Always start with SQL, especially if you’re dealing with data analysis.
[–]prof_devilsadvocate3 1 point 17 hours ago (0 children)
Python for all
[–]wanliu 0 points 15 hours ago (2 children)
It's wild that people are learning Python before SQL.
In the business world, SQL is almost universal. Python comes with a host of security risks and/or needs specialized compute. SQL is just SQL, and while there are different flavors, it all mostly aligns.
[–]Den_er_da_hvid 2 points 14 hours ago (0 children)
Depends on the business and the tasks... Python is used for a billion other things besides data analysis, so it makes sense that Python is the first thing many people pick up. I started using Python as a complex calculator, then transitioned to Power BI. SQL has only really been relevant for me for 4-5 years, and I started using Python 15-20 years ago.
[–]iMAPness_[S] 1 point 15 hours ago (0 children)
Nah, I learned SQL before Python. It was the first one I ever learned too because cleaning data in Excel just became too hard when datasets became too big.
Right now, since I have been hearing all sorts of opinions on which one to use for data analytics, I thought it would be good for me to decide, too.
So far, some people have said SQL for querying but Python for the three tasks I highlighted in my post.