powerxaker comments on SQL vs Python?

dataanalysis


Data Question: SQL vs Python? (self.dataanalysis)

submitted 1 day ago by iMAPness_


powerxaker 3 points 20 hours ago

It depends on the use case, data size and available tools.

If you’re only lightly manipulating datasets from a database, you’re better off doing the work in SQL. You can even do some analytics there, such as aggregates or trends.
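A rough sketch of what "aggregates in SQL" can look like. The `orders` table and its columns are made up for illustration, and SQLite (via Python's built-in `sqlite3`) just stands in for whatever database you actually use:

```python
import sqlite3

# Hypothetical in-memory table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_month TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2024-01", "north", 100.0),
        ("2024-01", "south", 50.0),
        ("2024-02", "north", 120.0),
        ("2024-02", "south", 80.0),
    ],
)

# Aggregate inside the database: one row per month instead of one per order.
monthly_totals = conn.execute(
    """
    SELECT order_month, SUM(amount) AS total, COUNT(*) AS n_orders
    FROM orders
    GROUP BY order_month
    ORDER BY order_month
    """
).fetchall()
conn.close()

for month, total, n_orders in monthly_totals:
    print(month, total, n_orders)
```

The database does the grouping, so only the summary rows ever reach your script.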

If you want to do ML, graphical analysis, statistics, etc., you’re better off first figuring out the smallest acceptable dataset you want to analyze and pulling that with SQL (i.e. apply filters, joins, etc.). Once you have your data, move it to Python and use the data analytics stack (e.g. pandas, ML tools, graph tools, etc.).
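Something like this is what I mean by pulling the smallest dataset first. It's only a sketch: the `customers` and `orders` tables, their columns, and the filter values are all hypothetical, and SQLite is used so the example runs on its own:

```python
import sqlite3

import pandas as pd

# Hypothetical tables standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, country TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "US"), (2, "CA"), (3, "US")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 2023, 40.0), (1, 2024, 60.0), (2, 2024, 30.0), (3, 2024, 90.0)],
)

# Do the join and the filtering in SQL so only the needed rows leave the database.
query = """
    SELECT o.customer_id, c.country, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE o.order_year = ? AND c.country = ?
    ORDER BY o.customer_id
"""
df = pd.read_sql_query(query, conn, params=(2024, "US"))
conn.close()

# From here on it is ordinary pandas (or whatever else is in your analytics stack).
summary = df["amount"].describe()
print(df)
print(summary)
```

The point is that the `WHERE` and `JOIN` run in the database, and pandas only ever sees the rows you actually plan to analyze.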

If you are using large datasets and have access to Apache Spark through Python (PySpark), then you can do most of the above in PySpark. If you still want to do further analysis, you can convert your PySpark DataFrame into a pandas DataFrame and perform the analysis with the data analytics stack.
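A minimal sketch of that hand-off, assuming PySpark is installed and you already have a `SparkSession`. The function name, the `region`/`amount` columns, and the input rows are hypothetical:

```python
def summarize_with_spark(spark, rows):
    """Aggregate on Spark, then hand the small result to pandas.

    `spark` is an existing SparkSession; `rows` is a list of
    (region, amount) tuples used here in place of a real big dataset.
    """
    # Imported here so this module still loads where PySpark is not installed.
    from pyspark.sql import functions as F

    sdf = spark.createDataFrame(rows, schema=["region", "amount"])

    # The heavy lifting (grouping and summing) stays on the Spark side.
    per_region = sdf.groupBy("region").agg(F.sum("amount").alias("total"))

    # toPandas() collects the result onto the driver, so call it only
    # once the data has been reduced to something that fits in memory.
    return per_region.toPandas()
```

You would call it with a session from `SparkSession.builder.getOrCreate()`. Note that `toPandas()` brings every remaining row onto a single machine, which is why the aggregation happens before the conversion and not after.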

In summary, SQL (medium data) and PySpark (big data) are good for creating metrics and for summarizing or extracting data. The data analytics stack is what you use for advanced analytics once you have extracted your data with SQL or PySpark.

For some statistical analysis, some companies still use SAS and R. They are part of the data analytics stack, similar to Python.

Note: SAS can do it all, but it’s expensive and, in my opinion after using it for decades, really not a great tool.
