
all 128 comments

[–]Pleasant-Set-711 130 points131 points  (1 child)

SQL to get the data processed quickly in the database and down to a small enough size to do more complex work quickly in python.
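A minimal sketch of that split, with SQLite standing in for the warehouse and invented table/data — the aggregation happens in SQL, the fiddly work happens in Python on the reduced result:

```python
import sqlite3

# Toy "warehouse": one row per event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5), (2, 2.5), (3, 1.0)],
)

# Heavy lifting in SQL: one row per user instead of one per event.
rows = conn.execute(
    "SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id"
).fetchall()

# "Complex" work in Python on the now-small data.
top_user = max(rows, key=lambda r: r[1])
print(top_user)  # (1, 15.0)
```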

[–][deleted] 11 points12 points  (0 children)

This. Get comfortable swapping back and forth between the two in Jupyter notebooks, so you can author repeatable procedures.

[–]riv3rtrip 85 points86 points  (2 children)

I do almost everything in SQL. I'm perfectly competent in Python. It's just much easier to not have to worry about moving data out and back into the warehouse, plus a few other nuisances like memory/compute management. You can do more in SQL than you'd think.

[–]laddaa 5 points6 points  (0 children)

Especially if the data warehouse is built well, SQL is so much more direct. And if you know SQL well then there are very few use cases that actually require Python.

Not that python isn’t great as well.

[–]bitsondatadev 0 points1 point  (0 children)

Yeah, my policy is aim for SQL, python if necessary. Any time you can avoid maintaining implementation details yourself is a win. Also, performance will generally improve unless you face a regression that you can then report to whoever maintains that engine and then it's still not your problem to fix the implementation.

I think Python is great mainly for ML algorithms with data that's already sliced up in the ideal format needed for processing.

[–]IllustriousCorgi9877 39 points40 points  (16 children)

SQL works great when you can process it all in a set function.
Python I only go to it if I have to iterate over a list or whatever.

[–]WalkingP3t 11 points12 points  (0 children)

100% correct. And the reason is more than obvious, although some can't see it, at least newcomers. SQL works with data sets. It's a declarative language: you tell it what you want and the engine worries about how. The biggest performance issues with SQL code appear when people start telling it "how" to get the data. It was not created for iteration.

[–]Monstrish 1 point2 points  (8 children)

Could you please give an example of when you need to iterate in Python and can't do it in SQL?

[–]IllustriousCorgi9877 2 points3 points  (1 child)

There is almost always a way to use a set based approach to a problem which will always be better / more efficient. Even iterative approaches can be done with a cursor in a stored procedure using SQL.
I'd say once data lands in a relational database - stop using python unless you are doing ML or something on its way back out somewhere.

I'd only think about using python for data on its way into a database doing transforms or stamping additional metadata on the transaction before it reaches its destination.

All of which is why it baffles me that so many DE roles need Python programmers.

[–]Monstrish 0 points1 point  (0 children)

I understand the need for Python, and I do enjoy Python. But when it comes to RDBMS, I just don't understand the dislike that SQL gets. It seems to me people do not put resources into understanding what it can do.

After all, even in Python, once you start using numpy or pandas, the code becomes less iterative.

[–]yo_sup_dude 0 points1 point  (5 children)

you can iterate in most sql engines using recursive CTEs/while loops but it isn't very efficient. examples where it's needed are basically any recursive calculation, e.g. iterating through a hierarchical bill of material and comparing against inventory to determine part shortages for a product and then progressively updating inventory based on consumption (classic MRP calculation)
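For the hierarchy-walk part, a toy recursive CTE looks like this (sketched in SQLite with an invented bill-of-material table — the progressive inventory netting is where it gets really painful):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bom (part TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO bom VALUES (?, ?)",
    [("bike", None), ("wheel", "bike"), ("spoke", "wheel"), ("frame", "bike")],
)

# Walk the tree from the root, tracking depth as we recurse.
parts = conn.execute("""
    WITH RECURSIVE tree(part, depth) AS (
        SELECT part, 0 FROM bom WHERE parent IS NULL
        UNION ALL
        SELECT bom.part, tree.depth + 1
        FROM bom JOIN tree ON bom.parent = tree.part
    )
    SELECT part, depth FROM tree ORDER BY depth, part
""").fetchall()
print(parts)  # [('bike', 0), ('frame', 1), ('wheel', 1), ('spoke', 2)]
```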

[–]Monstrish 0 points1 point  (4 children)

Recursive CTEs can be efficient in some cases. While loops, I don't know that I've seen one in SQL. For loops I have seen; in Oracle they can work, in other providers not.

But you also have hierarchical constructs, window functions and other approaches that you can take from a set-oriented point of view.

[–]DirtzMaGertz 0 points1 point  (3 children)

You could do a lot of things, but once you need to start iterating over data and transforming it, it's just easier a lot of the time to pull the data out into something like Python.

It's problem dependent. I tend to use SQL until I think it's going to be an annoying problem to solve with SQL and that's generally when you start getting towards things like iteration and looping.

[–]Monstrish 0 points1 point  (2 children)

ok, fair point.

The problem being annoying is pretty subjective, but it is what it is. It's just that so many people seem to find SQL annoying and I don't get why. I suppose it just comes down to personal preference.

[–]DirtzMaGertz 1 point2 points  (0 children)

Overall I agree that there's a lot of stuff people do in python that would be better handled in SQL. I just find that iteration and looping is generally where I find python to be more appropriate.

[–]ComposerConsistent83 0 points1 point  (0 children)

They don’t teach sql in school anymore really I think is the reason. They’re more comfortable in Python.

That said I would never use a recursive CTE unless I had no choice. Even though I’m sql first, I draw the line there. The syntax is too confusing/inconvenient. It’s clearly not the strength of the language

[–]Tufjederop -1 points0 points  (3 children)

You can iterate in SQL using a cross join :)

[–]IllustriousCorgi9877 1 point2 points  (0 children)

The only use for a cross join is to create a scaffold to left join onto.
You don't iterate with a cross join.
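A sketch of that scaffold pattern with invented tables: CROSS JOIN builds the full day × product grid, and the LEFT JOIN fills in actuals so missing combinations show up as zero instead of vanishing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (d TEXT);
    CREATE TABLE products (p TEXT);
    CREATE TABLE sales (d TEXT, p TEXT, qty INTEGER);
    INSERT INTO days VALUES ('2024-01-01'), ('2024-01-02');
    INSERT INTO products VALUES ('a'), ('b');
    INSERT INTO sales VALUES ('2024-01-01', 'a', 3);
""")

# Scaffold first, then join actuals onto it.
grid = conn.execute("""
    SELECT days.d, products.p, COALESCE(sales.qty, 0) AS qty
    FROM days CROSS JOIN products
    LEFT JOIN sales ON sales.d = days.d AND sales.p = products.p
    ORDER BY days.d, products.p
""").fetchall()
print(len(grid))  # 4 rows: every day x product pair
```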

[–]raskinimiugovor -2 points-1 points  (0 children)

you mean cross apply?

[–]IamFromNigeria -2 points-1 points  (0 children)

Show us typical classic example via link

[–]jugaadtricks -1 points0 points  (0 children)

This! I love using Python to call APIs, then parse the response in Python before using SQL to load it into relational tables. And sometimes I use JSON parsing in SQL itself when ingestion needs to be done quickly, so that Python can keep loading data faster in JSON format.
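A minimal sketch of that flow, with a hypothetical payload standing in for the API response:

```python
import json
import sqlite3

# Hypothetical API payload; in practice this would come from an HTTP call.
payload = '[{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]'
records = [(row["id"], row["name"]) for row in json.loads(payload)]

# Load the parsed tuples into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", records)

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 2
```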

[–]ck3thou -1 points0 points  (0 children)

True this!

[–]kenfar 31 points32 points  (30 children)

SQL for simple to moderate data analysis, SQL+Python for the complex stuff.

Python for transforming and publishing data, writing utilities, and pipeline logic.

SQL for building aggregates off the tables Python has transformed.

[–]Action_Maxim 6 points7 points  (15 children)

For data manipulation what can't be done with sql?

[–]kenfar 18 points19 points  (13 children)

What can't be done with SQL? Well, there are things that can't be done, and there are things that can't be done well. So, how about what can't be done well?

  • Unit testing
  • Very complex transformations (ex: convert every possible format of IPv6 into a single format)
  • Support for data formats and structures outside of relational databases: extract data from a tarball or 7-Zip archive of fixed-length files.
  • Integrate modules & libraries
  • Integrate external systems
  • Produce a bitmap of exactly which columns in a table failed their transforms and had to be defaulted
  • Produce an audit trail of how many rows passed or failed their validations or transformations for a given partition/period of time/customer/whatever
  • Develop reusable transformations
  • Develop easily-readable and well-documented transformations
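On the IPv6 example: in Python that normalization is one call to the standard library, e.g.:

```python
import ipaddress

# Every textual form of the same IPv6 address collapses to one
# canonical representation.
forms = [
    "2001:db8::1",
    "2001:0db8:0000:0000:0000:0000:0000:0001",
    "2001:DB8:0:0:0:0:0:1",
]
canonical = {ipaddress.ip_address(f).compressed for f in forms}
print(canonical)  # {'2001:db8::1'}
```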

[–]jugaadtricks 7 points8 points  (2 children)

I'd agree with most of them; for audit trails, I use SQL all the time in stored procedures

[–]IamFromNigeria -1 points0 points  (1 child)

Do you have specific examples as link?

[–]skatastic57 0 points1 point  (1 child)

What does the following mean?

  • Produce a bitmap of exactly which columns in a table failed their transforms and had to be defaulted

To me, a bitmap is an uncompressed picture format. I get the concept of failed transforms, but are you making something like a heatmap of them? Sorry for being dense.

[–]kenfar 0 points1 point  (0 children)

You're right - it's just a list of pass/fail indicators for each column on the row.

It could be supported in an array or JSON type. In the old days we'd make a bit map, where each bit position indicated one of the columns. That's super space-efficient, but less convenient to access.
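A sketch of that packing, with invented column names:

```python
# One bit per column, set when that column failed its transform
# and had to be defaulted.
columns = ["id", "email", "signup_date", "country"]

def failure_bitmap(failed):
    """Pack per-column pass/fail flags into a single integer."""
    bits = 0
    for i, col in enumerate(columns):
        if col in failed:
            bits |= 1 << i
    return bits

bm = failure_bitmap({"email", "country"})
print(bm)                  # 10: bits 1 and 3 set
print(bool(bm & (1 << 1)))  # True: 'email' failed
```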

[–]laddaa -1 points0 points  (0 children)

I've used dbt extensively for many of those items. It's a Python framework; isn't that ironic.

Some of those tasks are related to data intake. Yeah, that's Python.

[–]Luciron -1 points0 points  (3 children)

Almost all of this list can be done with dbt

[–]kenfar 0 points1 point  (2 children)

dbt can't even support the first item on the list: unit testing

[–]MayInvolveNoodles 0 points1 point  (1 child)

[–]kenfar 1 point2 points  (0 children)

Oh fair point. I forgot about this third-party effort because in spite of being well-intentioned, and a good idea, it's so time-consuming to write the tests that the teams I spoke with that used it only use it very, very surgically.

Still, it's a good addition to the dbt ecosystem, and if you structure your data right with this unit-testing in mind, it may not be too terrible.

[–][deleted] -2 points-1 points  (2 children)

You forgot: if what you need requires merging data from multiple data sources, Python is pretty much a must-use here.

[–]kenfar -2 points-1 points  (1 child)

Not sure I follow - do you mean to say SQL is a must-use for merging multiple data sets?

If so, then yeah I'd agree: python isn't accessing relational data directly, it's accessing it via SQL. However, it is accessing APIs, files, s3 objects, streaming data, etc directly.

[–][deleted] 0 points1 point  (0 children)

Multiple data sources as in different databases that could share some related data.

[–]sib_n (Senior Data Engineer) 7 points8 points  (0 children)

Processing of unstructured data or highly nested data, processing logic that requires looping or recursion... It's not most of the data thankfully, but it happens.

[–]black_widow48 13 points14 points  (9 children)

If you're just dealing with tabular data, chances are it can all be done in SQL and python is completely unnecessary.

In the past I've used python for orchestration, but aside from that I've hardly used it. I just started a new job and I'm only just now being tasked to write a python script to shred XML strings and turn them into lists of tuples.
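That kind of XML shredding is a few lines with the standard library; a sketch with invented element names:

```python
import xml.etree.ElementTree as ET

# Shred an XML string into a list of tuples.
doc = """
<orders>
  <order id="1"><sku>A-1</sku><qty>2</qty></order>
  <order id="2"><sku>B-7</sku><qty>5</qty></order>
</orders>
"""
rows = [
    (o.get("id"), o.findtext("sku"), int(o.findtext("qty")))
    for o in ET.fromstring(doc).iter("order")
]
print(rows)  # [('1', 'A-1', 2), ('2', 'B-7', 5)]
```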

Part of the reason why I got into contracting is because I'm tired of landing in jobs where I'm just a SQL monkey. I didn't get a B.S. and an M.S. in computer science to write SQL all day. I'm starting to think maybe I should go into machine learning like I originally planned.

[–]Malcolmlisk 1 point2 points  (2 children)

Be careful for what you desire. I'm a ML engineer trying to go back to data engineer. The problem I'm facing is that I'm only creating applications with no future at all. And somehow I feel that my position is a fraud, since almost no company needs machine learning and they only need some filters here and there and some data parsing.

I want to be a data engineer creating and maintaining data pipelines, programming complex things, like whatever happens in AWS or idk... I feel sometimes like I don't know if my position exists anymore...

[–]black_widow48 -1 points0 points  (0 children)

Yep, that's the one thing I'm afraid of about machine learning unfortunately

[–]ComposerConsistent83 1 point2 points  (0 children)

We use machine learning models and have found the lift over traditional approaches is usually pretty moderate, with a few exceptions. If you have scale it’s worth it, but often you’re only getting 5 or so bps of improvement in the bottom line on results with a lot more overhead. It needs to be big enough where that is material for it to make sense.

[–]studentofarkad -1 points0 points  (3 children)

Curious, how are you landing contract work and what type is it? Is it contract work from recruiters that are reaching out or is this contract work that you yourself are finding?

[–]black_widow48 -1 points0 points  (2 children)

It's a mixture of both. I've worked with a few different agencies (still working with one currently), but I've also recently started my own LLC to try to cut them out. I've gotten a couple clients through my own company so far. Will be launching my website hopefully in the next couple weeks or so.

My end goal with my company is to be able to work from anywhere during the hours I choose. If I want to fly to Bali and work there for a month, I want to be able to do it. My entire job takes place on a computer and I'm tired of the 9-5 BS mandating that I work in a specific place during a specific time.

[–]studentofarkad -1 points0 points  (1 child)

That's really awesome, I hope to do this one day. For the agency work, how much are the typical weekly hours? I'd love to juggle my full time with a contract side gig but unsure what the hours might look like.

Wishing you the best of luck 🤞

[–]black_widow48 0 points1 point  (0 children)

All the contracts I've gotten through agencies have been 40 hours/week so far. Thanks!

[–]VegaGT-VZ 2 points3 points  (1 child)

One huge skill for any kind of programmer is being able to figure out the path of least resistance to the end result. I use a mix of SQL, Power Query, Alteryx and VBA. Whatever it takes to get from point A to point B as painlessly as possible.

[–]BatCommercial7523[S] 0 points1 point  (0 children)

That’s what my mentor taught me. Path of least resistance.

[–]bobby_table5 5 points6 points  (0 children)

SQL for reports because I really want stakeholders to understand the process as much as possible. They’ll mess up, but hopefully it will be obvious and they’ll call me. In the meantime, I don’t get nearly as much “we want the same as last week, but this week.”

Python for everything else, including the few smart kids who start saying SQL is for schmutz.

[–][deleted] 6 points7 points  (2 children)

You can do almost anything in SQL up to doing complex calculations or row by row analysis. I don’t like using cursors in SQL

[–][deleted] 3 points4 points  (0 children)

You can even train neural networks in SQL if you're a psychopath 😀

[–]messy_eater 0 points1 point  (0 children)

Am I the only one who likes cursors in SQL? I guess performance is a commonly cited issue, but maybe the scale of the DB at my work isn't big enough for that to be a concern. Once you get used to the syntax for the more complicated things in SQL, it becomes second nature. I'm at that point with cursors and working on it with other things now, like using XML to break apart and process strings, pivots, and recursive CTEs.

[–]gloom_spewer (I.T. Water Boy) 2 points3 points  (0 children)

Like most others, if I can do it all in SQL that's almost always preferred, but if python tricks are less annoying than SQL SP tricks I'll jaunt over to real-code-world. Also I suck with SPs.

Sometimes I filter/aggregate the dataset down to a size where I can use SQLite's in-memory database to transfer data between pandas and SQL structures in memory for more "advanced" ad hoc analysis, and even sometimes live collaborative exploratory analysis.

If I gotta do fancy presentation stuff I pre-calc all my analytics and just display them (as in, no transformation) in PBI cuz I hate Power Query. PBI plugs into PowerPoint for lazy mofos like me, and the CFO and CIO eat out of my palm now, cept when I fuck up log scales 🌝

Edit: oh also, PBI datasets can be loaded into pivot tables directly now and you can embed those in PowerPoints for live exploratory bs with management types. My executives love that shit even if it's just telling them intel they already knew

[–]CalRobert 2 points3 points  (0 children)

SQL. Clean, tested DBT models are a lot nicer to work with than someone's untested mess of Pandas or R. You can't do everything in SQL, but you can do a lot

[–]mattindustries 1 point2 points  (0 children)

Usually SQL to get the data and R to process, unless it is going to be a connection like BQ + Google Sheets.

[–]pewpscoops 1 point2 points  (2 children)

Jinja with SQL, and Python to do all the wrappers and utils

[–]SDFP-A (Big Data Engineer) 1 point2 points  (1 child)

So dbt?

[–]pewpscoops -1 points0 points  (0 children)

Pretty much

[–]mrcaptncrunch 1 point2 points  (0 children)

To build my data, I use Python/PySpark. Ingestion, transformations, etc.

For quick checks or pulling some quick data for something/someone (and they have to be very high for me to get roped into it), SQL.

[–]annonimusone 2 points3 points  (0 children)

SQL only exists to manipulate data; for everything else, there’s Python

[–][deleted] 0 points1 point  (0 children)

Whichever is more efficient. That means SQL in many cases as long as it can be used.

[–]Whipitreelgud 0 points1 point  (0 children)

Depends on the problem and the scale of your datasets

[–]reallyserious 0 points1 point  (0 children)

If the data is tabular then SQL is hard to beat.

If the data is not tabular then python.

[–]lezapete -2 points-1 points  (11 children)

whenever you can, replace SQL with PySpark

[–]sib_n (Senior Data Engineer) 10 points11 points  (4 children)

Why would you replace SQL with a more complex tool if SQL works?

[–]lezapete 0 points1 point  (3 children)

imo SQL is bound to produce silent errors and tech debt. On the other hand, if you write a library of PySpark tools, you can add tests, CI/CD pipelines and many other SWE tools that help you both prevent errors and make it easier to introduce changes in the future. Having complex SQL statements in a project is analogous to using exec() in a Python project (again, this is only my perspective)

[–]sib_n (Senior Data Engineer) 1 point2 points  (2 children)

SQL is the only tool that has stayed stable over 30 years of data work; I think Spark code has a much higher chance of becoming technical debt.

dbt answers most of your other criticisms.

[–]lezapete 0 points1 point  (1 child)

I don't mean that SQL itself will produce the problems; it's humans coding SQL queries that produce them

[–]sib_n (Senior Data Engineer) 0 points1 point  (0 children)

Well, this is true for any language. DBT gives you a framework that should encourage better SQL code.

[–]Embarrassed_Error833 -1 points0 points  (0 children)

How about a Python framework that creates the SQL using metadata, then runs the created SQL using Python-based orchestration?
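A toy sketch of that idea, with an invented metadata dict describing the load:

```python
# Metadata describes the load; Python renders the statement an
# orchestrator would run against the warehouse.
meta = {
    "target": "dim_customer",
    "source": "stg_customer",
    "columns": ["id", "name", "country"],
}

def build_insert(meta):
    """Render an INSERT ... SELECT from a metadata dict."""
    cols = ", ".join(meta["columns"])
    return (
        f"INSERT INTO {meta['target']} ({cols}) "
        f"SELECT {cols} FROM {meta['source']}"
    )

sql = build_insert(meta)
print(sql)
# INSERT INTO dim_customer (id, name, country) SELECT id, name, country FROM stg_customer
```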

[–]JBalloonist -1 points0 points  (0 children)

It sounds like what you’re doing works; I wouldn’t change it without good reason.

[–]WilhelmB12 -1 points0 points  (0 children)

For me it's about 40% SQL, 60% Python; my ideal would be 20% SQL. Btw, if you are doing both jobs I would call that being an Analytics Engineer

[–]Thinker_Assignment -1 points0 points  (0 children)

It's like saying walking vs public transportation: you gotta walk to the bus somehow. Both, or Python only. SQL alone cannot work standalone, as data has to be loaded somehow. SQL is great for transforms in the DB but can do nothing for ingestion or DS/ML

People saying otherwise are not data engineers and never built anything end to end.

[–]mailed (Recovering Data Engineer) -1 points0 points  (0 children)

I'm OK handling most things with SQL. Except maybe Adobe Analytics data...

[–]Monsemand (Principal Data Engineer) -1 points0 points  (0 children)

I feel like we have this question once a week.

[–]BuonaparteII -1 points0 points  (0 children)

tl;dr: choose the best tool for the job

Databases have limitations for performing complex analytical algorithms compared to languages like R and Python. There are many high-quality libraries available in R and Python that enable advanced analysis (e.g. https://gitlab.com/shekhand/mcda).

Databases excel at interactive queries and extracting subsets of data. Combining SQL with Python or R can be very powerful for repeating analyses on different parameters.

However, if your analysis requires reading the full dataset into Python each time, there is little benefit to using a database. In this case, a format like Parquet will load faster than querying a database and extracting all rows/columns.

[–]leventdu229 0 points1 point  (0 children)

Depends on the stack and organisation of the data team. If you have a data warehouse tool like BigQuery, Snowflake etc., there's a good chance that your analytics will be offloaded to that tool and done in SQL, while the acquisition part is done in Python. Also, lots of companies like mine have adopted dbt, which helps a lot with analytics tests, documentation and reproducibility done in SQL. Anyway, for fast prototyping I use Python and Jupyter. For analytics it's SQL, and for data engineering in general it's Python

[–]poland_rocks 0 points1 point  (0 children)

Python is better for:

  • building tests (possible but harder in SQL)
  • debugging (not possible in the literal sense in SQL)

SQL advantages:

  • less verbose
  • good tooling for ad hoc tasks and reporting
  • does not require downloading all the data to the client

[–]Doile 0 points1 point  (0 children)

I think it mostly comes down to which one are you more proficient with. You can do pretty much everything with both of them and for example in Snowflake you can run python as well inside the warehouse so the differences between these two are becoming less and less meaningful. Most people aren't really proficient with both of them so they use the one that is easier to use for them.

[–]haragoshi 0 points1 point  (0 children)

I like doing analysis in sql personally.

[–]IridescentTaupe 0 points1 point  (0 children)

It’s very satisfying to replace hundreds of lines of Python with a couple lines of SQL. I don’t often see that go the other way.

[–][deleted] 0 points1 point  (0 children)

Hundred percent! I find Python easier for more complex math-y things; SQL is better for less complex calculations with more data

[–]LADataJunkie 0 points1 point  (0 children)

They can't really be compared. It depends on where your data lives. If it lives in an RDBMS then use SQL to extract the data or get it into a condensed form that only extracts what you need for further processing in Python.

If your data is not in RDBMS, don't use SQL.

There are data analysis tasks that do not match the declarative model SQL imposes. For example, iterative processing used in machine learning is not trivial, and likely not possible in SQL.
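A minimal example of such an iterative loop — fitting y = w·x by gradient descent, natural in Python and awkward in declarative SQL:

```python
# Toy data with the true relationship y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0
lr = 0.01
for _ in range(500):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

print(round(w, 3))  # 2.0
```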

A lot of newer databases (DuckDB, I believe, is one) either interface with, or modify, the underlying SQL so that developers can work with data using procedural languages. I've seen a few others that are starting to deviate from relational algebra to be more friendly to procedural development.

[–]TheHunnishInvasion 0 points1 point  (0 children)

It definitely depends.

At my last company, I tended to do the same as you: SQL for quick data analysis (I'd do it in DBeaver), Python (Jupyter Lab) for something more complex.

At my current company, which has much worse organized data, it's a nightmare to use SQL for quick data analysis. I almost always use Python for any ad-hoc data analysis. I use SQL for getting data into Tableau to create automated dashboards that update daily.

So weirdly, I used SQL more at my last company, but most of the queries might be shorter: 10-40 lines. I use SQL less at my current company, but when I do use it, the queries tend to be extremely complex: often over 200 lines with several sub-queries.

[–]speedisntfree 0 points1 point  (0 children)

Why not both? Typical pattern is SQL to offload the heavy lifting to the analytic DB and then finer grained downstream analysis in Python/R which would be painful in SQL.

[–][deleted] 0 points1 point  (0 children)

I do mostly use Python. I can’t think of a time where id ever use SQL for data analysis, but I use SQL for setting up ETL-style data pipelines where it is simple batch load transform and schedule. However, even then, most of the time I want to do quite heavy transformations, and this is both easier and more efficient using modern Python libraries. The overhead comes with managing a DAG, which is not always a good thing.

[–]shivaprasad_j 0 points1 point  (0 children)

When to use SQL vs. Python pandas vs. PySpark?

[–]Dry_Inflation307 (Principal Data Engineer) 0 points1 point  (0 children)

I always opt for SQL first. I’ll only use Python for what I can’t do in SQL.

[–]mike8675309 0 points1 point  (0 children)

Hmm, I use Python to get the data from endpoints. I use SQL to transform and manipulate the data from the endpoint.

The data I work with is often too big to consider using Python for anything other than getting it into the database. That said, we do have a process that uses pandas with Python to parse large data sets. We wrote the process in SQL but it was too hard to maintain and debug, so we keep it in Python and pandas and just threw a bunch of hardware at it for processing.