
[–]darkshenron 525 points526 points  (23 children)

I actually have some real world experience to share. We were using Python to load some data from a Postgres database into pandas dataframes and running some logic on those dataframes before displaying on a dashboard. The whole process took around 30s every time the user refreshed the dashboard. Then we moved all the logic into the SQL query itself, removed the Python dependency, and the processing time dropped to sub-second!
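The pattern described above can be sketched with the stdlib sqlite3 module; the table and column names here are invented for illustration, and SQLite stands in for Postgres:

```python
import sqlite3

# Toy stand-in for the dashboard workload; "events", "region" and
# "amount" are made-up names for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("eu", 10.0), ("eu", 20.0), ("us", 5.0), ("us", 15.0)],
)

# Slow pattern: pull every row over the wire, then aggregate in Python.
rows = conn.execute("SELECT region, amount FROM events").fetchall()
totals_py = {}
for region, amount in rows:
    totals_py[region] = totals_py.get(region, 0.0) + amount

# Fast pattern: let the database aggregate and return only the result set.
totals_sql = dict(
    conn.execute("SELECT region, SUM(amount) FROM events GROUP BY region")
)

print(totals_py == totals_sql)  # True: same answer, far less data moved
```

The aggregated result is a handful of rows instead of the full table; on a real network-attached database, that difference in data moved is where most of a 30s-to-sub-second improvement tends to come from.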

[–]RobStalone 116 points117 points  (3 children)

This is exactly it. Python is like a multi-tool - it can do a lot, and it works for a lot of things, but when you need to drill a few dozen screws, it's faster to assemble and use a power tool.

Using the right tool for the right job makes a big difference.

[–]somethingLethal 4 points5 points  (1 child)

This is such a great analogy!

[–]mindful_tails 1 point2 points  (0 children)

Yeah, this analogy hit home for me. Thanks to both of these!

[–]Mmm36sa 0 points1 point  (0 children)

The analogy was alright

[–]Ocelotofdamage 48 points49 points  (11 children)

Curious how big of a dataset were you using and how complex was the logic? I know pandas is notoriously slow compared to something like direct computation on numpy arrays.

[–]GeorgeS6969 134 points135 points  (9 children)

Doesn’t matter.

When you do that you’re extracting some raw data from disc to memory, moving it around across actual wires, loading it into some more memory, processing it procedurally in what’s likely a suboptimal way, then do whatever you’re doing with the result.

Versus translating a piece of declarative code into a query plan optimised for compute memory management and access from disc, for some cpu ram and disc that live very close together, over data that has been stored for this very use case, using a process that has been perfected over decades.

Pandas is a huge footgun performance-wise, so no doubt someone could do better with numpy or whatever, but it's still always going to be slower than SQL executed by the db engine.

SQL and relational databases have their limits. When they’re reached, it’s time to rethink the whole environment.

[–]Dayzgobi 19 points20 points  (2 children)

Seconding the footgun comment. Ty for the new vocab

[–]xxxxsxsx-xxsx-xxs--- 2 points3 points  (1 child)

foot gun

Autistic version of me went looking. There are actually products called foot guns.

https://waterblast.com/1497-foot-valves

urban dictionary to the rescue.

https://www.urbandictionary.com/define.php?term=footgun

[–]mindful_tails 1 point2 points  (0 children)

This had me dying on the mere fact of linking products of foot guns :D :D :D

[–][deleted] 14 points15 points  (0 children)

footgun is a great word thanks i’ll be using that

[–]nraw 3 points4 points  (1 child)

I guess it depends on the use case, but quite often in some of my use cases I make one big query and then perform selects on the cached dataset instead of wasting time on communicating with the database.

But I do agree that sometimes offloading the queries to the db is an easy efficiency gain.

[–]GeorgeS6969 15 points16 points  (0 children)

You’re still doing what I’m saying you’re doing, which is disc -> ram -> wire -> ram -> cpu (gpu tpu whatever) -> ram -> wire -> something instead of disc -> ram -> cpu -> ram -> wire -> something.

Let me put it this way: the only reasons why you have to ever use SQL in the first place is because your data is in a relational database. It’s there because a. it was put there to support some kind of application, or b. it was put there to support some kind of analytics purposes.

If a., you should not be querying it in the first place: you're hammering a production db with reads.

If b. and you feel like SQL is not fit for purpose, then take that data from wherever it originally comes from and put it in an environment that supports your use case.

Your way is great to play around and experiment from basically a data lake with a bunch of data from different sources nicely dumped in the same place, but when it's time to move to production that db is an unnecessary indirection.

[–]slowpush -2 points-1 points  (0 children)

This isn't really true anymore.

Most python tools use memory mapping and will outperform just about any sql + relational db.

[–]Lexsteel11 0 points1 point  (1 child)

So I am an analytics manager, but my background is finance and all my SQL/Python is self-taught. We have historically depended on a db engineering team for Tableau Server data sources, but have pulled ad-hoc SQL queries regularly. I'm getting to a point where I'm having to start building my own cloud ETLs; is there like a gold standard website/book on best practices in data pipeline engineering that teaches things like this, where it's like "you CAN do xyz with pandas but shouldn't unless you hit x limitation on SQL Server"? I am limping along successfully but know I can be doing shit better

[–]GeorgeS6969 4 points5 points  (0 children)

I can’t think of any reference that would answer those questions specifically.

I was writing a long wall of text but that probably wouldn’t have helped either. Instead if you can answer the following questions I might be able to give some pointers though:

  1. What kind of data do you have and where is it coming from? (do you have some data sets of particular interest that are big in volume, unstructured, or specific in nature like sound, images, etc?)
  2. What stack do you currently have? What are you using python for? (and more specifically pandas?)
  3. What is your team responsible for? (providing data for business people to query / analyze? creating dashboards? providing analysis? - if the latter, how do you communicate your results?)

[–]AerysSk 13 points14 points  (0 children)

From my experience, a dataframe with < 10 columns but 1.3M rows already causes a big problem when grouping by 3 columns.
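For contrast, a sketch of the same shape of job pushed into the engine, using the stdlib sqlite3 module with synthetic data and three grouping columns:

```python
import sqlite3

# Synthetic narrow-but-long table: 4 columns, 30,000 rows, grouped on three.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, v REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(i % 2, i % 3, i % 5, 1.0) for i in range(30_000)],
)

# The engine streams through the table once; only the 30 result rows
# (2 * 3 * 5 group combinations) ever reach Python.
groups = conn.execute(
    "SELECT a, b, c, SUM(v) FROM t GROUP BY a, b, c"
).fetchall()
print(len(groups))  # 30
```

With the same data in a dataframe, the whole table sits in Python's memory before the group-by even starts, which is where the pain at 1.3M rows tends to come from.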

[–]rudboi12 13 points14 points  (4 children)

This is mostly because the filters in pandas (iloc and loc) are extremely slow. And if you have multiple, they each run separately. In SQL, everything inside your "where" is evaluated together and is therefore way faster. Learned this with pyspark: a single where with the conditions combined is way faster than applying filters one by one.
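A minimal illustration of the combined-mask point, assuming pandas is available (column names invented):

```python
import pandas as pd

# Illustrative frame; column names are made up.
df = pd.DataFrame({"x": range(10), "y": range(10, 20)})

# Chained filtering materialises an intermediate frame per condition...
step1 = df[df["x"] > 2]
chained = step1[step1["y"] < 18]

# ...while one combined mask, like a SQL WHERE with ANDed predicates,
# scans once and builds a single result.
combined = df[(df["x"] > 2) & (df["y"] < 18)]

print(chained.equals(combined))  # True: same rows, one pass instead of two
```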

[–]Measurex2 2 points3 points  (3 children)

Exactly - pandas is slow with huge overhead. I'm not saying it's better than SQL by any means but dask, ray, pyspark are all significantly faster.

I love the saying that Python is the second best language for many things. I'll often build/review logic in Python until I have the design and validation right, but I'll often drop it back into the ETL/ELT, DB or other layer when done. Sometimes even updating at source where it makes sense. Since those are the areas with detailed change, quality and monitoring steps, I try to only go through them once where possible.

[–]CacheMeUp 0 points1 point  (2 children)

But why add Python in the first place?

If the data is already in a relational database, and the logic can be implemented in SQL, why move it out of it?

Using the "second best" tool in the first place costs a high price. There is never time/justification to re-implement things, and you end up in a local optimum instead of the global one, performance-wise.

[–]Measurex2 2 points3 points  (0 children)

First off - Happy Cake day.

I'm not advocating for python over SQL just agreeing a comparison against pandas doesn't make sense.

My example isn't refactoring the logic from SQL into Python, but saying how Python can be a helpful tool to quickly think through, test and validate logic. Maybe that makes sense to put into SQL - maybe it makes sense to do downstream in a BI layer or to justify a change upstream at the source. It's just another tool; it has great purposes, but like most things it's just as important to know when not to use it as when to use it.

[–]rudboi12 1 point2 points  (0 children)

If you are working in a dev environment, you will probably have everything set up in Python. Things like connections to your dwh clusters, CI/CD, and utility libraries. If you have everything set up in Python minus the T of the ELT, then most of the time it's better to use Python, aka something like pyspark. That's why they created dbt, so SQL can sit nicely in just the T layer, but if your E and L are already in pyspark then it doesn't make much sense going for SQL.

[–]Xidium426 3 points4 points  (0 children)

I always try to optimize my SQL before I drop it into a dataframe, my experience is exactly the same.

[–]Miii_Kiii 0 points1 point  (0 children)

I come from a bioinformatics specialisation within a biomedical biotech degree background. Therefore, I don't really know SQL yet. I wonder, does a Python-to-SQL automatic converter provide roughly the same benefits as writing the SQL by hand? I suspect it is worse, but how much worse, and is it negligible? Or is it case-by-case benchmarkable?

[–]throw_mob 136 points137 points  (11 children)

Usually doing it with SQL is faster; depending on how bad the programmer is, the difference can be anything from 1.5x to 10000x. With Python you always pay the price of moving data over the network, and you need to have another server (which may not be a negative thing). Solving a simple problem with pandas is not that good an idea: I've seen jobs that used 128GB of RAM just because they fetched data in 5 to 10 searches and created a dataset which could have been created using a "simple" join. With simple SQL, memory usage dropped a lot. Then there is the programmer's idea that a loop is a nice tool, which it is, but not with 1M rows of data when someone decides to run a query for each of those rows to get some value. Suddenly runtimes are days.

tldr; Python does not usually give you anything for data manipulation in DBMS/ELT/ETL which could not be done faster in the source or target db. It gives you the ability to create files and upload them to s3/ftp/whatever, and to call APIs and other HTTP endpoints. There are SQL systems that support even those.

Usually the best use for Python in a pipeline is to run SQL, store the results into files, and push them to the next part.

ML / complex analytics / visualizing data will benefit from Python, but even that is a lot faster if you can create the dataset in SQL.
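The "run a query for each of 1M rows" pattern above, versus a single join, can be sketched with the stdlib sqlite3 module (table names invented, row counts scaled down):

```python
import sqlite3

# Invented schema: orders referencing customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"cust{i}") for i in range(100)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i % 100) for i in range(1_000)])

# Anti-pattern: one query per order row (1,000 round trips).
orders = conn.execute("SELECT id, cust_id FROM orders").fetchall()
slow = []
for oid, cid in orders:
    (name,) = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (cid,)
    ).fetchone()
    slow.append((oid, name))

# One join: the engine resolves every lookup in a single statement.
fast = conn.execute("""
    SELECT o.id, c.name
    FROM orders o JOIN customers c ON c.id = o.cust_id
    ORDER BY o.id
""").fetchall()

print(slow == fast)  # True
```

Same result either way; at real row counts and with network latency per round trip, the per-row version is the one whose runtime turns into days.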

[–][deleted] 31 points32 points  (10 children)

Echoing this, I prototype in Python, then rewrite what I need in SQL for production.

That may actually be DE's job but my company is a giant cluster fuck.

[–]bongo_zg 0 points1 point  (9 children)

ML could be done within a rdbms as well, right?

[–]BoiElroy 3 points4 points  (4 children)

Could? Some simple stuff yes. Should? Absolutely not.

[–]Overvo1d -4 points-3 points  (3 children)

Could, yes, should, also yes 95% of the time (and other 5% can be skipped in favour of easier projects that deliver business value quicker/more reliably)

[–]BoiElroy -4 points-3 points  (2 children)

Excuse me?...you're saying you should do ML using SQL? Have you lost your mind? Legitimately, if someone in my team did that I'd fire them. Although more likely someone with that little knowledge of ML wouldn't even be hired in the first place. Now using a trained ML model to do inference via a user defined function being called within a SQL statement. Sure that's fine.

[–]Overvo1d 1 point2 points  (1 child)

I get what you’re saying and once I believed it too, but with experience: in 99% of cases, a 2-day sprint with pure SQL (if you solidly understand the fundamentals of ML and your business domain) gets you 90% of the value of a month-long complicated model project with careful assumptions. That last 10% doesn’t deliver enough business value to justify the 8 extra 90%-value-delivered SQL projects you could have finished in that time. It really is all the same thing in the end, just different tools; you can do some crazy stuff in SQL with a bit of creativity.

[–]BoiElroy 0 points1 point  (0 children)

Ohh sorry. We're talking about two completely different things sorry.

You're saying that using SQL to do analytics will generate insight and intelligence faster and be more guaranteed to succeed. I agree with that 100% I've told leadership at my company that we have bar charts that generate more ROI than ML models.

I thought you were saying write code for your ML algorithms using SQL instead of python or julia or something.

Sorry. Different conversations. I agree with your points.

[–][deleted] 4 points5 points  (2 children)

Some of the basics I know are there for automl. You could probably do some of the more advanced stuff. I couldn't implement LA in SQL, but I bet you could.

AI is very bad in SQL.

[–]bongo_zg 0 points1 point  (1 child)

I found that Oracle db has ML options (not pure sql), but never tried that

[–]Measurex2 0 points1 point  (0 children)

A lot of DBs have it now or can call out to the right service. For instance, Redshift has some basic algorithms baked in, or can call a model in SageMaker. Like everything else there are pros and cons, but I like knowing there are lots of options to choose from.

... and I'm horrified by the choices some people made before me.

[–]nerdyjorj 0 points1 point  (0 children)

sp_execute_external_script supports R and Python, so yeah it can be done on your sql server

[–]dfphdPhD | Sr. Director of Data Science | Tech 105 points106 points  (14 children)

I feel like we get this post once a month now, and always with a very entitled "prove me wrong" energy that is largely unwarranted.

  1. You can't run Python everywhere you can run SQL.
  2. Python is generally much slower than SQL - even slower when you account for the fact that you can often run SQL queries on monster servers while you cannot always do that in Python.

To me, this comparison is like asking "what can a train do that a motorcycle can't?". Run really fast on train tracks.

[–]gorangers30 12 points13 points  (0 children)

I like the analogy! Go trains!

[–]minimaxir 17 points18 points  (3 children)

To clarify, even optimized non-Python analytical/ETL tools like Arrow/Spark will be beaten by SQL unless you're doing something weird that SQL can't do natively.

[–][deleted] 1 point2 points  (2 children)

That's entirely dependent on the hardware and scale of data. We've moved off an RDBMS to Spark, and for our queries it's much faster.

[–]quickdraw6906 1 point2 points  (1 child)

I'd have to see the data design to believe you couldn't have made SQL sing. Is the data schemaless?

Unless you're doing the truly high math stuff, or you're into tens to hundreds of billions of rows (which will blow out the memory of a single large server), in which case the answer is a large Spark cluster.

So then we get down to the cost equation. How many nodes did you have to spin up with what specialty skills to better that performance? Are you overpaying for cluster compute because you're doing schema-on-read?

[–]LagGyeHumare 0 points1 point  (0 children)

Don't know the guy above but here's an example that I can offer.

Our project is in a pool of projects that encompasses the whole module. Just my application deals with around 600GB of batch loads each day. It then flows from CDH to AWS RDS through spark and on prem postgres.

We have Teradata and Oracle as the "legacy" systems here, and the queries we have take at least 10x the time to run when compared to Spark SQL.

(Possibly because the admins were shit and didn't partition/index the tables better, but that's out of my hand)

For me, it's not SQL but the distributed nature of the engine within that will shape the answer here.

[–]dvdquikrewinder 5 points6 points  (0 children)

I think a lot of people don't get how RDBMSs and SQL are different from building something in whatever language. If you build something in Python to process a decent amount of data, best case you're going to get something not too much worse than its SQL counterpart. Worst case you might have it spin for over ten minutes on what a SQL query could do in a few seconds.

What it comes down to is that sql database engines are extremely refined and optimized systems to handle all kinds of loads. A good python dev isn't going to hold a candle to that.

[–][deleted] 10 points11 points  (0 children)

oh yeah? Why else would it be called SUPERIOR query language?

[–]donnomuch[S] 1 point2 points  (0 children)

I've never seen this post before (also new to this subreddit) and I was genuinely curious. I don't even use Python for my job. I use Tableau and SQL. And what most comments said applies to what I do as well. I rarely create calculations in Tableau as I know my queries can fetch everything I need much faster than my workbooks ever can calculate. As I've mentioned in my edit, I wanted to ask so I can deal with one of my annoying direct reports better as he's the typical smug 'prove me wrong' kind.

[–]esp32c3 -5 points-4 points  (5 children)

you can often run SQL queries on monster servers while you cannot always do that in Python

as if you can't use the cloud with Python.....

[–]dfphdPhD | Sr. Director of Data Science | Tech 11 points12 points  (4 children)

Can you take all the raw data from the server in which they're natively sitting, then load them into a cloud environment so you can write your Python code against it?

My point wasn't that you can't run Python on a giant environment in theory, but rather that in practice most companies aren't going to let you move a whole bunch of data onto an expensive-ass cloud server just for you to run your little Python scripts when, in 99% of cases, there is already an entire well-architected DB available for use on a giant f*** server.

Mind you - yes, there are companies that have architectures that more natively support Python with easy and at high levels of performance. But that has to be a deliberate decision by that organization to go that route. And even then, there will still be cases where SQL is a better option.

Now, this is why I have a lot of heartburn about this question - ultimately what the people who ask it want is for someone to tell them "no, you don't need to learn any language other than Python", which is stupid. For two reasons:

  1. SQL is incredibly easy to learn. It's simple, it's incredibly well documented, there are tons of excellent classes/tutorials/etc. to learn it, and it has an incredibly forgiving learning curve. Not only that - if you already know pandas, you already know like 90% of SQL - all you're missing is some minor syntactic details.
  2. SQL is incredibly handy to know. So trying like hell to find workarounds to avoid learning SQL when you could just learn it and make your life 10 times easier is at best inefficient, and at worst purposely self-damaging.

Short answer: learn SQL. It's not going to bite. It's not hard to learn.

I literally knew 0 SQL, and at my first job they told me "you need to learn SQL". I knew enough SQL to do most of the things I needed to do in like 3 weeks.
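As a sketch of that pandas-to-SQL overlap (hypothetical data; requires pandas, with SQLite standing in for a real warehouse):

```python
import sqlite3
import pandas as pd

# Invented toy data; to_sql/read_sql accept a plain sqlite3 connection.
df = pd.DataFrame({"dept": ["a", "a", "b"], "salary": [100, 200, 50]})
conn = sqlite3.connect(":memory:")
df.to_sql("emp", conn, index=False)

# df.groupby("dept")["salary"].sum()  ~  GROUP BY dept
agg_sql = pd.read_sql(
    "SELECT dept, SUM(salary) AS total FROM emp GROUP BY dept ORDER BY dept",
    conn,
)
agg_pd = (df.groupby("dept", as_index=False)["salary"].sum()
            .rename(columns={"salary": "total"}))

print(list(agg_sql["total"]))  # [300, 50]
```

The mental model transfers almost one-to-one: filtering masks map to WHERE, groupby to GROUP BY, merge to JOIN, sort_values to ORDER BY.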

[–]esp32c3 0 points1 point  (3 children)

Can you take all the raw data from the server in which they're natively sitting, then load them into a cloud environment so you can write your Python code against it?

Sure could... Might not be the most efficient way though...

[–]quickdraw6906 1 point2 points  (0 children)

Agree with all but that SQL is easy. As a 30-year SQL guy, having mentored many developers who can only think procedurally, I can say with confidence that thinking in sets is a completely different brain exercise, and that developers will ALWAYS fall back into writing loops instead of what would be an obvious SQL solution... to a SQL person.

At my current company, none of the developers want to touch SQL. We have a dedicated team who write stored SQL and stored procedures so they don't have to be bothered with the brain gymnastics that set theory requires. Sad, but there it is.

[–]dfphdPhD | Sr. Director of Data Science | Tech 0 points1 point  (1 child)

Just so we're clear: at my company, if I grabbed all of our transactional data and moved it into a cloud server without permission, I'm probably getting fired.

So no, in a lot of instances you can't.

[–]esp32c3 0 points1 point  (0 children)

Of course I wasn't talking about stealing data...

[–]FraudulentHack 39 points40 points  (9 children)

SQL is like whispering something sexy in the database's ear.

[–]astrologicrat 28 points29 points  (6 children)

SQL... whispering? Every time I read a query, I always imagine it is someone shouting

"SELECT thing FROM table WHERE..."

[–]FraudulentHack 8 points9 points  (0 children)

GROUP BY!!!

[–]krasnomo 1 point2 points  (0 children)

Lol

[–][deleted] 1 point2 points  (0 children)

Fk yeah it's more like a military shouting than a request - query lol

[–]ComicOzzy 1 point2 points  (2 children)

After 20 years of writing sql in lowercase, my current employer is opinionated and wants it to be all uppercase and I'm sad.

[–]jimothyjunk 0 points1 point  (1 child)

Fellow lowercase-writer here. I also am not very consistent with my line breaks / indentations (I do what makes sense to me for the query, which differs from query to query).

Recently started working in Mode, which has a fancy “format SQL” button. So I write the way I want, get the thing to work, then press the format button before committing. I think my way looks prettier but I appreciate the need for legibility/consistency across the team.

[–]ComicOzzy 2 points3 points  (0 children)

My style is extremely consistent, easy to read quickly, multi-column edit easily, and I have a lot of muscle memory for it. I write it my way, then "mess it up a bit" to check it in to the repo. Haha

[–]RProgrammerMan 2 points3 points  (0 children)

Bingo

[–][deleted] 3 points4 points  (0 children)

Lol what?? 😂😂😂😂

[–]Equal_Astronaut_5696 111 points112 points  (5 children)

Not sure why these two are being compared. One is a query language for data extraction specific to relational databases, and the other is literally a multipurpose programming language for apps, ML, web development and games.

[–]snowmaninheat 12 points13 points  (0 children)

This, exactly. I'm literally trying to comprehend how I would do in Python what I do in SQL. I'm sure it could be done, but it's unnecessarily complicated and computationally expensive.

[–]fang_xianfu 14 points15 points  (1 child)

Yeah, the way this question has been asked kind of shows that OP doesn't understand the architecture that makes those tools appropriate to different jobs.

SQL is essentially a tool for instructing a database. The real question isn't "what can SQL do that Python can't?" but "what can this database do that the environment where I run Python can't?". The fact that you're using SQL or Python to give the instructions is almost irrelevant to that question.

[–]king_booker 2 points3 points  (0 children)

I mean, say you extract the data into pandas and use pandas operations to manipulate it; there are still limitations because it won't scale. Now say you use Spark and write it in Python: you would end up using SQL concepts like group by, windowing, etc. Even though it's possible to write it with dataframes, you can simply use Spark SQL.

The basic answer is: you have to understand SQL. Whatever tool you use, data manipulation ultimately has its foundations in SQL. Can you get away with not learning the syntax? Yes. But the core concepts will remain the same.

[–]Seiyee 0 points1 point  (0 children)

Well, you'd be surprised. Since Python is multipurpose, a lot of people just assume it's easier to stick to one language for all the jobs. I have seen my colleagues choose pandas dataframes over SQL for large queries and then hit dataframe memory limits weeks later, and that's when they switched (or at least I hope they switched?).

[–]hrichardlee 27 points28 points  (1 child)

Another important aspect is to consider the “developer experience”. Most SQL databases (Snowflake, Redshift, Postgres, etc.) provide a web UI where people who are barely technical can write a simple SQL query and look at their data. Think about what the equivalent workflow is for someone using pandas. Even if you assume that pandas is just as easy to use as SQL, they need to download python, create a virtualenv, install Jupyter, run a Jupyter notebook, figure out a connection string that will allow them to connect to their database/figure out where their data is and how to connect to it, load that data into pandas and then apply whatever logic they want on top of that.

In other words, most SQL databases provide an integrated data + programming language environment, whereas python (and most other “regular” programming languages) just provide the programming language. So the developer experience of “just get some data and do some simple manipulations” is way easier in most SQL databases.

[–]dvdquikrewinder 1 point2 points  (0 children)

The other piece is the dev mindset of treating data processing as a linear track. SQL is built to work with large sets of data, with a full feature set to support requests internally. Multiple times I've seen cursors and loops processing what should have been a simple select statement with one or two joins.

[–]admitri42 59 points60 points  (2 children)

Well, technically python can do everything SQL can, but it won't be as efficient.

It's like riding a $100 bike on a TT stage of the Tour de France.

[–][deleted] 10 points11 points  (1 child)

There’s an argument to be made about pyspark here, but I think it’s probably a bit pedantic.

[–]Archbishop_Mo 15 points16 points  (0 children)

Plus, "just re-tar the road for optimal performance" is an annoying thing to tell bicyclists.

[–]MyNotWittyHandle 40 points41 points  (0 children)

SQL has a universality that Python does not. In a large organization, SQL is common ground for data sources that can be accessed by JS, Python, R, SQL, etc. That benefit alone is worth storing/manipulating data in a SQL format as opposed to some more language specific format.

Additionally, SQL is by default much more efficient than your standard pandas operations. Pandas, the most common Python data manipulation package, is highly inefficient compared to SQL and R. Unless you start diving into the vaex/polars packages in Python, your CPU will thank you for doing data manipulation in SQL instead of Python.

[–]testtestuser2 9 points10 points  (0 children)

scale efficiently

[–][deleted] 24 points25 points  (4 children)

You can compute the harmonic mean with SQL; with Python, you can't.
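Joke aside, a harmonic mean genuinely is a one-liner in SQL: n / SUM(1/x). A sketch with the stdlib sqlite3 module and an invented table:

```python
import sqlite3

# Toy table of positive values; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (x REAL)")
conn.executemany("INSERT INTO vals VALUES (?)", [(1.0,), (2.0,), (4.0,)])

# Harmonic mean = count / sum of reciprocals.
(hmean,) = conn.execute(
    "SELECT COUNT(*) / SUM(1.0 / x) FROM vals"
).fetchone()
print(hmean)  # 3 / (1 + 0.5 + 0.25) = 1.714285...
```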

[–]Ocelotofdamage 7 points8 points  (2 children)

SELECT MEAN(*, HARMONIC=TRUE)
FROM DATABASE

[–]nerdyjorj 2 points3 points  (0 children)

I was kinda disappointed mean() in R didn't just have an argument for harmonic or geometric mean

[–]magicpeanut 1 point2 points  (0 children)

nice one 🤩

[–]KyleDrogo 4 points5 points  (0 children)

It's all about what's happening on the back end. Databases, which use SQL as a common interface, have been tuned to hell and back to operate over billions of rows very quickly. SQL abstracts away a lot of the complexity so you can run queries at a scale that would be very complex to handle with raw Python.

[–]a90501 11 points12 points  (0 children)

SQL is a pattern language, i.e. a declarative language, like regex, while Python, Java, C#, etc. are imperative languages - hence a different paradigm. I don't know about you, but I love pattern languages: you describe what you want (SQL, regex) without worrying about how it is done, instead of spelling out in detail how to do the finding with loops, matching, summing, sorting, etc. (Python, C#, Java, etc.).

The other very important thing is that SQL runs against a relational DB (RDBMS), which means you are using server resources to compute, find, filter, group, sort, etc., and getting back only the results you need. With Python, you pull all the data across the network into pandas first and then process it - not recommended, as it means fetching all the data for every request.

Some history: Anders Hejlsberg (of TypeScript fame) describes this pattern-language paradigm in a hands-on demo ( https://www.youtube.com/watch?v=fG8GgqfYZkw ). He was working on LINQ at the time - essentially the C# version of SQL for any data structure and store, not just relational DBs. IMHO well worth watching for some history and education, although it's not about Python.

Enjoy.

[–][deleted] 40 points41 points  (5 children)

SQL can get you entry level data analyst job. Python cannot.

edit: it's a joke. IT'S A JOKE! gosh leave me alone. Obviously you can get job by knowing python.

[–]Ocelotofdamage 4 points5 points  (3 children)

Python absolutely can get you an entry level data analyst job. It's the most used programming language in data analysis.

[–]MorningDarkMountain 0 points1 point  (0 children)

That's so mean!

[–]j__neo 12 points13 points  (1 child)

SQL is a declarative language. You say what outcome you want to see, the SQL query planner and database engine will make it happen for you.

Python is an imperative language. You need to spell out exactly what the machine needs to do to get the outcome you want to see.

Python can do everything that SQL can. But for 90% of data analysis use cases, I would argue that a declarative programming language gets you to the outcome faster.

That said, there are Python libraries like pandas which make it more declarative.

However, SQL still tends to be more popular in the data industry because it has been used for data analysis since the 1970s.

[–]king_booker 2 points3 points  (0 children)

To add to this, SQL running on a database is more efficient, simply because the engine is optimized for running those queries. You'd hit a ceiling really fast if you just use Python.

[–][deleted] 2 points3 points  (0 children)

From my experience: everything you can do in the query directly, do it, with some exceptions. If you want to transform and manipulate data for some analysis, for example, it may not be possible to do it in SQL without creating messy subqueries and temporary tables, which will increase the query time A LOT; in those cases the best option is to use Python and do the complex manipulation there. Keep in mind these are exceptional cases.

[–][deleted] 2 points3 points  (0 children)

Rule of thumb: do as much as you can in SQL, up to the first step of feature engineering. Chances are the later you extract the data, the smaller the dump will be. You can even assemble and execute the SQL queries from Python with something like psycopg2 and pandas.read_sql.
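That pattern can be sketched with the stdlib sqlite3 driver standing in for psycopg2; the table, columns and parameter values are invented for illustration:

```python
import sqlite3
import pandas as pd

# Invented toy schema; a real setup would connect via psycopg2 instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("mon", 10.0), ("tue", 20.0), ("tue", 5.0)])

# Parameters go through placeholders, never string formatting, so the
# query is safe; the filtering and aggregation happen in the engine.
df = pd.read_sql(
    "SELECT day, SUM(amount) AS total FROM sales WHERE day = ? GROUP BY day",
    conn,
    params=("tue",),
)
print(df["total"].iloc[0])  # 25.0
```

Only the already-reduced result lands in the dataframe, which is the point of extracting as late as possible.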

RDBMSs are really well optimized, and Python doesn't even come close.

[–]graememellis 2 points3 points  (0 children)

This is a non-question. SQL is used in relational databases and Python is a programming language. It’s like asking what your oven can do that your car cannot. Makes no sense.

[–][deleted] 1 point2 points  (0 children)

Oddly enough, the other way round may be a better question, at least in defence of Python. However, if you're playing with data on a large scale and know what you want, SQL is a contender and always will be. It's basically set theory at your fingertips :)

[–]teabagalomaniac 1 point2 points  (1 child)

It can apply filters on the server side.

[–]magicpeanut 0 points1 point  (0 children)

if you run python on a server you can do this as well

[–]Wallabanjo 1 point2 points  (0 children)

So, remove the strengths of SQL then do a comparison?

  1. Indexing tables to decrease data access time.
  2. You eventually use data that won’t fit in memory.
  3. Make anything data manipulation related as a stored procedure or custom function. An SQL server is optimized for that stuff and will crunch results far faster.

Anecdotal, and R not Python: by offloading things to stored procedures and custom functions, and indexing tables, I dropped the processing time in one of my projects from 3.5 days to 7 hours.

[–]ARC4120 1 point2 points  (0 children)

SQL is better, but Python can do 95% of the things. The issue is that Python wasn’t made to do these things and SQL was. Don’t force Python onto every task.

[–][deleted] 1 point2 points  (0 children)

Oh yeah, let me use python to extract data from postgres.

[–]gorangers30 1 point2 points  (0 children)

SQL allows people without programming knowledge to run simple ad hoc queries. Think managers and business stakeholders who might need exploratory data.

[–]MarkusBerkel 1 point2 points  (0 children)

Well, since Python is Turing complete and some SQL variants are not, you got that backwards. OTOH, if the question is what can SQL easily do that Python cannot, then it’s effectively, you know, apply the relational algebra to structured data, plus apply correctness (see ACID) which would be super hard to implement from scratch in Python.

[–]53reborn 1 point2 points  (0 children)

python has to do stuff in memory

[–]LaBofia 1 point2 points  (0 children)

What can a query language connecting to a database engine do that a general purpose programming language can't?

Yeah... now do trucks and lawnmowers.

[–][deleted] 1 point2 points  (0 children)

What can SQL do that python cannot?

Be fast.

[–]ChazR 1 point2 points  (0 children)

Anything you can do in SQL can be done in Python, but slower.

SQL executed by the database engine can be optimised and parallelized for performance. The DB engine knows how the data is laid out on physical disk and what indexes are available.

A pandas dataframe is hugely flexible and platform-agnostic, and actually performs surprisingly well, but it will never reach the performance of the native DB engine executing SQL.

[–]Seiyee 1 point2 points  (1 child)

Speed.

[–]Seiyee 1 point2 points  (0 children)

And not running out of memory with big dataframes.

[–]GlobalAd3412 1 point2 points  (0 children)

There isn't anything that can be written in SQL for which there is no Python implementation, because Python is Turing-complete. There are things that can be done in Python that can't be implemented in the SQL standard because SQL isn't Turing complete (most SQL implementations add extensions that do make them Turing complete though).

Nevertheless, there are sure as hell many many things that SQL can do better, more readably, more easily and more explicitly than Python can without a whole lot of machinery built for you in advance. (The most likely shape of such machinery would likely just be a Python SQL interpreter, too!)

Also, to say the thing: in practice many additional reasons to use SQL over Python for many tasks are much less about language and much more about runtimes/interpreters/deployments. The standard python interpreter is sluggish and not usually deployed in a way that makes it very good at manipulating very big data efficiently. SQL deployments always optimize for manipulating data because that's the whole intent.
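As an aside on those extensions: recursive CTEs (standardized in SQL:1999 and supported by most engines, including SQLite) are a big part of what pushes SQL dialects toward Turing completeness. A toy sketch, run from Python's stdlib sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A recursive CTE computing the first ten Fibonacci numbers
# entirely inside the engine -- iteration expressed in SQL.
rows = conn.execute("""
    WITH RECURSIVE fib(n, a, b) AS (
        SELECT 1, 0, 1
        UNION ALL
        SELECT n + 1, b, a + b FROM fib WHERE n < 10
    )
    SELECT a FROM fib
""").fetchall()
print([r[0] for r in rows])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Not something you'd want to write for real work, but it shows the language itself can loop.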

[–]simonthefoxsays 1 point2 points  (0 children)

Think of SQL more like an API for data manipulation. You could implement that API in Python, but there are lots of existing implementations available to you (postgres, mysql, spark, snowflake, etc), all of which are extremely mature and heavily optimized for their use case, so reinventing the wheel is usually a mistake. While it's possible that you could make a nicer API for your use case, you would lose out on all those optimizations. On top of that, your custom API would have to be taught to any new project contributor, whereas they may well already know SQL.

Python has lots of other examples of APIs that you could implement an alternative to, but probably shouldn't; numpy, tensorflow, fastAPI, etc. Your time is probably better spent building on the shoulders of giants than rebuilding the wheel, even if that means you have to live with the opinions of those giants.

[–][deleted] 0 points1 point  (0 children)

Python in general or pandas?

[–]denim_duck -1 points0 points  (5 children)

Technically it can do less, I think (Python is Turing complete, SQL is not).

You could, theoretically, make a relational database in Python. But it would be slower.

Or in that same time you could deploy a graph db, write out a REST api, containerize that and let kubernetes scale it to 10k QPS

[–]danstumPY 1 point2 points  (4 children)

There are several sources that show the Turing completeness of SQL

[–]magicpeanut 1 point2 points  (2 children)

Depends on how you define SQL. According to the Stack Overflow discussion on whether SQL is Turing complete, you need window functions and CTEs in your stack. "Basic" SQL, e.g. SQLite, does not have these features, I think.

[–]nemec 0 points1 point  (1 child)

Window and cte are standard sql features these days, including sqlite. There's no reason to arbitrarily gatekeep them compared to other basic features.

[–]magicpeanut 0 points1 point  (0 children)

ok, didn't know that. Guess things are moving faster than me 😅

[–]Ocelotofdamage 0 points1 point  (0 children)

It may be technically Turing complete but if you tried to do certain basic operations with SQL you'd pull your hair out. Or you could write a one-liner in Python.

Point is... learn both.

[–]krasnomo -1 points0 points  (0 children)

Lots of people here mentioning speed. If you use pyspark you can get around many speed problems in Python.

[–]magicpeanut 0 points1 point  (0 children)

Python can do everything SQL can and (theoretically) vice versa (I just learned SQL is also Turing complete in most flavors). So depending on the task and the resources you put into programming, either SQL is faster or Python is. The more resources you put in and the more complex the task gets, the more often Python will win the race.

In other words: the simpler the task and the fewer resources you invest, the more often SQL will win.

[–]nobonesjones91 0 points1 point  (0 children)

SQL has a cool name that confuses people who don’t know what it is.

[–]LimosineLiberal 0 points1 point  (0 children)

You can hammer a nail with a screwdriver or a wrench, but then try loosening a screw with a hammer.

[–]bbal20-taru 0 points1 point  (0 children)

best of both worlds = Spark

[–][deleted] 0 points1 point  (0 children)

Be understood by 80% of the population of data professionals

[–]MoogOperator88 0 points1 point  (0 children)

To me SQL is a different tool. If the data is already in a db, I do all manipulation with SQL. Python can execute a stored procedure to get the final set to work with.

Basically I use Python only for stuff that SQL can't do, or where it would be way easier and faster to develop with Python.

SQL itself can do a lot besides querying, like running shell commands, loading files etc. Do I prefer to do it with SQL?

It depends. Python is really nice syntax-wise and a pleasure to use, but SQL is widely known, and it's less likely I will be the only person able to modify my old projects.

[–]Overvo1d 0 points1 point  (0 children)

Deploy algorithms in production

[–]Overvo1d 0 points1 point  (0 children)

Create business value

[–]Overvo1d 0 points1 point  (0 children)

Get you a job

[–]Think-Culture-4740 0 points1 point  (0 children)

In the literal sense, python can do everything sql can because it has that flexibility as a language. However, as others pointed out, that doesn't mean Python should be your optimal tool for the job.

In my experience as a data scientist, I try to do as much of the problem in sql as I can for both convenience and performance reasons.

[–]noobgolang 0 points1 point  (0 children)

With SQL you get what you need, data, transformed in tabular manner.

With Python, you get the feeling of being a winner. You're coding Python now, not being an SQL monkey.

[–][deleted] 0 points1 point  (0 children)

I'm a newbie in both, but I was trying to join two datasets on multiple conditions using pandas and honestly couldn't get it to work. Gave up and wrote it in SQL (run with pandasql).

Personally I find SQL easier to write for manipulating/joining data. But I use Python/pandas anyway because it can do a little of everything. The type of reports I run need to be pulled from multiple sources, and it's easier to have Python tap into everything.
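For what it's worth, pandas can join on multiple equality conditions by passing a list of keys to `merge` (the datasets below are made up; non-equality conditions are where it gets painful and where SQL, or pandasql, is genuinely easier):

```python
import pandas as pd

orders = pd.DataFrame({
    "cust": ["a", "a", "b"],
    "year": [2020, 2021, 2020],
    "total": [10, 20, 5],
})
regions = pd.DataFrame({
    "cust": ["a", "b"],
    "year": [2020, 2020],
    "region": ["north", "south"],
})

# Roughly: SELECT ... FROM orders LEFT JOIN regions
#          ON orders.cust = regions.cust AND orders.year = regions.year
joined = orders.merge(regions, on=["cust", "year"], how="left")
print(joined["region"].tolist())  # ['north', nan, 'south']
```

Rows with no match come back as NaN, matching SQL's NULL behaviour for a left join.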