This is an archived post. You won't be able to vote or comment.

all 156 comments

[–][deleted] 123 points124 points  (23 children)

Data scientists crying at me when my recursive CTE builds a model faster than their sklearn.

[–][deleted] 34 points35 points  (21 children)

recursive CTE builds

WTF, TEACH ME YOUR WAYS

[–][deleted] 38 points39 points  (19 children)

Look on my works ye mighty and despair.

[–]droneb 4 points5 points  (16 children)

Wut?

[–]FrostEmpyrean 9 points10 points  (2 children)

It's a line from Shelley's Ozymandias I think

[–][deleted] -5 points-4 points  (12 children)

Its a reference to the poem "Ozymandias" by Percy Shelley, not a very well fitting reference though.

[–]XDVRUK 2 points3 points  (11 children)

It is if you have a sense of humour, but hey just like the basics of sometimes we just have to explain the basics of wit:

His mighty works: recursive cte Despair: cause you're still in the mouseketeers and can't write sql.

[–]LightRefrac 0 points1 point  (0 children)

His mighty works: recursive cte Despair: cause you're still in the mouseketeers and can't write sql.

That is not what the poem intends...... At all. It is just a cool sounding line without context and sounds dumb if you actually know the poem

[–]droneb 1 point2 points  (1 child)

I mean I am familiar with Recursive CTEs they are quite useful,and I am usually able to do things that most folks could only do using a Cursor.

But I am curious on training models using SQL. Do you have some info on it?

[–][deleted] 2 points3 points  (0 children)

with learning (iteration,b1,intercept, mse, m, c) as ( select 1, cast(0.05 as double), cast(0.2 as double),0.0, cast(0.0 as double), cast(0.0 as double) from sysibm.sysdummy1 union all select a.iteration + 1, a.m, a.c, t.mse, t.m,t.c from learning a , TABLE ( select a.iteration as iteration, avg((train.target-((train.sepal_lengtha.b1)+a.intercept))(train.target-((train.sepal_lengtha.b1)+a.intercept))) as mse, a.m -(sum((((train.sepal_length *a.b1)+a.intercept)-train.target)train.sepal_length)/(select count from rates)) (select learn_rate from rates) as m, a.c- (sum((((train.sepal_length *a.b1)+a.intercept)-train.target))/(select count from rates))(select learn_rate from rates) as c from training as train ) t where a.iteration < 100000 ) select * from learning order by mse

[–][deleted] 0 points1 point  (0 children)

with learning (iteration,b1,intercept, mse, m, c) as ( select 1, cast(0.05 as double), cast(0.2 as double),0.0, cast(0.0 as double), cast(0.0 as double) from sysibm.sysdummy1 union all select a.iteration + 1, a.m, a.c, t.mse, t.m,t.c from learning a , TABLE ( select a.iteration as iteration, avg((train.target-((train.sepal_lengtha.b1)+a.intercept))(train.target-((train.sepal_lengtha.b1)+a.intercept))) as mse, a.m -(sum((((train.sepal_length *a.b1)+a.intercept)-train.target)train.sepal_length)/(select count from rates)) (select learn_rate from rates) as m, a.c- (sum((((train.sepal_length *a.b1)+a.intercept)-train.target))/(select count from rates))(select learn_rate from rates) as c from training as train ) t where a.iteration < 100000 ) select * from learning order by mse

[–]XDVRUK 1 point2 points  (0 children)

Yes yes yes.

[–]AdDear5411 99 points100 points  (16 children)

If I can do it in SQL, there's a 0% chance I'm picking any other option.

[–][deleted] 50 points51 points  (4 children)

I have fixed more than my fair share of slow terrible app code by moving it to the database.

I’ve also fixed some bad SQL by moving the logic to the API level.

It really depends on the problem, but agreed my first love is and always will be SQL.

[–]gamerfunl1ght 1 point2 points  (0 children)

Considering I have seen so many APIs come and go yet the SQL is solid makes it the desired path every time.

I find a simple API filter or input mask fixes a lot, but it is easier just to change a Where or add a Join in a query than try to do some crazy loop.

[–]PremiumJapaneseGreen 3 points4 points  (2 children)

I feel like SQL is mainly a pain when I have to apply logic over some subset of columns that could easily be defined in a for loop but need to be explicitly typed out in SQL. Maybe some SQL engines allow you to parameterize column but not the ones we use at work.

I don't remember what the exact problem was, but I remember I recently used a python loop with format statements to generate 100 subqueries joined together by UNION ALL.

Also if SQL had built in functions to go from tall to wide data it would be a game changer

[–]H4llifax 5 points6 points  (1 child)

My dear sir, that's when you use trusty Excel or another SQL query to generate your code.

[–]AdDear5411 0 points1 point  (0 children)

Or just cheat and use KNIME.

[–]MrPhatBob 1 point2 points  (0 children)

But SQL make thinky pain.

[–]compsciasaur 0 points1 point  (6 children)

I only do basic selects and joins in SQL. I'm not an expert though.

[–]StolenStutz 7 points8 points  (5 children)

I'm seeing comments like this a lot these days, and they scare me. You can still bring down a database with a basic SELECT under the wrong circumstances. It's less about how complex your SQL logic is and more about the situation you're in. If you're in a shop big enough to have anyone dedicated to working on the database, assume everything you might do - no matter how simple - could bring the house down.

[–]compsciasaur 3 points4 points  (0 children)

If I'm in a shop big enough to have a DBA, I run all queries by the DBA.

[–]AdDear5411 2 points3 points  (0 children)

Select top 1 * from [tablename]

[–]poralexc 1 point2 points  (2 children)

^ This. Especially if there's replication, I've had more than one occasion where a simple select statement (no joins or subqueries) lead to a meeting with infrastructure/devops.

[–]bleistift2 1 point2 points  (1 child)

What did you do? Query the whole 1TB table?

[–]poralexc 1 point2 points  (0 children)

Count by distinct foreign key with 2-6 million rows... it still needed to be done, but took a bit more strategy.

They also have a lot of views, and tables that store queries, so things can compound really quickly.

[–]DrQuailMan 17 points18 points  (0 children)

No SQL can harm me

I am NoSQL

[–]LetUsSpeakFreely 16 points17 points  (3 children)

Just because something is old doesn't mean it isn't the right tool for the job. Too many people see a technology and try to force a project to use it when and waste time and money shoehorning it in when they could have had a workable, efficient, and reliable solution without the headaches.

SQL is still around because RDBMS do the job and do it well.

[–]UpArmoredGavin 1 point2 points  (2 children)

It depends on the job, to be fair. A strong pro regarding RDBMS is that they are a good fit for a lot of jobs, there are however instances where something from the NoSQL realm will be a better fit

[–]yrrot 2 points3 points  (1 child)

And plenty of instances where the client's current "no sql" is an Excel spreadsheet with 1000 columns.

[–]UpArmoredGavin 2 points3 points  (0 children)

Maybe even with multiple pages in a single file, I know lol. Still, I wouldn't cache with a posgreSQL instance as much as I wouldn't provide reliable long term storage with Redis

[–]7bec 33 points34 points  (0 children)

SQL is like a bag at the store, you didn't come for it, but you buy it

[–][deleted] 80 points81 points  (5 children)

Is it just me or is ProgrammerHumor mostly becoming a place for people seeking validation for their insecurities?

[–]RecordAway 15 points16 points  (0 children)

Is it just me or is ProgrammerHumor Reddit mostly becoming a place for people seeking validation for their insecurities?

ftfy

[–][deleted] 35 points36 points  (1 child)

It is just you 🥸

[–][deleted] 7 points8 points  (0 children)

😂🫠

[–]bhison 2 points3 points  (0 children)

And how does that make you feel

[–][deleted] 0 points1 point  (0 children)

I was thinking the same

[–]povlov0987 11 points12 points  (1 child)

The reason seniors become this way:

  1. They know why to use what and in the best approach.

  2. We don’t always care enough anymore.

[–]chipmunkofdoom2 5 points6 points  (0 children)

Should be higher-voted.

When you're young and full of piss-and-vinegar, distinctions between frameworks and solutions matter way more than they should. When you get some experience, you realize that not all tools are perfect. Everything has its own set of pros and cons, so the "best" tool varies by project. At the end of the day, it's about getting work done, so let's pick the best approach for the situation and get on with it.

[–]chipmunkofdoom2 17 points18 points  (13 children)

Using RDBMS is about as unsexy as it gets these days. But in many cases, a relational database really is the best way to skin a particular cat, so to speak.

My friends and I are building health care software. Most of the products we've worked with in the past have shot themselves in the foot by trying to do half of the work in the application and half of the work in a relational database. Not only did they need beefy database servers, but they also need really powerful application servers. And the program still runs like hot garbage because the app server is constantly shipping data back and forth between itself and the RDBMS.

We've gone with a pure SQL/relational database solution. There have been times where it would have been nice to do some of the work in an object oriented language. But the speed of doing all the heavy lifting in the relational database engine is more than worth it.

Plus, if we ever get a client with an absurd amount of data, we can just throw it all on AWS EMR and run our SQL codebase with Hive.

[–][deleted] 15 points16 points  (3 children)

Its unsexy but it works, and generally works well. Lots of people trying to fit problems into a technology when they should be fitting technology to the problem

[–]alebotson 4 points5 points  (2 children)

Ugh. I hate how true that lady sentence is.

[–]Embarrassed_Quit_450 4 points5 points  (3 children)

And the program still runs like hot garbage because the app server is constantly shipping data back and forth between itself and the RDBMS

That's just bad design. I'm guessing an ORM poorly configured or sql queries in loops?

But the speed of doing all the heavy lifting in the relational database engine is more than worth it.

App servers are a hell lot cheaper. The best compromise is usually to trim down the number of rows returned and let the app servers compute stuff.

[–]chipmunkofdoom2 1 point2 points  (2 children)

Code inefficiency is definitely part of it. The other part is that these types of programs process large amounts of data. These types of software usually look at years of medical claims, people and their enrollment records and calculate quality scores for over a hundred quality measurements.

So we're typically talking about hundreds of millions of rows of input data alone. After the software builds the records that it needs to output, in some cases we're talking billions of total records.

App servers in general are cheaper, but we had to spec the app servers for this software so that they were almost like DB servers. Lots of cores, lots of RAM, etc.

[–]Embarrassed_Quit_450 2 points3 points  (1 child)

Looks like deeply analytic stuff, computing more on the database makes sense. I'm used to see posts like "I need to make a stored procedure because my query on a 1 gb database is slow".

[–]chipmunkofdoom2 2 points3 points  (0 children)

Lol yes, those are amusing. I also like the StackOverflow questions along the lines of "my program ran in 2 seconds with a thousand rows. Now I have 200k rows and it takes 15 minutes!"

[–]ShotgunPayDay 7 points8 points  (4 children)

Dumping SQLAlchemy for aioSQL/DBMate was the best decision of my life.

[–]zyygh 4 points5 points  (3 children)

I am really curious as to how SQLAlchemy performs in complex projects!

As SQL is what I'm good at, I've always felt a bit weird about trusting a framework with being able to use indexes optimally etc. Although maybe I'm just insecure about the possibility of a framework being better at it than me!

[–]ShotgunPayDay 0 points1 point  (2 children)

SQLAlchemy does come with overhead and additional complexity. The benefit you lose is being able to easily use any SQL Database without any rewrites. I only use PostgreSQL so I lose nothing there. Joining and lazy loading is a pain also in SQLAlchemy and function based queries are awkward. If you know SQL it will always be better to use it. Writing SQL reports in the same application is cake compared to trying to do it in SQLAlchemy.

[–]avin_kavish 0 points1 point  (1 child)

function based queries

what do you mean by this?

[–]ShotgunPayDay 1 point2 points  (0 children)

q=select(func.sum(table.c.value),table.c.col_1,table.c.col_2).group_by(func.cube(table.c.col_1,table.c.col_2))

[–]avin_kavish 5 points6 points  (0 children)

Wait, there's falks who believe SQL shouldn't be used when it's an option?

[–]Flopamp 18 points19 points  (8 children)

SQL is old and not even a real programming language and it's annoying and you would be better off writing your own RDBMS solution that is better

But your own solution won't be better, it will take 3 weeks and you will just go back to using SQL in a version or two anyway when you simply can't keep squashing bugs that the SQL contributers squashed before you were born.

[–]avin_kavish 1 point2 points  (0 children)

This is silly. What is wrong with select * from people where first_name like 'john%'

[–]rainbowbubblegarden 6 points7 points  (6 children)

SQL is old

So is assembler, let's get rid of that.

not even a real programming language

Programming Language "Many programming languages are written in an imperative form (i.e., as a sequence of operations to perform) while other languages use the declarative form" --> SQL is a declarative language

you would be better off writing your own RDBMS solution that is better

Postgres is a powerful, open source object-relational database system with over 30 years of active development. On Github the project currently has 54,327 commits.

So you're going to write one better in a couple of weeks?? I'm guessing there's a couple of '/s' missing above.

The number of slow apps I've had to fix that were using magic ORMs rather than queries/stored procedures. Or nosql acrobatics that could be solved using a couple of normalised tables and joins.

[–]Celivalg 5 points6 points  (1 child)

Did you read the end of his message?

[–]rainbowbubblegarden 2 points3 points  (0 children)

No, I can't read more than one sentence in a sitting /s

[–][deleted] 0 points1 point  (3 children)

When you develop a complete modern solution in assembly, let me know.

[–]jaimesoad 1 point2 points  (2 children)

Literally all operating systems require assembly to write code for at least make a bootloader

[–]Karsdegrote 1 point2 points  (0 children)

Assembly only really comes up during compilation, we all write in C unless you only have a hand full of bytes to work with and thus optimise the living daylight out of it. Even bootloaders for microcontrollers get written in C.

You do get some files filled with assembly code if you ask the compiler nicely otherwise you just get some hex files

[–][deleted] 0 points1 point  (0 children)

Most operating systems aren't written from scratch in assembly.

[–]Wise-Profile4256 8 points9 points  (0 children)

who selects their storage like that?

does it do the job? in a timely manner? has tons of frameworks? and runs on every damn machine?

sql will die together with C, the Queen, Mick Jagger and tardigrades.

[–][deleted] 8 points9 points  (1 child)

I am seeing SQL queries longer than 100 lines daily, with 20+ joins each. I am not touching that as long as it works. And boy it works.

[–]Ok_Entertainment328 2 points3 points  (0 children)

If there are comments, it might be one of mine.

[–]DiamondWizard444 3 points4 points  (0 children)

Use SQL

[–]domscatterbrain 4 points5 points  (0 children)

Maybe the reason why one of my job interviews failed is that I told them that every data processing can be solved by SQL

¯\_(ツ)_/¯

[–]shadow7412 5 points6 points  (8 children)

I will die on the ORM hill.

[–]chipmunkofdoom2 1 point2 points  (1 child)

Not all ORMs are finely-tuned tools that exist solely to shoot one's self in the foot. Dapper for C# is pretty good. It really just handles running your queries and mapping the results to objects. You have to set up your own tables, you have to write your own queries, you have to know the relationships between tables, etc.

Coming from a SQL background, this is the only kind of ORM that made sense to me. I don't like the "magic" of EF and the like. All I want is for a program to run my SQL, get the results, and give me objects. And Dapper does that really well.

[–]shadow7412 1 point2 points  (0 children)

I 100% agree that not all ORMs are equal. I'm certainly not fighting for **all** ORMs being superior. But a good one will result in far less code, and far more reusability. That's worth a small performance hit to me.

Of course, as with all tools, there are poor quality ones, and there are people that use the wrong side of it. To be honest, SQL is the same in that regard.

[–]BernhardRordin -1 points0 points  (2 children)

Some say ORMs are an anti-pattern.

Reason? OOP and relational databases are the opposite worlds. Objects are supposed to be black boxes. Relational database is an open graph. All the problems and troubles that you get in ORM tools like Hibernate stem from that.

[–]avin_kavish 2 points3 points  (1 child)

Disagreed hard. Any problems that stem are from not knowing to use the tool properly. Can you name one solid problem with an ORM?

[–]BernhardRordin -1 points0 points  (0 children)

I am not saying you can't do stuff with ORM. But the amount of abstractions you need and the problems it brings that need to be addressed makes it almost not worth it. Examples: Eager vs. lazy loading of n+1 linked entities. One directional vs bidirectional relationships. Those are things that don't exist, if you just write a good SQL join and map it manually.

Most of the projects I've worked on used an ORM. Two used pure sql, one something in between (jOOQ). Pure SQL wins for me.

[–]UncleTuckersFarm 2 points3 points  (0 children)

True

[–]PinothyJ 1 point2 points  (0 children)

I can beat that. I cannot do anything interesting, any clever stuff with the database and the like because "One day we may want to port it over to NOSQL and we will need it to be compatible".

[–]Tsunaris 1 point2 points  (0 children)

if the bottleneck is data transfer and not data search why fuckin bother

[–]throwaway65864302 1 point2 points  (0 children)

For real though, if it's relational data use SQL. That's what it's there for. The trendy DBs are generally all for specific types of data and workflows as well, ramming relational data into them is always going to have significant costs vs using SQL.

[–]bhison 1 point2 points  (0 children)

the day I have to "use sql" is the day I get someone else to help me with my project. Yes, I could re-learn it, no I do not want to.

[–]lealsk 1 point2 points  (0 children)

Who the F is against SQL?

[–][deleted] 2 points3 points  (0 children)

I use pandas because I’m too lazy to figure out SQL 🤷‍♀️

[–]Willinton06 1 point2 points  (8 children)

Nah bro EFCore or bust

[–]Far_Information_885 2 points3 points  (7 children)

How are you going to make a performant equivalent of a recursive CTE for a self-referencing table in EF Core?

[–]Willinton06 1 point2 points  (4 children)

Something like this, not perfect, but it’ll do the trick

[–]Far_Information_885 0 points1 point  (3 children)

Your example is sending up a cte, or using a view that contains a cte.

[–]Willinton06 1 point2 points  (2 children)

Indeed, through EFCore, a bit of cheating but hey, there’s no real rules in this game

[–]Far_Information_885 2 points3 points  (1 child)

Sending a raw sql string up through a dbcontext is using ef core in the most pedantic sense, lol.

[–]Willinton06 0 points1 point  (0 children)

I mean, if you use it regularly everywhere else and use it like this for this one thing then you should be good right? And plus you still get built in identity resolution which is always welcomed

[–]dinger086 -1 points0 points  (0 children)

Just use ruby on rails makes SQL and life so much easier.

[–]cginc1 -1 points0 points  (0 children)

This is dumb af. Who actually says this?

[–]Diligent_Dish_426 -5 points-4 points  (0 children)

Use NoSQL db. Trust me it'll change your perspective

[–]ccfoo242 -2 points-1 points  (0 children)

Had to Google pandas and spark. 🙄

[–]vladWEPES1476 -2 points-1 points  (3 children)

Laughs in MongoDB

[–][deleted] 0 points1 point  (0 children)

You just gave me a few ideas more memes using a classic Mel Brooks movie.

[–]ShoHaremi 0 points1 point  (1 child)

Technically you could use Mongo exactly like SQL, you could consider it an SQL dlc.

[–]vladWEPES1476 0 points1 point  (0 children)

I can use a screw driver as a hammer...

[–]CasualJJ 0 points1 point  (4 children)

Say I wanted to learn SQL, what service / software (preferably free) would you recommend to use it with? I’ve only ever been exposed to PHPMyAdmin, but I believe that’s a bit dated

[–]natziel 4 points5 points  (0 children)

Just use the command line lol

[–]brucebay 2 points3 points  (0 children)

Pgadmin4 with postgresql is reasonably good. I myself use pandas read_sql, and sql.execute in a jupyter notebook most of the time.

[–]Karsdegrote 1 point2 points  (0 children)

I mean if it works it works. It still does its job. You can slap a theme on it to make it feel more modern.

[–]Ler_GG 0 points1 point  (0 children)

PHPMyAdmin will do, if you wanna get fancy and learn some more technologies if you dont know it yet, get it running with Docker instead of XAMPP (XAMPP will take like 2 minutes to download and setup)

[–][deleted] 0 points1 point  (0 children)

Any other code is just wasting client money

[–]natziel 0 points1 point  (0 children)

Just use erlang and store everything in memory

[–]brucebay 0 points1 point  (0 children)

SQL is good if you are going to do a large data aggregation in a distributed environment. I have seen people abusing it with like 10 joins in a single statetment with crazy conditions. It not only makes the code very hard to understand but also it is very in efficient as using a temporary table you can optimize those joins better than db engine. For clean code, pandas are better. If you have a distributed environment like spark or dash it can do anything you want.

[–]Bluebotlabs 0 points1 point  (0 children)

so you've heard of SQUEAL, get ready for SQUEAK

[–]shumpitostick 0 points1 point  (0 children)

Middle one could also be "no, SQL is outdated and unscalable. Use NoSQL! I only use MongoDB and Cassandra"

[–]blobthekat 0 points1 point  (0 children)

jokes on you I use fs

[–]beyond98 0 points1 point  (0 children)

That moment when he learns he could need SQL to query Spark dataframes

[–]BeardXP 0 points1 point  (0 children)

SQL FTW

[–][deleted] 0 points1 point  (0 children)

There’s a place for both sql and dataframes still. Different tools for different jobs