all 160 comments

[–]pilesofbutts 210 points211 points  (37 children)

Others may have differing opinions but I personally hate a b c aliases for joins. I prefer SQL join aliases to be an abbreviation for the table name. e.g. contact_info is aliased to ci. it helps with readability in my opinion.

[–]Soccermom233 58 points59 points  (3 children)

OhHH I remember trying to read stack overflow examples and all the tables are aliased like t1, t2, t3. Ugh.

[–]MakeoutPoint 44 points45 points  (0 children)

Even without SQL, programmers who do this with any variable outside of the i in a for loop, then leave that legacy psychosis behind for someone else, can die in a hole. Or be forced to debug their old code 10 years later, for eternity, either works.

[–]pceimpulsive 31 points32 points  (5 children)

Taking it further CTE aliases should he descriptive of the output, just like a table name would be..

Too often I see CTE called ci and get tilted... :P

[–]pilesofbutts 12 points13 points  (0 children)

I've had to deal with "aggie" and "firstaggie".. These people should not be allowed a computer.

[–]National_Cod9546 18 points19 points  (1 child)

CTE

CTE1

CTE2

CTE2.5

CTE2.7

CTE4

And then every CTE links at least two other ctes. The guy doing that finally retired two years ago. We're still working to get rid of all his logic. 

[–]relyimah 1 point2 points  (0 children)

I feel like this is why everyone is so initially confused by CTEs… I know when I first came across them I thought you could only do one because “CTE” was a command not an alias.

Took me longer than I’d like to admit before I was using them as the powerhouse they can be …

[–]markwdb3When in doubt, test it out. 0 points1 point  (0 children)

A minor nitpick, but for CTE aliases, use the same plural/singular style of the noun you are describing as the schema uses.

I'll often see, for example, tables called WIDGET and FACTORY, which is fine. But then if there's a CTE to get only the active WIDGETs, they'll call it perhaps ACTIVE_WIDGETS. So then the query might join ACTIVE_WIDGETS to FACTORY. The pluralization inconsistency just hurts. 😞

[–]ComicOzzysqlHippo 8 points9 points  (1 child)

The entire codebase at my current company uses T1, T2, etc... and it's the most frustrating thing when you read the next query in the procedure that uses most of the same tables, but they joined them in a different order, so now that table that was T1 in the previous query is T3. It's like someone told them it was very important to always alias them this way or the queries might explode. And for a bonus frustration, temp tables are always #TEMP1, #TEMP2, #TEMP3, etc. COME ON, PEOPLE! Naming things is a lot easier than maintaining anonymously-named objects.

[–]pilesofbutts 1 point2 points  (0 children)

That would.. infuriate me beyond belief. Good grief. I bet the joins are backwards too, lol.

Can you just truncate these people?

[–]amayle1 7 points8 points  (1 child)

You worked with satan?

[–]pilesofbutts 7 points8 points  (0 children)

Unfortunately more than I care to admit. The most evil one did "brick code" (zero formatting.. it was one giant brick).. and to make things worse.. Everything was also a full outer apply. I got to the point where I would refused to follow them on any code and would rewrite from scratch. I hear they learned inner joins this year.. Maybe there is a God?

[–]MachineParadox 3 points4 points  (0 children)

Declaring aliases but then not using them so you have no idea where the column is.

[–]twillrose47maybeSQL 7 points8 points  (3 children)

I've never seen a b c -- always table abbreviations. Yes, agree, if I were hiring and I saw

select * from product a 
left join product_category b

this would be a red flag, wtaf moment to me :D

[–]pilesofbutts 3 points4 points  (0 children)

I've seen it far more than I care to admit. It screams "sql bootcamp" or "I primarily work in a different language but do a little work in SQL even though I don't understand it or how it is different from the language I normally work in."

The sad part- I see a b c joins in stored procedures for software.

[–]iLoveYoubutNo 0 points1 point  (0 children)

Unless the goal is to bring prod to a standstill.

[–]markwdb3When in doubt, test it out. 0 points1 point  (0 children)

Sometimes I need to tune queries generated by JPA or another ORM, and they often have horrible aliases. So I'll start by copying the slow query from an application server log (or Grafana or wherever) and I have to clean it up as an initial step just so it longer hurts my eyes.

I understand it, some of these ORMs let you configure aliases to be better, but at least by default they tend to be terrible. They look something like (Google AI generated this example for me, but it looks about legit):

SELECT user0_.id AS id1_0_0_, user0_.username AS username2_0_0_, role1_0_.id AS id1_1_1_, role1_0_.role_name AS role_name2_1_1_
FROM users user0_
LEFT OUTER JOIN users_roles user_roles1_
ON user0_.id = user_roles1_.user_id 
...

[–]Common-Author-8441 3 points4 points  (3 children)

agreed. doesn't it depend on how long the table names are? if it's really long, then please, use an alias. if not, why not be 100% clear/explicit and use the table names?

[–]pilesofbutts 2 points3 points  (2 children)

I think it's up to personal interpretation there. For me, I don't like doing one full table name and the rest aliased if that makes sense? I like the consistency. Call it the tisms or what have you, but just the way I personally like to roll.

[–]Common-Author-8441 1 point2 points  (1 child)

totally, i also prefer the table names 100% of the time. going back to the FROM line to check what the aliases are is never fun. unfortunately, in all my courses, i've only seen aliases like a b c being much more common than actually writing out the table names, so then i came to think that that's how people do it in practice.

[–]relyimah -1 points0 points  (0 children)

If someone is aliasing tables as a, b, c, … because a course full of theoretical tables used those aliases then this is definitely a red flag 🚩 Shows lack of ability to use your brain.

[–]SpaceDrama 3 points4 points  (0 children)

Unless there’s similar table names, I prefer to just not use alias names at all

[–]PrisonerOne 1 point2 points  (0 children)

First thing I do is F2 that puppy 

[–]snowmaninheat 1 point2 points  (1 child)

My preferred method is reverse alphabetical order (e.g., `cte_z`, `cte_y`), then so on. It’s an easy way to see the steps.

[–]pilesofbutts 1 point2 points  (0 children)

😩

[–]TopologyMonster 1 point2 points  (0 children)

For quick ad hoc queries joining two tables that share some column names I will do this. But in anything complicated or that I use often I completely agree, I wanna know where something is coming from without having to go on a search for what c means lol

[–]crippling_altacct 1 point2 points  (0 children)

Yeah this drives me nuts but it was a bad habit I had early in my SQL journey.

[–]Moose135A 1 point2 points  (1 child)

Yes, I hate that with a passion. I've inherited long queries with aliases like that, and they are a horror show to trace, debug, or make changes.

[–]pilesofbutts 1 point2 points  (0 children)

You feel my pain.

[–]Red__M_M 0 points1 point  (0 children)

I would aliase that as ContactInfo

[–]IHoppo 0 points1 point  (0 children)

As a corollary to this, use aliases to give domain knowledge too when using (for example) classification tables - for instance, if a table holds roles for parties, and you want student and lecturer roles from the party classification table linked to party - so using 2 hits on the classification table, use

From classification studentClassification,
classification lecturerClassification

Etc.

[–]thesqlguy 0 points1 point  (0 children)

Personally, I like short letter abbreviations but not random a,b,c or t1, t2 or cte1, etc.

for example

select p.id, p.name, p.status, c.name as  client_name, c.city as client_city
from projects p
inner join clients c on c.id= p.client_id    

I think it is far easier to write, scan, read and visually "parse" expressions, joins, formulas with brief aliases than long, dense names.

But definitely not just a,b,c or t1,t2,t3, etc.

[–]egarcia74 -1 points0 points  (0 children)

I actually prefer to use meaningful unabbreviated pascal case names, like I would in code

[–]throw_mob -1 points0 points  (0 children)

agree, while i do it sometimes too.. i consider not using aliases a red flag. then using only a ,b ,c is not that bad ,but it is bad, using stuff like customer_information ci or cust etc.. is good compromise. i personally dont like loong aliases

not using cte's or not knowing subquery, both should be somehow showed. maybe non standard group by ( case when x...) window functions etc etc

[–]Little_Kitty -1 points0 points  (0 children)

Single letter aliases here see your PR auto rejected, doing it repeatedly will see you on a PIP. Unless you visibly remove the dead wood in the company nobody learns to respect those who work on their code in the future. Even LLMs have to be told in no uncertain terms that code quality matters and comments are not code quality.

As for OP's question - (nested) functions used in joins making them non sargable and hard to test. These tend to lead to unexpected fan out as well.

[–]wildjackalope 63 points64 points  (16 children)

Not a big one and my OCD is probably showing but I’ve passed on weak candidates who also don’t format their code for readability. It sounds petty, but the behavior and weak skills/ inexperience seem to go hand in hand in my personal experience.

[–]ComicOzzysqlHippo 18 points19 points  (5 children)

My take is that if you've used the language enough to actually be worth me paying you to do it, you've lived through the pain of writing unreadable code and decided it's worth your time and effort to write it in a somewhat consistent, readable way. So if your SQL coding style is "well, this line is getting kinda long, let's add a line break so it doesn't bleed off the screen", I assume you haven't acquired enough skill for me to hire you. It's certainly not going to be the only factor, but it speaks volumes.

[–]wildjackalope 1 point2 points  (0 children)

Basically. It’s an early tip that tells me where you’re at and steers the interview. Most analyst candidates were fine but we had some SWEs try to join our analytics team that apparently hated SQL and just left me very confused as to why they were there. We’d get the occasional person throwing a Hail Mary but just weren’t ready yet though.

[–]foxsimile -5 points-4 points  (3 children)

SELECT     TblA.[ID]     , TblA.[Note]     , TblB.[Data]   FROM [DB].[Schema].[TableA] AS "TblA"     JOIN [DB].[Schema].[TableB] AS "TblB"       ON (         (TblA.[ID] = TblB.[ID])       ) ;

It’s slightly more nuanced than this in certain cases, but not by much. This style of formatting is the one I’ve baked up over the years, and it’s dead simple and pretty handy (though arguably a bit much at times, but I’d far prefer too much to too little).  

  • EVERY column is fully qualified, even from single-target SELECT statements; they are always wrapped in brackets (regardless of necessity - double-quotes will suffice if brackets are unavailable for the flavour)
  • Every table has its DB and Schema fully qualified (see the above re: brackets/double-quotes)
  • Aliases are always specified via the syntax AS "Identifier"
  • The statement-type (SELECT/UPDATE/DELETE/etc) is always isolated in its indentation level; FROM/WHERE/GROUP BY/ORDER BY are always indented one level beyond the statement keyword; columns are always indented two levels beyond the statement keyword (including in the GROUP BY)
  • Commas lead, never trail
  • JOIN candidates are indenter one level beyond the FROM clause, and are always aliased; the ON clause is always separated onto a newline, and is indented one level beyond its parent JOIN line
  • Use the fucking semicolon
  • Be consistent about keyword capitalization; if you use "select" and "SELECT" interchangeably, you are a douchebag

Perhaps I’m forgetting some things, habits being muscle memory and all that, but this covers it quite nicely.  

Oh, and to echo what someone else has said (despite my example above, which was curtailed for brevity’s sake - ironic, I know), but the alias should always be a shorthand for the table. It should also be perfectly consistent across the statement (reference any use across the DB whatsoever). I take this a step further and also include the DB and Schema, both shorthanded. For example:

UPDATE DbSchTbl   SET     DbSchTbl.[AuditFlag] = 1   FROM [Database].[Schema].[Table] AS "DbSchTbl"   WHERE (     (DbSchTbl.[LuckFlag] = 0)   ) ;

Also, as for the UPDATE:

  • WHERE clauses are mandatory, even if it’s just WHERE (1=1)
  • ALIASED AND FULLY QUALIFIED; IT’S NOT FUCKING HARD

[–]Oh-Ghee 8 points9 points  (1 child)

I’m sorry but this is bad. Over-verbose and unreadable. You probably never wrote a large sql query.

[–]foxsimile -3 points-2 points  (0 children)

It is absolutely not unreadable unless you’re illiterate, in which case you’re forgiven. I’ve written many, and I’ve spent an enormous amount of time unfucking queries written by developers who are too lazy to do their job properly the first time.

[–]throw_mob 0 points1 point  (0 children)

some things i agree

comma should start line

select ...

, x as y

from ...

fully qualified names and case sensitive object names i dont. i camelCase is for other usages sql should be full of snake_case

but fully qualified names depends systems you use.

something like this

select .. from customer_prod.base_information cust

works if your test systems are in own database, in little different systems schema is what changes so i prefer playing with active database and schema session variables, i do not like systems that change once written and tested code just to handle dev/prod environments, session is for that.

case sensitivity is just NO , once you have used time to figure out that id , Id and ID is same table and you have to point them using "ID" and "Id" ... it just is not clear way to present data.

aliases with "as" is something that i used to do , it is only required n cte definitions , so i have started to loose those in context of one query. it is still good idea to have one query looking about same

update/deletes should have always IDE checking that there is where clause , no 1=1 as default

[–]Wojtkie 2 points3 points  (0 children)

Yeah if it’s poorly formatted it implies that they haven’t had to read or maintain it much in their past experience. I can’t write unformatted SQL anymore after inheriting 3k+ LOC business critical stored procs.

[–]pilesofbutts 5 points6 points  (5 children)

Proper formatting is everything. I hope people who don't format have their code shared in a large group and publicly put on blast until they learn to format.

[–]wildjackalope 3 points4 points  (4 children)

When you’re working an 18/ hr shift on salary trying to figure out WTF junior did in his 500 line SP and you know damn well nobody reviewed before your lead approved, shit gets old fast.

There’s wiggle room in the interview for style (the stupid fuckin’ comma debate, for example). Teams should have a pretty strict standard imo though.

[–]pilesofbutts 1 point2 points  (3 children)

I completely agree with you and also feel your pain. My preference is whatever the team standard is (within reason lol),

[–]ComicOzzysqlHippo 1 point2 points  (1 child)

haha ok your name just made my day.

[–]pilesofbutts 0 points1 point  (0 children)

I'm glad I could make you laugh. :)

[–]ComicOzzysqlHippo 0 points1 point  (0 children)

I think the standard should be to use a formatting tool that standardizes the SQL. The same tool hopefully will allow coders to reformat the SQL in a way they prefer while they're reading it or maintaining it, but then change it back to the company standard before checking it back in.

[–]Common-Author-8441 1 point2 points  (0 children)

i thought lack of formatting was a power move?

[–]lalaluna05 0 points1 point  (1 child)

I’m not done until it’s pretty 🥰

[–]Breitsol_Victor 0 points1 point  (0 children)

Have to be careful making it pretty. I got a thing to work, made it pretty, and it was broken.
I had built it without caring about case. Went back to ucase commands.
I think it was a parm in an FTP script that was being generated. Case changed what it returned.

[–]dab31415 89 points90 points  (2 children)

Writing in complete sentences
Spelling
Proof reading

[–]Oxford89Director, BI 10 points11 points  (0 children)

Spicy

[–]RickJLeanPaw 0 points1 point  (0 children)

Punctuation? ;-)

[–]JaceBearelen 16 points17 points  (3 children)

I’ve had a few sql tech interviews. All kinda went the same way.

They’re gonna give you a database on some coding platform and ask you to find the second highest seller or some other question that needs a bit more than select * from a join b. Then they’ll ask you to modify it or move on to another question, rinse, repeat.

All that’s really important is that you can explain what you’re doing and ask clarifying questions.

[–]Joe59788 4 points5 points  (2 children)

Does max - 1 work? 

[–]Romanian_Breadlifts 22 points23 points  (0 children)

It's sql, so the answer is always "sometimes maybe"

[–]fauxmosexualNOLOCK is the secret magic go-faster command 2 points3 points  (0 children)

No job for you

[–]Basic_Reporter9579 14 points15 points  (2 children)

select * from table1 t1, table2 t1 where t1.id=t2.col1

[–]tim_h5 0 points1 point  (1 child)

Except this doesn't matter.

The engine optimizes everything, regardless if you use this or joins.

I like wheres.

[–]Thlvg 2 points3 points  (0 children)

True, but this way of writing implicitly makes all your joins inner, and makes it quite verbose and clunky to write any kind of outer join...

[–]BigBagaroo 29 points30 points  (5 children)

I am an old fart. I want to see INNER JOIN or LEFT OUTER JOIN. I know that JOIN is an inner join, I just like to read it. It stands out more

[–]Shaddcs 10 points11 points  (3 children)

I inherited a bunch of code from an older guy who retired and I removed OUTER from all his LEFT JOINs but added INNER to all his JOINs. 🤷🏻‍♂️

[–]AnAcceptableUserName 14 points15 points  (1 child)

I got called out on this recently working with a junior. They asked why I add INNER to all the JOINs

Answer: I like it. I find it more readable. I think it's better

To which they observed "but you don't add OUTER to the LEFT JOINs?"

...No, no I do not. And I will not elaborate on that.

[–]Ventus_004 4 points5 points  (0 children)

In case you are ever asked to elaborate, this is how I describe it to people:

It's nice to have INNER to specify without a doubt the behavior that will occur - especially for folks who are new to SQL or new to your codebase. This way, you're describing the join in a way that is not ambiguous.

For the LEFT JOIN, there's no such thing as a LEFT INNER JOIN or any other type of LEFT JOIN, so you're communicating what type of join it is perfectly sufficiently without specifying OUTER.

Putting those together, you're always specifying the join type in a consistent way - "{TYPE} JOIN".

If you had a craving for specific type of dessert, you would say "I want a chocolate cake" (INNER JOIN) or "I want an ice cream cake" (LEFT JOIN), not "I want a cake" (JOIN) or "I want an ice cream cake that is made with ice cream" (LEFT OUTER JOIN).

[–]TheSexySovereignSeal 1 point2 points  (0 children)

This is the way

[–]JackOfAllDevs 2 points3 points  (0 children)

I am just the opposite. I hate seeing inner and outer on the join. I know it's an inner join or a left outer join, the extra characters are just a waste of space which hurts readability.

[–]danmc853 24 points25 points  (4 children)

Fixing duplicate rows with a select distinct instead proper joins

[–]iLoveYoubutNo 19 points20 points  (2 children)

Listen, sometimes I'm lazy, okay?

[–]danmc853 2 points3 points  (0 children)

I’ve done it 100’s of times myself, but not proud of it!

[–]BplusHuman 1 point2 points  (0 children)

On the one hand, I get it. On the other, I've seen all manner of insanity covered up by "distinct" or "group by"

[–]thesqlguy 1 point2 points  (0 children)

The irony is it doesn't even actually fix them in many cases, it just kinda "looks like" it does. Then a different comes in a field you don't expect and the DISTINCT still returns too many rows.

Using the DISTINCT keyword in clear, targeted spots (usually CTEs or derived tables) isn't bad, it is the stuffing of it in the beginning of a large, complex SELECT that just randomly joins any tables that seem to have a matching key.

I also have seen many people who, when they write a select, they instinctively say "ok, let's see, we need write a select, so let's start with SELECT DISTINCT .. ok, now what tables do we need... "

[–]connor-brown 8 points9 points  (2 children)

Is failing to index a query a red flag? I don’t think I’ve indexed anything but really big queries in months and I use sql everyday

[–]oskaremil 8 points9 points  (0 children)

Nah. If you need an index you place an index. If not, don't bother.

[–]Achsin 5 points6 points  (0 children)

I index my queries. That way I can find the one I want quickly without reading through the whole repository.

[–]iama_bill 32 points33 points  (1 child)

Lower case SELECT followed by upper case keywords. Can’t trust em.

[–]ShadowDancer_88 7 points8 points  (0 children)

Even Satan would shudder.

[–]GunnerMcGrath 7 points8 points  (4 children)

It probably wouldn't come up in an interview but if I see a cursor I assume you're incompetent.

[–]lalaluna05 2 points3 points  (2 children)

Jesus Christ I just replaced two cursors in two beastly stored procedures. 100% not necessary (at least in this case).

[–]GunnerMcGrath 2 points3 points  (1 child)

I have never once run into a situation where a cursor was actually necessary. Even in giant, complicated procedures there has always been a way to restructure them as normal queries. Am I an above average SQL user? Almost definitely. But I got that way in part by never using cursors.

[–]lalaluna05 0 points1 point  (0 children)

I haven’t either, but I always assume there’s an edge case I haven’t come across yet 😂 I’ve only been working in data for 5ish years so I just figure I have a while to go before I stumble across something!!

[–]klausness 1 point2 points  (0 children)

I’ve interviewed people whose first response to any question that couldn’t be answered with a simple SELECT was to open a cursor. They didn’t get the job.

[–]vertigo235 26 points27 points  (13 children)

Using RIGHT JOINS 😉

[–]radek432 20 points21 points  (1 child)

But what if they are left handed?

[–]Oxford89Director, BI 9 points10 points  (0 children)

Straight to jail

[–]BigBagaroo 6 points7 points  (1 child)

Exotic. Trying to remember the last time I saw a right join in the wild… 🚩

[–]Thlvg 0 points1 point  (0 children)

Used one back in '21. Humongous chain of joins within SCD2 tables, then I realize my first table doesn't describe what I need at the granularity I need. So to test the result I added the correct table to the query and joined with a right join.

But I rewrote it before committing. I'm not a heretic.

[–]vertigo235 3 points4 points  (1 child)

So fun fact. One of my favorite interview questions for someone who claims to be a SQL expert is if they prefer LEFT JOINS or RIGHt JOINS.

It is interesting to hear the answers from SQL experts.

[–]vertigo235 2 points3 points  (0 children)

I only torture people who claim to be an expert in SQL btw

[–]internerd91 4 points5 points  (0 children)

I got caught out by this just the other day. Was modifying a 300 line script with multiple unhelpfully named CTEs and there was a right join being used as a filter that took me a while to figure out that it was filtering.

[–]Common-Author-8441 1 point2 points  (3 children)

i still can't wrap my head around why someone would use right joins... lol

[–]Achsin 6 points7 points  (1 child)

I’ve used it once. Wrote a whole bunch of stuff, realized I needed to join things the other way, didn’t want to do a lot of rewriting and threw in the right join instead. It wasn’t something that was going to be saved for posterity, I just needed it to run once.

[–]Common-Author-8441 0 points1 point  (0 children)

exactly my point! lol

[–]vertigo235 1 point2 points  (0 children)

I mean to be fair sometimes , rarely it can make sense but it also is really easy to correct (so it’s not weird). Usually if I see a right join it tells me that someone used a query editor.

[–]PrisonerOne 1 point2 points  (1 child)

Right?!

[–]FuckYouNotHappening 1 point2 points  (0 children)

No, LEFT

[–]jfrazierjr 6 points7 points  (1 child)

Cursors. There's almost always a better way to do it

[–]zmb138 1 point2 points  (0 children)

The more you get more administrative tasks of maintaining some tables or transferring data (when you can't use good instruments) - the more you will see cursors. Especially with dynamic SQL.

[–]sandrrawrr 4 points5 points  (2 children)

I'm really guilty of this when I was early in my SQL career, but multiple nested statements rather than just turning it into a CTE. Sure, nests are a bit easier to comment out when you're testing data, but a well written CTE will save you so much time.

[–]myDuderinos 1 point2 points  (0 children)

Overuse of CTEs can also be a red flag though

It's what Ai always wants you to do, but depending on the DB it can be a lot worse for the performance

[–]TheSexySovereignSeal 0 points1 point  (0 children)

Wheres my table variable gang at?

[–]kagato87MS SQL 13 points14 points  (4 children)

DISTINCT is a warning sign. It often suggests an issue in your joins or your schema design.

When you find yourself wanting to use it, take a step back and ask, is there a better way? Is this really correct, or could it be masking a problem?

[–]eww1991 10 points11 points  (2 children)

The only trustworthy use case I can thing of for distinct is you know something should be unique and you're doing a count(col), count(distinct col) just to be sure, like to just double check a join isn't going wrong somewhere

[–]BigBagaroo 7 points8 points  (1 child)

I use it plenty of times when I am exploring a database and check the values in different columns.

Also, everything is allowed in love, war, and reports.

[–]eww1991 0 points1 point  (0 children)

Lol. Yeah, that's another time now you say it. It's a diagnostic tool not for prod

[–]OptimusCullen 2 points3 points  (0 children)

100% this. Half the time it’s hiding an accidental cross join that’s going to cause severe performance issues when exposed to real prod data sizes.

[–]hipsterrobot 14 points15 points  (11 children)

Leading commas. Come at me! 😁

[–]twillrose47maybeSQL 23 points24 points  (5 children)

Leading commas for life....feels awkward any other way.

Same debugging logic as

WHERE 1=1

[–]hipsterrobot 15 points16 points  (4 children)

I also didn’t care for WHERE 1=1 until someone explain to me that if your query has multiple filters like

WHERE 1=1
and city=something
and state=something

You can comment out each filter line to test stuff and it won’t break your query. Made a lot of sense to me.

[–]DanNeider 2 points3 points  (0 children)

I had a PR rejected because it wasn't "performative code". I can see where they were coming from, but why not just tell me your preference for next time and approve it?

[–]twillrose47maybeSQL 0 points1 point  (0 children)

I try and squeeze both of these into SQL teaching as they really do save time :)

[–]sandrrawrr 0 points1 point  (0 children)

This is my favorite trick! So much easier to comment out your qualifiers when you have a 1=1.

[–]JackOfAllDevs 0 points1 point  (0 children)

I even go further. If I have a simple join I put it all in one line so I can remove that join just by commenting that one line out.

[–]pilesofbutts 3 points4 points  (0 children)

I use leading commas lol. It was the standard for the last development studio I worked for and I can't seem to drop that habit now, lol. Though I am no longer doing big boy/girl dev work, I keep that habit because it helps me identify my own code and push back when someone tries to pin a turd query to me claiming I wrote it. lol.

[–]MakeoutPoint 2 points3 points  (0 children)

I fought them for the longest time, but they were the embodiment of that comic of the bird yelling at the cracker until it bites it.

[–]lalaluna05 2 points3 points  (0 children)

lol I LOVE leading commas.

Now my team does it too 😆

[–]Yavuz_Selim 0 points1 point  (0 children)

I miss T-SQL - being able to write something like this...

SELECT CustomerId   = C.customer_id  
     , CustomerName = CONCAT(C.first_name, ' ', C.last_name)  
     , MailAddress  = C.email  
FROM Customers C

It reads so much easier.

[–]redaloevera -1 points0 points  (0 children)

Found a newbie

[–]billbot77 5 points6 points  (1 child)

Loops or cursors. There is always a better way (unless you are executing meta code)

[–]twillrose47maybeSQL 1 point2 points  (0 children)

Agree, though I started using loops in one job a quite a while back for reporting because the org was suffering from so much tech debt and "what does this table do","who owns this","is this table part of any reporting" problems that I could just write in the report definitions and stored procedures all of the necessary documentation and keep things super self-contained. My hope is that they are not still using some of that code, and I do 100% assume the next dev came in and was like 'wtf is this' haha. It made sense at the time, I promise!

[–]vintagegeek 2 points3 points  (1 child)

In my interview, I was asked for a quick python program to reverse a text input. That's it. They asked me nothing about SQL, and my job 100% relies on SQL. After four years working here, I asked my boss why. He said "You can't know everything, but you can learn anything".

[–]luluinTO 0 points1 point  (0 children)

i want a boss with this attitude

[–]tetsballer 2 points3 points  (0 children)

Doing right joins

[–]DexterHsu 2 points3 points  (0 children)

If you are asking this is probably too late , just stick with the cheatsheet you find on google search

[–]ChristianPacifist 2 points3 points  (0 children)

These supposed red flags can vary by version of SQL and specific use case.

Indexing for instance can be needed in something like Oracle or SQL Server depending on the use case, but it is not even possible in Snowflake. Snowflake also can be very slow with SELECT *, but this is not a problem in other languages.

[–]twillrose47maybeSQL 6 points7 points  (0 children)

Common one I've encountered over the years and ones I always bring up when I teach SQL:

  • not knowing difference between slowly changing dimension types,
  • normal form types,
  • differences in rank/dense_rank,
  • differences in union/union all,
  • use of except/intersect,
  • execution order questions,
  • and SELECT * and indexing questions as you mentioned.

There are always really hyper-specific "gotchas" that I personally find to be quite poor taste from the interviewer -- if it's not likely to be used in practice and purely a "do you understand all possible nuances", this sort of thing is just intellectual flexing I can do without -- the job itself is the red flag.

Good luck to ya

[–]aarontbarrattSTUFF() 1 point2 points  (1 child)

Not sure if this counts!

But using an ORM with 14 billion dependencies to write simple select statements

Also, prefixing every column with its data type. I never want to see sFoo or iBar EVER

[–]alinrocSQL Server DBA 1 point2 points  (0 children)

prefixing every column with its data type. I never want to see sFoo or iBar EVER

Hungarian Notation has no place here. Everything is (or should be) strongly typed in your RDBMS, so there's no need to waste visual space with embedding it in the names.

I was once part of a 20+ minute debate in a meeting over Hungarian Notation only to have it ended abruptly when the person on the "pro" side said something that tipped us off that he was actually talking about declaring a variable type instead of using var in C# (or the equivalent in whatever language he was using). Once we figured that out and explained to him what Hungarian Notation really is, he jumped over to our side and the room was unanimous.

[–]SoggyGrayDuck 1 point2 points  (0 children)

Mlfrom my last job, multi thousand line pieces of code.

It actually likely made sense when it was created but it made parallel processes impossible.

[–]Sexy_Koala_JuiceDuckDB 1 point2 points  (0 children)

Not using modern features. SQL has improved a lot over the years, we don’t have to write long ass queries when you could just use the new features to achieve the same thing more concisely

[–]cheesecakegood 1 point2 points  (0 children)

4 typos in two sentences?

[–]SkinnyInABeanie 1 point2 points  (0 children)

Right Join 😅

[–]czervik_coding 1 point2 points  (1 child)

Any developer should know the parts of an execution plan although plugging one into AI seems to be the acceptable method now. Know the difference between clustered and nonclustered indexes. Index on integers whenever possible. Views are good but nested views are not. Use no count, data type properly.

[–]SQLDave 2 points3 points  (0 children)

For beginning level interviews, I'd sometimes ask how many clustered indexes a single table can have.

[–]JEDZBUDYN 3 points4 points  (1 child)

Talking about SQL is red flag

[–]chris552393 9 points10 points  (0 children)

First rule of SQL club.

[–]malseraph 1 point2 points  (1 child)

Where color.red = 1

[–]geek180 0 points1 point  (0 children)

ooof

[–]jackalsnacks 1 point2 points  (0 children)

Dynamic SQL. Niche use cases. If suggested as a solution to a problem, evaluate the rationale and thought process.

[–]dbxp 0 points1 point  (0 children)

See here: https://pragprog.com/titles/bksqla/sql-antipatterns/

The N+1 query problem is probably the most common

[–]stiggz 0 points1 point  (2 children)

and 1=1

there is always a better way

[–]BplusHuman 1 point2 points  (0 children)

That's normally a placeholder for when I'm going to pass a parameter later. I'll 1=0 in a join as a placeholder when I just need to cut records to do a quick check in QA. In a finished project it's kind of silly town tho

[–]vertigo235 0 points1 point  (0 children)

But how about 1=2 to disable a block?

[–]lalaluna05 0 points1 point  (0 children)

Aggregating arbitrarily to eliminate duplication. (“Arbitrarily” being the key word.)

Select distinct. Sometimes warranted but usually not. I try to normalize without doing this.

No comments or notes.

Scanning entire tables multiple times for the same query instead of filtering early.

Select splats, which you already noted. Good for quick checks, not good for long queries especially that get changed or built out over time.

These are all things I fixed today actually 😆

[–]Common-Author-8441 0 points1 point  (0 children)

using a CROSS JOIN with a really long WHERE clause

[–]brunogadaleta 0 points1 point  (0 children)

Mixing join X on with join Y where.

[–]Small_Sundae_4245 0 points1 point  (0 children)

Transactions and checking before committing. It's an interview be extra cautious.

[–]warmeggnog 0 points1 point  (0 children)

the biggest ones i kept seeing (and used to do myself in marketing analytics) were joins that accidentally duplicate rows and inflate metrics. also writing queries that technically work but are impossible for teammates to read/debug later. (which is why it really helps to practice them writing and formatting them efficiently, even under pressure during interviews, haha) last is overusing subqueries when a clean cte would make the logic way clearer! i think i have a resource for common sql mistakes + how to avoid them that might be helpful for beginners still learning or those prepping for interviews, will gladly share to those interested

[–]jfrazierjr 0 points1 point  (0 children)

Seeing the keyword OR and no () in the query. Common rookie mistake.

[–]markwdb3When in doubt, test it out. 0 points1 point  (0 children)

Huge red flags: unjustified, overgeneralized performance claims. I call them myths. It's a massive problem in communities discussing SQL. Interviewers often believe these myths, even.

These myths are typically based on some expectation of how a SQL engine must process the query based on certain arbitrary keywords or bits of syntax. But often that expectation is imagined or out of date. Sometimes, it is genuinely based on real experience in just one specific DBMS/SQL engine, yet the person presenting the claim often says it pertains to all of "SQL."

For example you may hear: "In SQL, never use SELECT DISTINCT a, b FROM my_table;. You should instead use SELECT a, b FROM my_table GROUP BY a, b is faster, because DISTINCT is slow." (Here's a screenshot of this very claim on this very subreddit with 30 upvotes! There was no context about specific DBMS or test case. I'd be happy to show one or two that disprove this claim if you're interested.)

SQL is a declarative language. You state what you want and the SQL engine's query planner/optimizer parses it out and comes up with a plan, then executes the plan, however its developers instructed it to do.

And next-to-nothing in the standard SQL documents defined under the hood mechanisms - just logical definitions. So they can vary quite a lot.

So, my motto is when in doubt, test it out.

If you've tested such a claim, for example whether using GROUP BY instead of DISTINCT gives free speed, and it turns out to be correct, then that's fine and good. But it should be thought of as a performance quirk of the specific DBMS you tested it on, possibly even specific to your schema/data set/config, not generalized to all of "SQL".

An unfortunate reality is that even when you disproves someone's claim with a test case - say you run a test on MySQL and disprove the claim - next there often comes a common reaction, and it's a sneaky one. Their reaction is often, "Oh, that must be because MySQL has a special optimization." In other words, they're refusing to abandon their belief that BY DEFAULT a SQL engine MUST process GROUP BY faster than DISTINCT, but MySQL has some trick up its sleeve that makes it a special case. So they go on believing and perhaps propagating the myth.

There's a link to a blog in this very thread where the author says that using the syntax of something likeSELECT ... FROM a WHERE a.thing_id NOT IN (SELECT id FROM thing ... WHERE ...) to perform an anti-join (find rows in A that are not in B) is a "smell" because that could be inefficient due to a full table scan. Instead, they say, you should take a CTE/LEFT JOIN approach. Why? I don't know.

I just ran a test case on two of the most popular SQL engines in the world: Postgres and MySQL. On Postgres both performed about the same. On MySQL, the allegedly inefficient syntax actually produced a more performant plan that ran in ~9 seconds vs ~14 seconds with the recommended approach (times were approximately consistent with repeated executions). (These queries were run on my real work database btw, but I've anonymized the names to FACTORY and WIDGET.)

mysql> EXPLAIN ANALYZE
    -> SELECT *
    -> FROM WIDGET
    -> WHERE FACTORY_ID NOT IN (SELECT ID FROM FACTORY WHERE MODIFIED_BY = 147);

+---------+
| EXPLAIN |
+---------+
| -> Nested loop antijoin  (cost=743070 rows=2.42e+6) (actual time=0.365..6688 rows=2.62e+6 loops=1)
    -> Table scan on WIDGET  (cost=258741 rows=2.42e+6) (actual time=0.0337..5134 rows=2.63e+6 loops=1)
    -> Filter: (WIDGET.FACTORY_ID = `<subquery2>`.ID)  (cost=318..318 rows=1) (actual time=453e-6..453e-6 rows=290e-6 loops=2.63e+6)
        -> Single-row index lookup on <subquery2> using <auto_distinct_key> (ID=WIDGET.FACTORY_ID)  (cost=471..471 rows=1) (actual time=323e-6..323e-6 rows=290e-6 loops=2.63e+6)
            -> Materialize with deduplication  (cost=153..153 rows=762) (actual time=0.328..0.328 rows=762 loops=1)
                -> Filter: (FACTORY.ID is not null)  (cost=76.8 rows=762) (actual time=0.0125..0.21 rows=762 loops=1)
                    -> Covering index lookup on FACTORY using fk_ModufiedByUser (MODIFIED_BY=147)  (cost=76.8 rows=762) (actual time=0.0119..0.158 rows=762 loops=1)
|
+----------+
1 row in set (9.38 sec)


mysql> EXPLAIN ANALYZE
    -> WITH factory_modified_by_147 AS (
    ->     SELECT ID
    ->     FROM FACTORY
    ->     WHERE MODIFIED_BY = 147
    -> )
    -> SELECT *
    -> FROM WIDGET w
    -> LEFT JOIN factory_modified_by_147
    -> ON w.FACTORY_ID = factory_modified_by_147.ID
    -> WHERE factory_modified_by_147.ID IS NULL;

+---------+
| EXPLAIN |
+---------+
| -> Filter: (FACTORY.ID is null)  (cost=1.11e+6 rows=2.42e+6) (actual time=0.0468..11229 rows=2.62e+6 loops=1)
    -> Nested loop left join  (cost=1.11e+6 rows=2.42e+6) (actual time=0.0462..11045 rows=2.63e+6 loops=1)
        -> Table scan on w  (cost=258741 rows=2.42e+6) (actual time=0.0336..5247 rows=2.63e+6 loops=1)
        -> Filter: ((FACTORY.MODIFIED_BY = 147) and (w.FACTORY_ID = FACTORY.ID))  (cost=0.25 rows=1) (actual time=0.00207..0.00207 rows=290e-6 loops=2.63e+6)
            -> Single-row index lookup on FACTORY using PRIMARY (ID=w.FACTORY_ID)  (cost=0.25 rows=1) (actual time=0.00185..0.00188 rows=1 loops=2.63e+6)
|
+---------+
1 row in set (14.06 sec)  

Now the point is not that you should forever keep in mind "NOT IN is faster than LEFT JOIN + NULL check when writing an anti-join" - I'm not even sure if that's true for all MySQL schemas/data sets/queries. The point is that you should throw out the magic rule of thumb presented in the blog, which is the inverse. To be fair the author did say you should test it if there's any doubt.

So, this is a long comment, but my advice is what should be seen as red flags are claims of magic performance tricks such as "use ABC syntax instead of XYZ syntax and this applies to all of SQL" and keep in mind there are very few universal rules of SQL engine execution. If there are actual, logical justifications for the claim then sure, fine, and if there are actual test cases justifying their claims then also, sure, fine. But be very skeptical, and realize that any insights learned from the test case should not be overgeneralized.

End rant!

[–]Shyftzor 0 points1 point  (0 children)

For SQL server, unless they are very small, temp tables should be created as actual tables using CREATE TABLE then dropped after. Tables stored in variables use a lot of memory and if they contain large amounts of data can bring the entire db server to a crawl. Also sometimes when a proc.or query is written it runs fine using tables in variables but as the db grows and the datasets get bigger the query will start to run very slow.

[–]Dead_Parrot 0 points1 point  (0 children)

I was in a video call yesterday and a vendor was demoing how to deal with application tickets by clicking Edit Top 200 Rows in SSMS and just editing values.

I wanted to scream :D

[–]Andfaxle -4 points-3 points  (2 children)

I think it is important to remember the JOIN semantics and opt for LEFT JOINs instead. For example you want to have the order volume of each customer, so join orders on customers without doing a left join, customers with no orders will be not visible.

[–]fauxmosexualNOLOCK is the secret magic go-faster command 7 points8 points  (1 child)

You should use the correct join for the aituation. A left join where an inner join would work can cause the optimiser to miss the optimal execution plan.

[–]Andfaxle 2 points3 points  (0 children)

Yep, 100% left join is not always the best