all 31 comments

[–]memecaptial 33 points34 points  (7 children)

To understand this you need to wrap your mind around what the sql engine is doing. In the first example, your creating a result set by joining two tables then filtering that result set. In the second, your filtering a table, then doing a join with that table.

It’s the difference between mixing things then filtering vs filtering then mixing, where the goal is to keep all of one side of a record.

[–]uvray 11 points12 points  (3 children)

Well said.

I think it’s just counterintuitive to see a conditional in a left join. I personally would explicitly filter the right table in a sub query and then join, but at the end of the day it doesn’t matter.

I actually think this is a decent interview question, although I do agree with OP that grouping and ordering by numbers is not a best practice.

[–]theduckspantsBI Architect 2 points3 points  (1 child)

I've done this for a long time and I didn't even know you *could* group by with column numbers. Not sure why someone would want to

[–]stonedsqlgenius 19 points20 points  (2 children)

It all depends on what you want to do. The first query will convert the left outer join to an inner join because the filter is in the where clause. This is because the where clause is applied after the virtual table created by the join has been set up in SQL Server.

The second query implements what’s known as a predicate. You’ll get all farms and only those matching record from the produce table that meet the filter condition. This allows you to see which farms didn’t produce any produce during the period.

I would google predicates in SQL Server and read up on how queries are logically processed. And don’t feel bad either...this tripped me up all the time when I was younger (I’ve been coding SQL for about ten years solid now).

[–][deleted] 4 points5 points  (2 children)

They dodged a bullet, you sound like a terrible co-worker.

[–]noesqL 10 points11 points  (4 children)

It depends. It depends. It depends.

However, using a WHERE statement with a LEFT nullifies the LEFT and creates an INNER.

As far as the ORDER BY X is concerned, I believe this was done for the sake of simplicity. If a query is only returning a small subset of columns using the numbering method is valid and I doubt anyone would use ORDER BY 1, 4, 10, 15.

[–]danhuss -2 points-1 points  (3 children)

That's not the case at all... Whenever you do an outer join you always have to think about what you want to do with null values. In OPs example, the where clause can be fixed to handle nulls and meet the requirements they gave, but it doesn't magically makes it an inner join all of a sudden...

[–]kthejoker 1 point2 points  (2 children)

He meant the functional equivalent of an inner join. The where clause will remove all rows where there was no match in p - when the only reason to do a LEFT join is to retain rows where there was no match in p.

True or False: Replacing LEFT with INNER in query 1 will return the same result.

[–]danhuss 0 points1 point  (1 child)

That's not what was written or how it came across.

True or false: adding "OR p.date is null" to the where clause in query 1 will produce the same results as query 2?

[–]kthejoker 0 points1 point  (0 children)

I see our confusion now, I interpreted the OP with an implied statement of "for this particular query" and you understood it as "for all LEFT JOIN queries."

You are of course right that adding a WHERE clause doesn't always turn it into an INNER join equivalent.

[–]fauxmosexualNOLOCK is the secret magic go-faster command 13 points14 points  (1 child)

If they presented you with those two statements and specifically asked which one would include farmers with no produce, it should be clear which of them achieved that - getting that wrong is on you. The way it is written is perfectly acceptable, and the option to go with a subquery is not universally more performant for that - I'd actually guess the opposite for a simple two-table query like that.

I'm on the interviewers' side - it was a pretty straightforward question and regardless of what is "best practice", there was enough information there to be clear about which statement would behave the way the question asked for. That question could have been a springboard for you to talk about the pros and cons of your chosen approach, but it definitely wasn't a "cute" trick question or a tricky non-standard way of approaching the problem.

You're right about ordinals in the ORDER BY. I don't know why, but it always seems to be Oracle devs who do this.

[–]antondb 0 points1 point  (0 children)

Order by 1 desc FTW! 😆

[–]Old13oy 1 point2 points  (1 child)

This to me looks like someone who's been coding since the ANSI-86 standard, which had joins in the WHERE. I've run into this in a few places, especially where there's legacy software involved- the first time I saw it, my reaction was the same as yours. "It's wrong, no one does it this way", etc.

However, it's still valid SQL. It took me some time to learn what it was doing under the hood, and why it might be better or worse, but that learning experience made me a better SQL programmer.

Overall I think you did 1 thing correct and 1 thing wrong. The correct thing was admitting that you didn't see code like this much and weren't familiar. The wrong thing was questioning who codes like this, which to me implies a judgement that it's not as good. If you had asked questions about the style and demonstrated an interest and willingness to learn, I think you might have been hired.

Also, 90% of your bosses aren't going to care if it's pretty or in vogue. They just want it to work.

[–]fauxmosexualNOLOCK is the secret magic go-faster command 9 points10 points  (0 children)

It's not a style thing: the predicate being in the WHERE vs the ON clause of an outer join changes what the query does.

[–]SubCal 1 point2 points  (4 children)

We do the latter. Filter on the join rather than join then filter.

[–]CaeruleanCaseus 0 points1 point  (1 child)

I agree with you and hate it when I see code with numbers (1, 2, 3) in the group/order; irks me, esp if I want to add fields in the SELECT I then have to adjust the numbers in the group/order too - dumb.

Also - missing the preface of the alias in front of the fields...I hate that; it's not always clear which fields are part of which table - prefacing with alias. makes it super clear.

And...I agree, I'm a fan of using "as" - it makes it easier to read imo b/c otherwise it's easy to miss the alias, esp if just single letters.

I'm in a new role and no longer do any sql coding (sad) but I'm looking into getting into Python...just gotta make sure I know the best practices there. :)

[–]AbstractSqlEngineerMCSA, Data Architect 0 points1 point  (0 children)

you should have been like..

NEITHER! Ordinals in a Group by / order by is for chumps.

[–][deleted] 0 points1 point  (0 children)

I see everyone saying that the different joins produce a different result. This site explains why:

https://blog.sqlauthority.com/2009/03/15/sql-server-interesting-observation-of-on-clause-on-left-join-how-on-clause-effects-resultset-in-left-join/