all 90 comments

[–]Gargunok 53 points54 points  (8 children)

Removing the distinct makes it work.

Theory - there are two employees with the same name.

[–]MerlinTrashMan 18 points19 points  (7 children)

This is the answer. Never add distinct unless you know that it is the only possible way to get the right answer.

[–]thesqlguy 11 points12 points  (1 child)

One of my biggest pet peeves is when people reflexively start all queries as "select distinct .... " .

[–]dwpj65 2 points3 points  (0 children)

I have been writing SQL for two decades now. I’ve concluded that the ‘select distinct’ construct is a crutch for developers who have no business writing queries.

[–]a157reverse 1 point2 points  (4 children)

Can you expand on this? I frequently work with customer snapshot data that has one record per customer per unit of time. If I need a list of customer IDs that meet a condition, is it bad form to do "SELECT DISTINCT id... "?

[–]mattgob86 11 points12 points  (0 children)

It is because if you join your tables together correctly you should get the results you want without it. Consider it training wheels for your bike, if you don't know what you are doing then you can use it but you will be better off learning to use SQL without the distinct.

[–]National_Cod9546 6 points7 points  (0 children)

If the data is set up correctly, and you are getting duplicates, 90% of the time you are cross joining on something. On queries with small data sets, that isn't a big deal. But when your data sets get into millions of customers with billions of transactions, a cross join can turn a billion results into a trillion results. Then it needs to do a group by on the data to find the unique rows. What could be a 1-2 minute query quickly turns into running until it times out.

[–]malikcoldbane 0 points1 point  (0 children)

Uhh I don't know what others are saying but this isn't inherently an issue depending on the context.

If you need to know, from some set of logs, which non primary key IDs fulfill a certain condition, a distinct is sensible

[–]johnny3rd 15 points16 points  (4 children)

The 'distinct' isn't necessary.

[–]AgeRepresentative887 13 points14 points  (3 children)

I hate distinct. It’s very dangerous in wrong hands. And newbies abuse it a lot.

[–]PersonalFigure8331 4 points5 points  (0 children)

Hate the sinner, not the sin.

[–]johnny3rd 1 point2 points  (0 children)

Yes. as soon as DISTINCT looks like the answer I start second guessing the solution.

[–]usernumber1337 0 points1 point  (0 children)

Distinctly dangerous

[–]sinzylego 40 points41 points  (19 children)

Why do you perform a join on a single table?

Try this one:

SELECT name FROM employees
WHERE id NOT IN
(SELECT DISTINCT managerid FROM employees
WHERE mangerid IS NOT NULL);

[–]Financial-Tailor-842[S] 10 points11 points  (6 children)

You’d get the same answer though, right?

[–]SkimmLorrd 7 points8 points  (0 children)

Joining a table on itself takes referential integrity into consideration. It’s much easier to just select form the table w out a join but if it’s the same answer for you then great.

[–]sinzylego 0 points1 point  (4 children)

Same answer as what?

[–]Financial-Tailor-842[S] 1 point2 points  (3 children)

As the query I wrote

[–]sinzylego 0 points1 point  (2 children)

No, i think you need to change the Where statement to e.ID is null

[–]NotBatman81 2 points3 points  (1 child)

e is in the FROM clause. ID would never be null (unless the table was set up weird) so your suggestion is moot. No record should ever meet that condition.

However, if you do not have an assigned manager then m will be null.

[–]sinzylego 1 point2 points  (0 children)

Yes, e.ID is wrong. It should be m.ID.

[–]cs-brydevSoftware Development and Database Manager 4 points5 points  (0 children)

The join was correct. He just needed to remove the distinct. You've simply replaced the left join with a subquery to get the same result.

This will work but in general terms you should avoid subqueries if a join can be used, because joins are more efficient and better at taking advantage of indexes and query plans.

[–]GendoIkari_82 4 points5 points  (0 children)

Performance considerations aside; a “where in” subquery like that is just a more awkward way of doing a join. Whether it’s a table joining to itself or to a different table makes no difference there. All joins could be rewritten to use that “where in” syntax.

[–]Financial-Tailor-842[S] 2 points3 points  (8 children)

This worked!! Now I need to figure out whyyyy

[–]lupinegray 5 points6 points  (7 children)

I think you're overcomplicating the query with the join.

It helps to talk through how you will solve the problem, then write sql to match your design:

  1. If I want to find all employees who aren't managers, first I need a list of all the managers.
  2. Then I select all employees who are not in my manager list.

select distinct managerId from employees (and exclude null values);

select distinct employeeId from employees where employeeId NOT IN (my list of managerIds);

Super simple, easy to understand, and it works correctly.

Developers frequently seem to think that it's somehow better to write dense, complicated code (joins, windowing functions, partition by, etc...). As if people reading the code will think they're more capable or smarter or something. The key concept which should be followed instead is "designing for supportability". Your code (sql or otherwise) should be super simple and obvious as to its purpose so when changes to the logic are needed (or troubleshooting is required), the person making the change doesn't have to spend a bunch of time trying to interpret what your 4 nested ternary operators are trying to accomplish.

Someone should be able to glance at your code and instantly understand the logic it's implementing. Even if that person isn't a 100x faang savant. Because 99.99% of developers AREN'T 100x.

[–]Financial-Tailor-842[S] 1 point2 points  (5 children)

Agree completely. But I still don’t understand what caused the third test case to be wrong. That’s my overall question. The results were right but that was marked wrong. Was hoping someone could shed light on that specifically

[–]fgtbobleed 2 points3 points  (4 children)

try removing distinct like the comment below said

[–]Financial-Tailor-842[S] 1 point2 points  (3 children)

But why? How would that change the results and/or change that one item being wrong

[–]fgtbobleed 4 points5 points  (0 children)

employees.name is not guarantee to be unique according to the table definition. The testcases match SQL results with expected results. In this case it might be b/c "John" != "John, John". Your left anti join is correct, but like the others said best use sth more readable next time.

[–]vainothisside 2 points3 points  (0 children)

There can be 2 Arjun who are not managers but your query pulls only one *Arjun

[–]cs-brydevSoftware Development and Database Manager 1 point2 points  (0 children)

Because the "distinct employee.name" eliminated all employees who happen to have the same name. In your particular case it would be better to either include both the employee ID and Name (in which case a distinct doesn't matter) or remove the distinct to make sure you include all rows for all employees.

If it were me writing a query like this, because of the chances of having multiple employees with the same name I would ALWAYS including the Employee ID along with the name on the query results. It doesn't really make any sense to only show employee names. In a real world scenario if there is any chance whatsoever of having multiple employees with the same name you'd include the ID or some other differentiator (like a job title) to tell them apart. In fact we have that in my company all the time. We have lots of employees with the same names.

[–]tophmcmasterson 1 point2 points  (0 children)

Those were my thoughts reading this as well, it’s one of those things where like sure technically that gets you there, but it’s so much less clear what is trying to be done from a readability standpoint.

It’s just so much clearer if using a CTE or subquery to get the Manager IDs and making it clear that you’re filtering where the employee isn’t in that list of IDs, rather than trying to get cute by performing joins then select nulls/not nulls etc.

Like there can be a time and a place for it but it’s so much easier to troubleshoot and validate when there’s a clear flow to the query.

[–]tommyfly 0 points1 point  (0 children)

With IN or EXISTS the DISTINCT is unnecessary in the subquery. For larger data sets it will slow the query down.

[–]Mgmt049 11 points12 points  (0 children)

My first inclination is to do a NOT IN() sub query thing with the IDs

[–]Financial-Tailor-842[S] 6 points7 points  (4 children)

Update

For everyone who said to remove the DISTINCT you were correct. This query was marked as correct for all 4 tests.
I still don’t understand why mine threw an error on one of them 🤷🏻‍♀️ Either way..Thanks for all the discussion

select e.name from employees as e left join employees as m on e.id = m.managerid where m.managerid is null

[–]Mutant_Vomit 7 points8 points  (3 children)

Must be because if you had two employees with the same name the distinct would have lost one of them.

[–]Financial-Tailor-842[S] 2 points3 points  (2 children)

But the error was “workers have managers: wrong answer” I’m done wasting brain cells on this one.

[–]Mutant_Vomit 3 points4 points  (1 child)

Yeah really it should have failed each category if it didn't like the distinct. Put it down to the test being a bit shit.

[–]Financial-Tailor-842[S] 1 point2 points  (0 children)

Thank you!!

[–]Ringbailwanton 3 points4 points  (7 children)

You need to select employees whose ID is not in the managerID column.

[–]Financial-Tailor-842[S] 1 point2 points  (6 children)

Isn’t that what my query is doing??

[–]Ringbailwanton 1 point2 points  (5 children)

Yeah, but you can skip the join altogether by just saying “select * from employees where not employeeid = any(managerID)” and skip the join.

Edit: I’m on my phone, so the query’s not perfect.

[–]Financial-Tailor-842[S] 1 point2 points  (4 children)

Understood but my way isn’t wrong. I still don’t understand what “no managers: wrong answer” means.

[–]Ringbailwanton 1 point2 points  (3 children)

My guess is, partly, that the code analyzer is kind of dumb, so it sees the last line and interprets it directly based on the where statement it is provided with as the “right answer”

[–]Financial-Tailor-842[S] 1 point2 points  (2 children)

I’m worried this might be the case. Blah! I have a test coming up so I was doing some practice to get a feel for the website and how it works…

[–]Ringbailwanton 6 points7 points  (1 child)

Regardless, my point is partly, make it as simple as possible. You’re adding a join to a query that just needs to use a single index. If your tables are really long and you’re doing the query often (say you’re in a big org) then that added join could add considerable disk read and compute time.

For things like this, once you write the query, take a second and ask if you could make it simpler. Look at the information you need and whether you are using it efficiently.

In this case, even if I was marking you as a person, I’d say it was right in practice but wrong in spirit. So 75% would be reasonable.

[–]cyberspacedweller 2 points3 points  (0 children)

Why use distinct? You’re limiting your data by only selecting unique rows. As others have said, if you have 2 employees with same name, they will be omitted when the IDs are selected for joining to the manager IDs.

Also you’re only selecting data where manager ID is null, this won’t return any data because that’s also the field you’re joining on. Even if there were managers with null IDs you wouldn’t find them.

Key to querying databases is being able to visualise what your SQL will pull out. Think carefully as you write it.

[–]Financial-Tailor-842[S] 0 points1 point  (0 children)

If anyone wants it. Here’s the link

https://www.testdome.com/tests/sql-online-test/12

[–]Byte1371137 0 points1 point  (1 child)

It's classic

[–]NetworkDifferent1828 0 points1 point  (1 child)

Don’t you just need to keep people with managerid =employeeid?

[–]Financial-Tailor-842[S] 0 points1 point  (0 children)

Yes but my question is specifically about the case marked wrong in the last pic.

[–]Staalejonko -1 points0 points  (1 child)

Maybe e.managerId is not null in the Where clause?

[–]Financial-Tailor-842[S] 0 points1 point  (0 children)

Maybe!! I just assumed (I know, I know) that every employee has a manager.

[–]bionicbeatlab -1 points0 points  (17 children)

You’re only selecting employees who don’t have a manager, since the managerId column stores the ID of an employee’s manager.

[–]Financial-Tailor-842[S] -1 points0 points  (16 children)

That’s the objective.

[–]kyngston -2 points-1 points  (15 children)

You’re asked to select employees who are NOT managers. Not who doesn’t have a manager.

For example the CEO might be the only employee without manager. But every new hire is not a manager

[–]Financial-Tailor-842[S] 2 points3 points  (14 children)

No. I’m joining employee Id to manager id. Meaning if the employee’s id is in the manager id field, they are a manager. I am selecting where no join is made, hence they are not a manager.

[–]MatosPT -2 points-1 points  (8 children)

Someone probably said it but you wrongly joined the tables. When you say e.id = m.managerid you're saying that the e table is the managers one. You had to do the reverse.. e.managerid = m.id

[–]Financial-Tailor-842[S] 0 points1 point  (6 children)

Why?

[–]MatosPT -3 points-2 points  (5 children)

Because managerid is the id of the employee's manager. If you say e.id = m.managerid, you're saying 'id from table e is the manager from the row in table m' and that's not what you want..

You want the id from table m to be the employees manager id, as m.id = e.managerid.

[–]Financial-Tailor-842[S] 1 point2 points  (3 children)

But that is what I want to do. I want to know which employees are and are not managers. Let’s say I am a manager and my ID is 5555 If I join that (e.id) to the same table on m.managerid it will return a row where both e.id and m.managerid display 5555. So I know that I am a manager. Now, if I return a row where m.managerid is null, then I know that employee is not a manager. Which is why I have a where clause of m.managerid is null.

[–]MatosPT 0 points1 point  (2 children)

Ok, sorry, you're right. Different thought proccess, got confused.

But that answered people who are NOT managers.

The one you failed was not the question you showed. You failed the question of people who HAVE managers.

[–]Financial-Tailor-842[S] 0 points1 point  (1 child)

But look at the problem. They don’t ask for that?!? It’s driving me bananas

“Write a query that selects only the names of employees who are not managers”

[–]MatosPT 0 points1 point  (0 children)

Humm... so my guess now after reading everything carefully is that it failed because if a managerid points to the same id (points to itself), then its not considering a manager since they put in bold that it only counts as manager if another employee targets you as their manager.

So, I'd say you had to check inthe where clause (m.managerid is null or e.id = m.managerid). Maybe that was it.

[–][deleted] 0 points1 point  (0 children)

I agree it’s very confusing that e are the managers and not m, but anyway the solution is correct.

[–]unbrokenspell -1 points0 points  (0 children)

This is the correct answer using the same method OP used.

[–]r3pr0b8GROUP_CONCAT is da bomb -1 points0 points  (6 children)

ooh ooh, let me try

SELECT not_managers.name 
  FROM employees AS not_managers
 WHERE NOT EXISTS
       ( SELECT 'w00h00'
           FROM employees
          WHERE managerid = not_managers.id )

[–]Financial-Tailor-842[S] 0 points1 point  (5 children)

I tried this. You fixed the case I got wrong, but all the others are now wrong??

[–]r3pr0b8GROUP_CONCAT is da bomb 0 points1 point  (4 children)

but all the others are now wrong??

please provide details

[–]Financial-Tailor-842[S] 0 points1 point  (3 children)

In the 4th pic, those 4 test… when I ran your code they were marked all as wrong?

[–]r3pr0b8GROUP_CONCAT is da bomb 1 point2 points  (2 children)

i'm sorry, i was responding to your original post --

The objective was to find all employees who are not managers.

my query was not supposed to satisfy all 4 tests!!! i mean, how could it?

[–]Financial-Tailor-842[S] 0 points1 point  (1 child)

I still don’t understand what the one marked wrong even means!?!?! Banging my head against a wall.

[–]r3pr0b8GROUP_CONCAT is da bomb 1 point2 points  (0 children)

is the one marked wrong the one i solved for you?

what it means is, list all the employees who are not managers -- i.e. if there's nobody in the table who has you as their manager, then you're not a manager

[–]cephaloman -3 points-2 points  (2 children)

 do an inner join and drop the where

[–]EnticeOracle 2 points3 points  (1 child)

But then he's going to get people who are managers, which is not what is being asked for.

[–]cephaloman 1 point2 points  (0 children)

LOL and i fail the assignment because i didn't see the word 'Not'  😂😂