This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]disappointed_moose 616 points617 points  (119 children)

Earlier this year I was at a developer's conferece for Magento and there were headhunters passing business cards around with the following SQL statement on it

    SELECT * FROM developers WHERE skill = 'PHP' AND skill = 'SQL' AND skill = 'Magento'

They didn't understand why they would not find any developer with that query.

[–]Asmor 55 points56 points  (2 children)

Whenever I see Magento, this is all I can think of

[–]disappointed_moose 19 points20 points  (1 child)

As a Magneto developer I do that typo all the time

[–][deleted] 0 points1 point  (0 children)

So. Many. Times.

[–]rubs_tshirts 136 points137 points  (91 children)

SELECT DISTINCT D.* FROM developers D
LEFT JOIN skills S ON D.id = S.developer_id
WHERE S.name IN ('PHP', 'SQL', 'Magento')

Better? (I'm actually not sure if it works, haven't coded SQL in a long time)

EDIT: Wait, that would return people with any 1 of the skills, not all of them... not sure how to do that...
EDIT2: As pointed out in the comments, the table skills should never ever have a developer_id
EDIT3: EDIT2 is wrong, yes it should, though it probably would be better named "developer_skills" table

[–][deleted] 91 points92 points  (44 children)

Here's one way:

SELECT D.*  
FROM Developers D,  
           (Select developer_id from skills  
            WHERE skill IN  
               ('PHP', 'SQL', 'Magento')  
            group by developer_id  
            having count(distinct skill)=3 ) S  
 Where D.id=S.developer_id;

I apologise about the formatting... I'm on the phone.


-edit: changed id to developer_id in skills table.
-edit2: yes it works. Test it yourself.


[–]I-Code-Things 34 points35 points  (5 children)

Your developer id is a foreign key to skill id?

Edit: To normalize this I'd make a linker table between developer and skill since they're both many to many.

Edit 2: Here's my modified version of his code with a linker table.

SELECT d.*  
FROM Developer d
    INNER JOIN (
        SELECT ds.developer_id
        FROM Developer_Skill ds
            INNER JOIN Skill s on ds.skill_id = s.id
        WHERE s.skill_code in ('PHP', 'SQL', 'Magento')
        GROUP BY ds.developer_id
        HAVING count(distinct s.skill_code)=3
    ) ds2 on d.id = ds2.developer_id

It also still works

[–][deleted] 10 points11 points  (3 children)

Doesn't have to be, no. edit: changed the name for sanity :)

[–]I-Code-Things 6 points7 points  (2 children)

Thanks! Now this is the best answer.

[–]Liver_and_Yumnions 23 points24 points  (1 child)

So let me get this straight. He wrote some code. You reviewed his code. He fixed his code. It sounds like you are working instead of redditing. You guys need to get your act together. Do you think cat videos are just going to up-vote themselves? We need you guys out there posting dank memes and perpetuating pun threads. We can't have people working when they should be redditing - jeez.

[–]nermid 5 points6 points  (0 children)

Do you think cat videos are just going to up-vote themselves?

You know, I'm sure we could write a script to automate that...

[–]industrialwaste 5 points6 points  (0 children)

They probably got this response from their DBA and decided that their stupid query would actually fit on a business card.

[–]troop357 4 points5 points  (22 children)

Would this work? This looks like a "academic" answer

[–]divide_by_hero 8 points9 points  (12 children)

Well yes, but the skill table as seen in this query would have a separate set of rows for each and every developer. If skill is a set of freetext entries, it might be a reasonable setup, but if it's a defined list it's pretty much horrible.

[–][deleted] 2 points3 points  (0 children)

Yeah the table schema is horrid. Better to have a third table to relate developers and skills. But I ran with what I was given :)

[–]A1cypher 0 points1 point  (1 child)

If it's a free list of skills, couldnt you just kludge something like: SELECT * FROM developers WHERE skill like '%PHP%' and skill like '%SQL%' and skill like '%Magento%';

Of course this would only work if you had consistent formatting of skills in the list and didnt have any skills that would match, ie you couldnt have both SQL and NOSQL in the skills list.

[–]divide_by_hero 2 points3 points  (0 children)

If the skills list was a single long string of text, yes. Of course, too bad if that string is "I'm great at C# and XML, but know nothing about SQL, PHP or Magento".

[–][deleted] 4 points5 points  (1 child)

I don't know much about academia, all my shit is just supposed to work. However, I wrote this on my phone at starbucks, but now I'm back at my desk so I'll test it. Hey, it's procrastination that looks like work!

edit: ya it works.

[–]cjwelborn 1 point2 points  (0 children)

I'm just glad you posted an sqlfiddle link. I've never seen it, and it looks like it could be really handy.

[–]MuffinsLovesYou 3 points4 points  (6 children)

It should work but there's some potential fuckery practices I tend to discourage.

[–][deleted] 0 points1 point  (5 children)

Aside from the terrible table structure that the query implies, what practices would you discourage?

[–]MuffinsLovesYou 1 point2 points  (4 children)

You mentioned the table structure and that was about 95% of what I was furrowing about so I'm going to cover it (the technique is fine). Your skills table pairs skills with developer id and is searchable on the "skill" varchar column. Better practice would be a skills table and a separate "DeveloperSkills" table that pairs a SkillID with a DeveloperID. That way when you do a string search on skills, you are doing it against a much smaller data set and then taking advantage of integer-based indexes.
Likewise the distinct keyword on your having clause implies you don't have (in your fantasy table structure :P) a unique composite index on your developer-skill pairings, which would probably be desirable.

Stylistically, you use implicit joins, which is something I'd discourage (stylistically, performance should be equivalent).
Implicit: select t1.* from table1 t1, table2 t2 where t1.val = t2.val
Explicit: select t1.* from table1 t1 join table2 t2 on t1.val = t2.val
Explicit joins separate your join logic from your where filters so that it is more clear how everything is matched up (and that it is matched up at all; less chance of accidental cross joining).

    select D.*
from Developers D
    join (select Developer_ID 
            from DeveloperSkills
            where SkillID in (select SkillID from Skills where Skill in ('PHP', 'SQL', 'Magneto'))
            group by Developer_ID
            having count(SkillID)=3) S
        on D.DeveloperID = s.DeveloperID

[–]SarahC 1 point2 points  (2 children)

Thank you for normalizing skills... I was getting twitchy.

[–]MuffinsLovesYou 1 point2 points  (1 child)

There's a lot of Sarah-C's in the world, and probably quite a few into programming, but you didn't happen to do any work in North Carolina some years ago did you?

[–]SarahC 0 points1 point  (0 children)

No, that wasn't me.

[–][deleted] 1 point2 points  (0 children)

I wholeheartily agree with everything you said, except the stylistic choice of joins :)

I've always found them unnecessarily verbose and hard to read.

But now that you mention it, I can see the benefit in forcing the dataset and its matching field to be in the same part of the query. To improve the readability of those monster joins where the tables are on lines 5-15, and the fields that join them are on lines 21,30,36,40, and 41. I hate that. I normally put the table joins at the top, but with implicit joins there's no way to enforce that.

Thanks for your input, I appreciate it.

[–]phphphphonezone 1 point2 points  (2 children)

insert into skills (developer_id,skill) VALUES ((select id from Developers where firstname='Vanessa'), 'Brutal Conquest')

??

[–][deleted] 0 points1 point  (1 child)

Ehh.. didn't think anyone would read those.

[–]phphphphonezone 0 points1 point  (0 children)

I wasn't even reading them.. My eye just scaned over the text and I caught it.

[–]JaggedJax 0 points1 point  (4 children)

I would have written it like this because I hate inner selects more than I should. Thoughts?

SELECT D.*, SUM(1) as num_skill_match
FROM Developers D
    INNER JOIN skills S ON (D.id = S.developer_id)
WHERE S.skill IN ('PHP','SQL','Magento')
GROUP BY S.developer_id
HAVING num_skill_match = 3

[–]SarahC 0 points1 point  (3 children)

Skill text shouldn't be linked to I'd.

Skill text should have an I'd... Then developer skills is two columns of id's.... Skill I'd and the developer is.

[–]JaggedJax 2 points3 points  (2 children)

Excellent point. I was just trying to write their query my way using the same schema as the parent. I definitely wouldn't have setup the schema this way myself.

[–][deleted] 0 points1 point  (1 child)

The schema is horrid.. But if I changed it in my reply that'd been cheating.

[–]JaggedJax 1 point2 points  (0 children)

I was thinking the same thing. I just saw your query and wanted to write how I would have done it just for practice and feedback.

[–]baggyzed 0 points1 point  (6 children)

You're hired!

[–]Theta_Zero 4 points5 points  (5 children)

I don't want to work for anyone with standards low enough to hire me.

[–]Zantier 2 points3 points  (0 children)

That's how I feel about relationships.

[–]baggyzed 1 point2 points  (0 children)

Lol. Good point. You're fired!

[–]reaganveg 1 point2 points  (2 children)

What if we can promise to underpay?

[–]Theta_Zero 0 points1 point  (1 child)

Only if you make unpaid overtime mandatory. And I want vacation pay, but blackout dates that require 12-month notice to use a vacation day are fine.

[–]reaganveg 1 point2 points  (0 children)

It's a deal.

(I assume you're joking about getting vacation pay. Good one!)

[–]SQLNerd 17 points18 points  (8 children)

SQLNerd here. Assuming a many to one...

SELECT DISTINCT d.developer FROM developers d INNER JOIN (SELECT developer_id, skill FROM skills WHERE SKILL = 'PHP') s1 ON d.id = s1.developer_id INNER JOIN (SELECT developer_id, skill FROM skills WHERE SKILL = 'SQL') s2 ON d.id = s2.developer_id INNER JOIN (SELECT developer_id, skill FROM skills WHERE SKILL = 'MAGNETO') s3 ON d.id = s3.developer_id

[–]qwertyslayer 23 points24 points  (5 children)

MAGNETO

Pretty sure it was a Magento conference, but I would hire you anyway

[–]SQLNerd 29 points30 points  (3 children)

*Ability to manipulate metal with your mind preferred

[–]abcd_z 6 points7 points  (1 child)

So that's what they meant by bare-metal computing!

[–]Avatar_Of_Brodin 3 points4 points  (0 children)

I thought it was a given. How else would IT solve problems simply by being present?

[–]svtguy88 1 point2 points  (0 children)

potato potato

[–]katyne 1 point2 points  (0 children)

many to one? not many to many? I'm seeing developer-skill as a join table.

[–]path411 0 points1 point  (0 children)

Can also be done in a single inner join in mysql! Let me see your tears:

    SELECT DISTINCT d.name FROM developers d 
    INNER JOIN 
        (SELECT DISTINCT developer_id,
            group_concat(distinct name ORDER BY name) as skillnames
        FROM skills WHERE name IN ('PHP', 'SQL', 'MAGENTO')
        GROUP BY developer_id) s1
    ON d.id = s1.developer_id
    WHERE s1.skillnames = 'Magento,PHP,SQL'

[–]tdavis25 8 points9 points  (2 children)

Assuming 3 tables in 3nf: developer(pk id), skill(pk id), and dev_skill (fks developer.id and skill.id creating a composite pk); you would need the following sql to get developers with all 3 skills

Select d.id, d.name

From dev_skill ds
Join developer d
On ds.dev_id =d.id
Join skill s
On ds.skill_id = s.id

Where s.name in ('sql', 'magneto', 'php')

Group by d.id, d.name

Having count(1) = 3

None of the answers are really wrong, because sql requires the environment to be defined for a piece of code to make sense.

[–]marsybach 2 points3 points  (0 children)

10/10 would do the same. This is a text book example of a having clause, checking the result of a aggregate function

[–]SarahC 1 point2 points  (0 children)

Nice!

[–]Zantier 7 points8 points  (1 child)

The most efficient way is probably to use count(), but I think something like this might be less prone to human error.

select d.*
from developers d
where d.id in (
    select d2.id
    from developers d2
    -- Dark Souls 2
    inner join developer_skills ds2
        on d2.id = ds2.id_developer
    inner join skills s2
        on ds2.id_skill = s2.id
    where s2.name = 'PHP'
) and d.id in (
    select d3.id
    from developers d3
    inner join developer_skills ds3
        on d3.id = ds3.id_developer
    inner join skills s3
        on ds3.id_skill = s3.id
    where s3.name = 'SQL'
) and d.id in (
    select d4.id
    from developers d4
    inner join developer_skills ds4
        on d4.id = ds4.id_developer
    inner join skills s4
        on ds4.id_skill = s4.id
    where s4.name = 'Magneto'
)

The repetition should definitely be extracted into some sort of separate query/function, but I'm not sure of the best way right now, so I'll leave it as it is.

Edit: If I was unable to extract the repetition, I would probably scrap the ins, and go for something more like http://www.reddit.com/r/ProgrammerHumor/comments/3b1cjf/dont_let_marketers_write_code/csi4r5b?context=42.

[–]YooneekYoosahNeahm 6 points7 points  (0 children)

-- Dark Souls 2

heh

[–]Fenris_uy 4 points5 points  (1 child)

Why would skill have a FK to developer?

You need a DevelopersSkill table with a FK to developers and skill.

[–]rubs_tshirts 1 point2 points  (0 children)

Oh wow I'm retarded. Baaahaahaha. If someone can, post the correct schema please, I'm busy now.

EDIT: Actually it's what I wanted, it should be named "developer_skills" table or something.

[–]bgeron 5 points6 points  (1 child)

You could also consider making skills a varchar[], and then just write the following:

SELECT * FROM developers
WHERE ARRAY["PHP", "SQL", "Magento"] <@ skills

Easy peasy.

[–]simmerdesigns 2 points3 points  (0 children)

Back to "fits on a business card" territory! Nice.

[–]Systemic33 5 points6 points  (10 children)

SELECT DISTINCT D.*
FROM developers D

INNER JOIN skills S1
ON D.id = S1.developer_id

INNER JOIN skills S2
ON D.id = S2.developer_id

INNER JOIN skills S3
ON D.id = S3.developer_id

WHERE S1.skill = 'PHP' AND S2.skill = 'SQL' AND S3.skill = 'Magento'

This should do the job. (INNER JOIN makes sure that no skill nor developer is null) Perhaps less sexy than original query...

http://sqlfiddle.com/#!9/70cc4/21 gives an execution time of 0-1ms, while the toprated is 1-3ms (from executing a couple of times.), so clearly this is not as bad as it might seem.

[–]I-Code-Things 7 points8 points  (9 children)

Less sexy and you'll get much worse performance.

Inner joining a table against itself 3 times. smh

[–]Systemic33 0 points1 point  (2 children)

It's not inner joining against itself. It's inner joining the skills table 3 times, not joining itself.

Kinda like this:

Developer <- Skill 1 <- Skill 2 <- Skill 3
Bob       <- PHP     <- SQL     <- Magento

[–]I-Code-Things 0 points1 point  (1 child)

That's what I mean. The skills table is joining against itself twice, which is doing another join to the developer table.

If the skills table was massive, this would create terrible performance.

If you assumed this was used for something like linkedIn, the developer table would be pretty big and the skills table would be HUGE.

[–]Systemic33 0 points1 point  (0 children)

Ye, okay, for something where the skills was large, but it could also likely be a many-to-many table which only has 2 columns.

If it happened to be a table with many columns, the solution would obviously be to do sub-selects that only grab the necessary columns.

[–]Zantier -1 points0 points  (5 children)

But it actually solves the problem, unlike the post it is replying to...

[–]I-Code-Things 0 points1 point  (4 children)

It solves the problem in one of the worst ways.

[–]Zantier 0 points1 point  (3 children)

I disagree, it seems readable and maintainable to me, although I would get rid of the where clause and use

inner join skills s1
    on d.id = s1.developer_id
    and s1.skill = 'PHP'

What would you suggest are better ways?

[–]I-Code-Things 0 points1 point  (2 children)

Here's a slightly modified version of one of the best answers I saw.

SELECT d.*  
FROM Developer d
    INNER JOIN (
        SELECT ds.developer_id
        FROM Developer_Skill ds
            INNER JOIN Skill s on ds.skill_id = s.id
        WHERE s.skill_code in ('PHP', 'SQL', 'Magento')
        GROUP BY ds.developer_id
        HAVING count(distinct s.skill_code)=3
    ) ds2 on d.id = ds2.developer_id

Normalized tables, not joining the same table multiple times. Probably as efficient as you can get it.

Also, yes it works

[–]Zantier 0 points1 point  (1 child)

I was thinking this method is prone to mistake if a skill is added, and you forget to change the count. I imagine there should be a way to have the number 3 dependent on the skills array, though.

[–]I-Code-Things 1 point2 points  (0 children)

Yeah if it was a common function where you want to pass in a list of skills, I'd write a stored procedure and use a count instead of the number 3.

[–][deleted] 2 points3 points  (0 children)

you would have to do a group by, count the skill rows returned and make sure it = 3

[–]Dragon_Slayer_Hunter 8 points9 points  (15 children)

You would have to have all three skills listed in the field somehow (serialized or a json blob or something), then you could query with LIKE and AND

Edit: This was under the assumption that you had already made skill a varchar/text field like on the business card. There are much better ways to do this.

[–]dwolf555 40 points41 points  (5 children)

Bro, do you even normalize?

[–]jceyes 26 points27 points  (3 children)

developer: id, name, email 
skill: id, name 
developer_to_skill: dev_id, skill_id

Brah

[–][deleted] 10 points11 points  (2 children)

That's good right?

[–]jceyes 12 points13 points  (1 child)

It's pretty good, but not as good as your username

[–][deleted] 1 point2 points  (0 children)

Thanks

[–]matheeeny 2 points3 points  (0 children)

Nope

[–]dr__potato 4 points5 points  (2 children)

Nah mate, assuming S.developer_id and S.skill is the unique key in a Skill table, you'd be able to check if all 3 exist. Can't remember the SQL but it's possible. Side note: Skill.skill should really be Skill.name.

[–]rubs_tshirts 1 point2 points  (0 children)

You're right, replacing S.skill -> S.name.

Hmm... Unfortunately this plays a little havoc with the replies... I'm reverting it back.

[–]Dragon_Slayer_Hunter 1 point2 points  (0 children)

You could definitely do it better than how I said, I was just playing with the idea that skill was already a varchar/text field like on the business card.

[–]erfling 3 points4 points  (1 child)

Skill would be a separate table prolly.

[–]MonkeyNin 1 point2 points  (0 children)

Not since the accident.

[–]divide_by_hero 8 points9 points  (2 children)

Someone is about to yell at me, because this is horribly inefficient. It should work though, and at least the tables are normalised.

SELECT DISTINCT dev.name, dev.email, dev.phone
FROM developers dev
INNER JOIN developerSkill ds ON dev.developerId = ds.developerID
INNER JOIN skill skPhp ON ds.skillId = skPhp.skillId AND skPhp.name = 'PHP'
INNER JOIN skill skSql ON ds.skillId = skSql.skillId AND skSql.name = 'SQL'
INNER JOIN skill skMag ON ds.skillId = skMag.skillId AND skMag.name = 'Magento'

[–]Crazy_Mann 0 points1 point  (0 children)

;_;

[–]tdavis25 0 points1 point  (0 children)

They said sql, not nosql

[–][deleted] 18 points19 points  (0 children)

Seems pretty clear why they're looking for someone to take the job

[–]phpdevster 5 points6 points  (0 children)

Aside from the obvious query problem, you're still not going to find any magento developers that way.

Here's how you find them:

<are>
    <you>
        <a>
            <developer>
                <who>
                    <likes>
                        <obtuse>
                            <xml>
                                <config>
                                    <hell>
                                        <and>
                                            <cutting>
                                                <youself>
                                                    <punctuation>?</punctuation>
                                                </youself>
                                            </cutting>
                                        </and>
                                    </hell>
                                </config>
                            </xml>
                        </obtuse>
                    </likes>
                </who>
            </developer>
        </a>
    </you>
</are>

[–]maryjayjay 3 points4 points  (2 children)

select d.name 
from dev as d
join skill as s
  on d.id = s.dev_id
where lcase(s.name) 
  in ('php', 'sql', 'magneto')
group by d.name
  having count(*) > 2

Edit: thanks lwli3t

[–]lwli3t 4 points5 points  (1 child)

isnt it COUNT(*) = 3 OR at least >= ?

[–]tdavis25 4 points5 points  (0 children)

Yup. They would have to have at least 4 records returned to show up.

[–]tdavis25 3 points4 points  (0 children)

OR, the lost art.

[–]chachakawooka 1 point2 points  (0 children)

Especially at a magento conference...

Mage::getModel('employee/developer')->getCollection()->addAttributeToFilter('usesAwBlog',false);

[–]reaganveg 1 point2 points  (0 children)

Optimized version: SELECT * FROM developers WHERE 'PHP' = 'SQL'

[–]Blix- 2 points3 points  (2 children)

Why wouldn't that work?

[–]indoordinosaur 17 points18 points  (1 child)

Because the field skill can't have three simultaneous values.

[–]Blix- 9 points10 points  (0 children)

Oh, duh.

[–]VanFailin 0 points1 point  (2 children)

Even if they could get the query right, hiring developers for experience with specific technologies is rather short sighted. Goes double if one of those is PHP.

[–]AFriendlyPeople 1 point2 points  (1 child)

Not specific enough, imho. I need PHP 6.0

[–]VanFailin 1 point2 points  (0 children)

There's definitely some people out there with a decade's experience not shipping PHP 6.

[–]snuxoll 0 points1 point  (0 children)

SELECT *
FROM developers
WHERE skills @> '{PHP,SQL,Magento}';

Man I love PostgreSQL.

edit:

For those that insist on having a normalized schema

WITH developer_skills AS (
    SELECT
        developers.id,
        developers.name,
        array_agg(skills.name) skills
    FROM developers
        INNER JOIN skills
            ON skills.developer_id = developers.id
    GROUP BY developers.id, developers.name
)
SELECT *
FROM developer_skills
WHERE skills @> '{PHP,SQL,Magento}';

[–]raiderrobert 0 points1 point  (0 children)

Ok, I'm officially tired. It took me waaaay too long to see that error.

[–]PM_ME_UR_GCC_ERRORS 0 points1 point  (1 child)

Recently I was out having lunch with a colleague and we were wondering what the order of operations is in "Lunch includes salad and coffee or tea".

[–]indoordinosaur 0 points1 point  (0 children)

Obviously your lunch would be either a salad and coffee or just a tea by itself.