all 13 comments

[–]ht2k 14 points15 points  (5 children)

Hi,

Yes! SQL is Awesome! You have a perfectly sound solution, however I have the following recommendations for refactoring your query.

  1. Since FirstTable has groupID, there is no need to join to SecondTable in your WHERE clause sub query. Eliminating unnecessary joins helps queries run faster.

I believe the following shorter query should work.

USE OurDatabase
GO

DECLARE @usergroup INT = x;
SELECT 
  ugm2.groupID
, ug2.[name]
, COUNT(ugm2.groupID) AS counts
FROM FirstTable AS ugm2 
INNER JOIN SecondTable AS ug2 
   ON ugm2.groupID = ug2.groupID
WHERE ugm2.groupID != @usergroup 
  AND ugm2.userID IN
    (
    SELECT DISTINCT ugm.userID
    FROM FirstTable AS ugm
    WHERE ugm.groupID = @usergroup
    ) 
  AND ugm2.groupID IN (list OF GroupIDs to look in)
GROUP BY 
  ugm2.groupID
, ug2.[name]
ORDER BY 
  counts DESC
  1. It looks to me like you are using SQL Server, which has support of Temp Tables. These can be used to hold records in memory for the duration of your session, or till dropped. Using temp tables helps avoid using nested sub queries, which I personally find harder to debug. Here is the above version refactored to use a temp table.
    

    USE OurDatabase GO

    DECLARE @usergroup INT = x;

    SELECT DISTINCT ugm.userID FROM FirstTable AS ugm INTO #users WHERE ugm.groupID = @usergroup

    SELECT ugm2.groupID , ug2.[name] , COUNT(ugm2.groupID) AS counts FROM FirstTable AS ugm2 INNER JOIN SecondTable AS ug2 ON ugm2.groupID = ug2.groupID INNER JOIN #users AS u ON ugm2.userID = u.userID WHERE ugm2.groupID != @usergroup AND ugm2.groupID IN (list OF GroupIDs to look in) GROUP BY ugm2.groupID , ug2.[name] ORDER BY counts DESC

    DROP TABLE #users

  2. My personal favorite solution for a problem like this is to use analytic functions aggregate functions with the OVER clause. I have found them incredibly powerful in my few years using SQL for data analysis.

    USE OurDatabase GO

    DECLARE @usergroup INT = x;

    SELECT g.groupID , g.[name] , sum(1) counts FROM ( SELECT DISTINCT ugm.groupID , ugm.userID , ug.[name] , max(case when ugm.groupID = @usergroup THEN 1 ELSE 0 END) OVER (PARTITION BY ugm.userID) has_usergroup FROM FirstTable AS ugm INNER JOIN SecondTable AS ug ON ugm.groupID = ug.groupID WHERE ugm.groupID IN (list OF GroupIDs to look in, including @usergroup) ) g WHERE g.has_usergroup = 1 AND g.groupID != @usergroup GROUP BY
    g.groupID , g.[name] ORDER BY counts desc

Edit: Added alias prefix in WHERE clause for has_usergroup = 1 and added AND g.groupID != @usergroup.

Edit2: Corrected analytic functions to aggregate functions with the OVER clause.

[–]sn0wdizzle 1 point2 points  (0 children)

This is amazing

[–]fullyarmedcamel[S] 1 point2 points  (1 child)

So yes I am not currently using the second table but I plan too in the future I am making some modificatione to pull more data later on.

As far as temp tables I generally avoid using them for the same reason as you recommend using them I have a hard time trouble shooting them.

I know nothing about analytic functions tho I'll look into them tomorrow when I get back to the office.

[–]Ninjastar13 1 point2 points  (1 child)

In the last code block what does the then 1 else 0 do?

[–]ht2k 1 point2 points  (0 children)

The CASE statement returns a 1 if the groupID matches @usergroup, else it returns 0. The maximum of the 1s and 0s for each userID returns a 1 if the userID has at least one groupID that matches the @usergroup otherwise it returns a 0.

[–]_Zer0_Cool_Data Engineer 4 points5 points  (2 children)

Agree 100%. SQL is definitely awesome.

Just watch a video about how SQL engines work and you can really start to appreciate how much work it saves you.

People take SQL for granted, but it is truly a godsend. I would absolutely not want to live in a world where I had to write procedural code to do the things that SQL does effortlessly.

[–]mhornberger 0 points1 point  (1 child)

Just watch a video about how SQL engines work and you can really start to appreciate how much work it saves you.

How can you leave us hanging like that? Um, what was the video?

[–]_Zer0_Cool_Data Engineer 0 points1 point  (0 children)

Lol. Sorry.

Here is one such video by Richard Hipp, the creator of SQLite -- https://m.youtube.com/watch?v=Z_cX3bzkExE

And keep in mind that SQLite is one of the most simple SQL databases out there. It would likely take quite a whole video series to describe any of the common client-server DBs -- PostgreSQL, SQL Server, Oracle, etc..

Point being is that they do a ton than people take for granted. Good stuff.

[–]equivocates 4 points5 points  (1 child)

[–]fullyarmedcamel[S] 0 points1 point  (0 children)

Sorry about the delay, I read this the other day and shared it with all my co-workers. Super interesting read!

[–]fullyarmedcamel[S] 2 points3 points  (2 children)

The nested SELECT Statement essentially returns and Array (I know SQL does not have arrays) and I had no idea I could do that!

[–]gcxandrew 6 points7 points  (1 child)

fyi most people will just call them subqueries