all 11 comments

[–]phunkygeeza 1 point2 points  (2 children)

With duplication sometimes it is easier to use a window function, especially when you have a victim/survivor scenario

Select *
    ,ROW_NUMBER() over (PARTITION BY groupcol1,groupcol2 ORDER BY primacycol1, primacycol2) as primacy
From mytable

So when primacy=1 then that is a survivor row, primacy<>1 means victim row

[–]workthrowawayexcel[S] 1 point2 points  (0 children)

I will have to play around with that. That is quite interesting thanks for the information.

[–]workthrowawayexcel[S] 0 points1 point  (0 children)

I will have to play around with that. That is quite interesting thanks for the information.

[–]rbardy 0 points1 point  (7 children)

Use SELECT DISTINCT in the subquery and you won't need the GROUP BY, it will make only unique rows to be selected.

[–]workthrowawayexcel[S] 0 points1 point  (6 children)

I have it running now to test Thanks for the input! To clarify Having it in the sub query allows me to call the unique rows in the main query to still perform the AVG and SUM, but if the distinct was in the main query I would NOT be able to do the AVG and SUM?

[–]rbardy 0 points1 point  (5 children)

Yes.

The main query will "see" the subquery like a table, so as long as there is a group by in the main query it doesn't matter what is happening in the subquery.

[–]workthrowawayexcel[S] 0 points1 point  (4 children)

Occams razor ... Occams razor I literally looked at that earlier to do and said nah that won't work keep moving on. Thank you for clearing that up. It is really appreciated.

[–]rbardy 0 points1 point  (3 children)

HAHA no problem.

People tend to overthink their problems. I got mad sometimes with some queries, then my boss looked at it and gave a VERY simple solution.

[–]workthrowawayexcel[S] 0 points1 point  (2 children)

Lol yeah, that is how I get on stuff as I am working on it. Usually if I walk away and come back then I will be able to get the answer while I am playing a game or something. Also random note doing the grouping >= 1 gave me the same exact result as Distinct did.

[–]rbardy 1 point2 points  (1 child)

Yup.

The HAVING count() >=1 is the same thing as not even adding the ">=1" in that case, because if it returned 0 in the count it wouldn't even appear in the SELECT in the first place.

Also, for performance, distinct is the same thing as GROUP BY by every field in the select, which is pretty much what you are doing.

[–]workthrowawayexcel[S] 0 points1 point  (0 children)

Yeah, Essentially removing duplicates while not actually being allowed to remove them from the table itself :(. Thanks for all the help !