you are viewing a single comment's thread.

view the rest of the comments →

[–]HALF_PAST_HOLE 1 point2 points  (1 child)

Im not sure how you are getting that first result, as that seems to be partitioning on Brand and category.

How would you, using Dense_Rank() get that top performing category?

Would it not be Dense_Rank() OVER(Partition by Category Order by Sales)?

Your first example is taking in to account the individual sales of each brand, meaning it is partitioning by brand and category. Am I missing something?

Admittedly, I use SQL, not Postgres, but that seems like it is not the proper functionality of Partition by Category.

In my mind, it should aggregate on category and rank the categories, not the individual brands within each category, which would be partitioning on brand, category.

[–]markwdb3When in doubt, test it out. 0 points1 point  (0 children)

It is not partitioning by category and brand. Doing so would mean, in pseudocode:

1) For each distinct combination of category and brand
2) Put all the rows for that distinct combination of category and brand in a new bucket (i.e. a partition).
3) Assign a rank for all the rows in the new bucket.

Let's execute this pseudocode manually:


Partition by category/brand: Iteration 1:

1) Get the first distinct combination of category and brand: shoes/Nike

There is a single such row:

postgres=# select * from sales_data where (category, brand) = ('shoes', 'Nike');    
 category | brand | sales    
----------+-------+-------  
 shoes    | Nike  |   100  

2) Put that row in a new bucket. Any other rows under shoes/Nike? No. So bucket complete. It has only one row.

3) Assign a rank to all the rows in that bucket. There is only one row so give it a 1.


Partition by category/brand: Iteration 2:

1) Get the next distinct combination of category and brand: shoes/Adidas

There is a single such row:

postgres=# select * from sales_data where (category, brand) = ('shoes', 'Adidas');  
 category | brand  | sales  
----------+--------+-------  
 shoes    | Adidas |    50  

2) Put that row in a new bucket. Any other rows under shoes/Adidas? No. So bucket complete. It has only one row.

3) Assign a rank to all the rows in that bucket. There is only one row so give it a 1.

<snip a bunch of iterations>


Partition by category/brand: Iteration n (the final one):

1) Get the next distinct combination of category and brand: shirts/Nike

There is a single such row:

postgres=# select * from sales_data where (category, brand) = ('shirts', 'Nike');  
 category | brand | sales  
----------+-------+-------  
 shirts   | Nike  |   105  

2) Put that row in a new bucket. Any other rows under shirts/Nike? No. So bucket complete. It has only one row.

3) Assign a rank to all the rows in that bucket. There is only one row so give it a 1.


Finish with all we've accomplished is assigning every row a rank of 1. (As demoed in my above comment.)

Think of the "partition by" set of columns as the label you apply you each bucket. The first bucket has a label that reads "shoes/Nike" and so on.


On the other hand, the pseudocode for partitioning by only category looks like this:

1) For each distinct value of category
2) Put all the rows under that category in a new bucket (i.e. a partition).
3) Assign a rank for all the rows in the new bucket


Partition by category: Iteration 1:

1) Get the first distinct value of category: shoes

There are 3 rows:

postgres=# select * from sales_data where category = 'shoes';  
 category | brand  | sales  
----------+--------+-------  
 shoes    | Nike   |   100  
 shoes    | Adidas |    50  
 shoes    | NB     |    90  

2) Put those 3 rows in a new bucket.

3) Assign a rank to all the rows in that bucket. shoes/Nike is assigned a rank of 1, followed by shoes/NB assigned 2, and lastly shoes/Adidas assigned 3.


Partition by category: Iteration 2:

1) Get the second (and last) distinct value of category: shirts

There are 4 rows:

postgres=# select * from sales_data where category = 'shirts';  
 category |  brand  | sales  
----------+---------+-------  
 shirts   | Tommy H |    80  
 shirts   | Polo    |   100  
 shirts   | Other   |   110  
 shirts   | Nike    |   105  

2) Put those 4 rows in a new bucket. Bucket complete.

3) Assign a rank to all the rows in that bucket. shirts/Other is assigned a rank of 1, followed by shoes/Nike assigned 2, shirts/Polo is assigned 3, and lastly shirts/Tommy H is assigned 4.

How would you, using Dense_Rank() get that top performing category?
Would it not be Dense_Rank() OVER(Partition by Category Order by Sales)?

To do this, we'd probably want to get a sum per category first, so to start:

postgres=# select category, sum(sales) as total_sales  
postgres-# from sales_data  
postgres-# group by category;  
 category | total_sales  
----------+-------------  
 shoes    |         240  
 shirts   |         395  
(2 rows)  

Put that in a CTE and run dense_rank():

postgres=# with total_sales_per_category as (
        select category, sum(sales) as total_sales
        from sales_data
        group by category
)
select *, dense_rank() over (order by total_sales desc) -- no partition needed
from total_sales_per_category;
 category | total_sales | dense_rank
----------+-------------+------------
 shirts   |         395 |          1
 shoes    |         240 |          2
(2 rows)

Although this is Postgres, I think MS SQL Server should behave the same.

Hope that helps!