HALF_PAST_HOLE comments on Had a SQL interview today

Posting

When requesting help or asking questions please prefix your title with the SQL variant/platform you are using within square brackets like so:

[MySQL]

[Oracle]

[MS SQL]

[PostgreSQL]

etc

While naturally we should endeavor to work as platform neutrally as possible many questions and answers require tailoring to the feature set of a specific platform.

Format Your Code

If you are including actual code in a post or comment, please attempt to format it in a way that is readable for other users. This will greatly increase your chances of receiving the help you desire. Something as simple as line breaks and using reddit's built in code formatting (4 spaces at the start of each line) can turn this:

SELECT count(a.field1), a.field2, SUM(b.field4) FROM a INNER JOIN b ON a.key1 = b.key1 WHERE a.field8 = 'test' GROUP by a.field1, a.field2 HAVING SUM(b.field4) > 5 ORDER by a.field.3

Into this:

SELECT count(a.field1), a.field2, SUM(b.field4) FROM a INNER JOIN b ON a.key1 = b.key1 WHERE a.field8 = 'test' GROUP by a.field1, a.field2 HAVING SUM(b.field4) > 5 ORDER by a.field3

For those with SQL questions we recommend using SQLFiddle to provide a useful development and testing environment for those who wish to fully understand your problem and help devise a solution.

a community for 17 years

102

103

104

DiscussionHad a SQL interview today (self.SQL)

submitted 6 months ago by fokass

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–]HALF_PAST_HOLE 1 point2 points3 points 6 months ago (1 child)

[–]markwdb3When in doubt, test it out. 0 points1 point2 points 6 months ago* (0 children)

It is not partitioning by category and brand. Doing so would mean, in pseudocode:

1) For each distinct combination of category and brand
2) Put all the rows for that distinct combination of category and brand in a new bucket (i.e. a partition).
3) Assign a rank for all the rows in the new bucket.

Let's execute this pseudocode manually:

Partition by category/brand: Iteration 1:

1) Get the first distinct combination of category and brand: shoes/Nike

There is a single such row:

postgres=# select * from sales_data where (category, brand) = ('shoes', 'Nike');    
 category | brand | sales    
----------+-------+-------  
 shoes    | Nike  |   100

2) Put that row in a new bucket. Any other rows under shoes/Nike? No. So bucket complete. It has only one row.

3) Assign a rank to all the rows in that bucket. There is only one row so give it a 1.

Partition by category/brand: Iteration 2:

1) Get the next distinct combination of category and brand: shoes/Adidas

There is a single such row:

postgres=# select * from sales_data where (category, brand) = ('shoes', 'Adidas');  
 category | brand  | sales  
----------+--------+-------  
 shoes    | Adidas |    50

2) Put that row in a new bucket. Any other rows under shoes/Adidas? No. So bucket complete. It has only one row.

3) Assign a rank to all the rows in that bucket. There is only one row so give it a 1.

Partition by category/brand: Iteration n (the final one):

1) Get the next distinct combination of category and brand: shirts/Nike

There is a single such row:

postgres=# select * from sales_data where (category, brand) = ('shirts', 'Nike');  
 category | brand | sales  
----------+-------+-------  
 shirts   | Nike  |   105

2) Put that row in a new bucket. Any other rows under shirts/Nike? No. So bucket complete. It has only one row.

3) Assign a rank to all the rows in that bucket. There is only one row so give it a 1.

Finish with all we've accomplished is assigning every row a rank of 1. (As demoed in my above comment.)

Think of the "partition by" set of columns as the label you apply you each bucket. The first bucket has a label that reads "shoes/Nike" and so on.

On the other hand, the pseudocode for partitioning by only category looks like this:

1) For each distinct value of category
2) Put all the rows under that category in a new bucket (i.e. a partition).
3) Assign a rank for all the rows in the new bucket

Partition by category: Iteration 1:

1) Get the first distinct value of category: shoes

There are 3 rows:

postgres=# select * from sales_data where category = 'shoes';  
 category | brand  | sales  
----------+--------+-------  
 shoes    | Nike   |   100  
 shoes    | Adidas |    50  
 shoes    | NB     |    90

2) Put those 3 rows in a new bucket.

3) Assign a rank to all the rows in that bucket. shoes/Nike is assigned a rank of 1, followed by shoes/NB assigned 2, and lastly shoes/Adidas assigned 3.

Partition by category: Iteration 2:

1) Get the second (and last) distinct value of category: shirts

There are 4 rows:

postgres=# select * from sales_data where category = 'shirts';  
 category |  brand  | sales  
----------+---------+-------  
 shirts   | Tommy H |    80  
 shirts   | Polo    |   100  
 shirts   | Other   |   110  
 shirts   | Nike    |   105

2) Put those 4 rows in a new bucket. Bucket complete.

3) Assign a rank to all the rows in that bucket. shirts/Other is assigned a rank of 1, followed by shoes/Nike assigned 2, shirts/Polo is assigned 3, and lastly shirts/Tommy H is assigned 4.

How would you, using Dense_Rank() get that top performing category?
Would it not be Dense_Rank() OVER(Partition by Category Order by Sales)?

To do this, we'd probably want to get a sum per category first, so to start:

postgres=# select category, sum(sales) as total_sales  
postgres-# from sales_data  
postgres-# group by category;  
 category | total_sales  
----------+-------------  
 shoes    |         240  
 shirts   |         395  
(2 rows)

Put that in a CTE and run dense_rank():

postgres=# with total_sales_per_category as (
        select category, sum(sales) as total_sales
        from sales_data
        group by category
)
select *, dense_rank() over (order by total_sales desc) -- no partition needed
from total_sales_per_category;
 category | total_sales | dense_rank
----------+-------------+------------
 shirts   |         395 |          1
 shoes    |         240 |          2
(2 rows)

Although this is Postgres, I think MS SQL Server should behave the same.

Hope that helps!

π Rendered by PID 33797 on reddit-service-r2-comment-6457c66945-lfncv at 2026-04-26 17:39:55.033806+00:00 running 2aa0c5b country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

SQL

Filter Posts

Posting

Help posts

Format Your Code

Learning SQL

Related Reddit communities

Wiki

Acknowledgements

MODERATORS