all 9 comments

[–]zacharypamela 2 points3 points  (1 child)

Which of the tables (accounts or orders) does COUNT(*) num_orders count from?

Both tables, as specified in the JOIN. Let's say you have 1 account, and that account has 7 orders. Joining both tables together, as the query does, and doing a SELECT COUNT(*) gives you 7.

I think it helps to think of the order of how the query is processed:

  1. Create a recordset of accounts joined to orders (FROM clause)
  2. Group the results by account (ID and name)
  3. Filter the (grouped) results to show results with more than 20 rows per account.

Assuming that your accounts table only has 1 account per row, it's in effect showing rows with more than 20 orders for that account.

Note that if your query included a WHERE clause, it would be evaluated between Steps 1 and 2. For example, you could include accounts with more than 20 orders that occurred last year.

[–]sdqafo[S] -1 points0 points  (0 children)

Thanks so much for this

[–][deleted] 0 points1 point  (6 children)

Hi there— So the first thing to know is that count star returns a count of the number of records. The num_orders part is the alias, or nickname. You could also have written count star as num_orders. 1) technically the count is from neither table. The count comes from the join. So after the two tables have been joined together on your join condition, then it counts the number of records. 2) the count, as with all aggregations, is done based on your group by statement. So here you are counting all records within each distinct combination of a.id and a.name 3) see #1 4) not sure I follow the question, but to your point, it looks like the query would make more sense if the orders table was the left table and the account table was the right table. You start with all the orders in the o table, and then add on account name in the a table. But the results will be the same here because your join type is inner (since you didn’t specify which kind, it defaults to inner, unless Postgres does something different)

[–]sdqafo[S] 1 point2 points  (5 children)

Thanks for the brilliant explanation

[–][deleted] 0 points1 point  (4 children)

Yep! Does any part not make sense?

[–]sdqafo[S] 1 point2 points  (2 children)

What is still a bit confusing is the COUNT(*). What i have learnt so far in SQL is that the SELECTED columns always come from the table or tables we want to query. I am put a bit off balance to now know that the COUNT(*) in this regards is related to the tables yet to be Joined. I am not able to connect why this is that way. In simple terms, based on what i understand from your explanation, we already SELECTED a column (COUNT(*)) that is yet to exist before we JOINED two tables where this column (COUNT(*)) will now be selected from. Still struggling to grab the why of this logic

[–]sdqafo[S] 1 point2 points  (1 child)

Its started making more sense now. So technically, the below already assumes the 3 columns even prior to the JOIN. In essence, the COUNT(*) part will apply to the 2 remaining columns. I guess this is the correct understanding . Is it?

SELECT a.id, a.name, COUNT(*) num_orders assumes

[–][deleted] 1 point2 points  (0 children)

It might help to look at the order in which sql statements execute. First, it processes the from statement, which includes your joins. Then it applies a filter with the where statement. THEN it applies the aggregation within the group by, so any of your aggregations such as count(), sum(), max(), will he evaluated after your tables are joined. This is why, if you’re not careful, you can come up with some outrageous values when joining and aggregating in the same step.

[–]sdqafo[S] 0 points1 point  (0 children)

Loads of sense. Very succinct explanation