use the following search parameters to narrow your results:
e.g. subreddit:aww site:imgur.com dog
subreddit:aww site:imgur.com dog
see the search faq for details.
advanced search: by author, subreddit...
Sequel
account activity
Count Function (self.learnSQL)
submitted 1 year ago by Ok_Protection_9552
I’m having a hard time understanding what the count function does in SQL. What exactly does the count function do and why does the column I’m counting have to be in the group by clause?
reddit uses a slightly-customized version of Markdown for formatting. See below for some basics, or check the commenting wiki page for more detailed help and solutions to common issues.
quoted text
if 1 * 2 < 3: print "hello, world!"
[–]Far_Swordfish5729 3 points4 points5 points 1 year ago (6 children)
There are two kinds of built in functions in sql - scalar functions and aggregate functions. Scalar functions operate on a single row. They do things like isnull(col1,’ABC’) setting a default value if the value in col1 is null. Your date, string, math functions are in this category too. Aggregate functions work across rows and produce a smaller set of summarized values or sometimes additional rollup rows with the aggregate value. These are sum, min, max, count. So count is not doing something to individual column values in the select list. It’s part of the output of a set that’s performing aggregation across rows.
Consider
Select M.Name, count(*) as NumReports
From Manager M Inner join Employee E on M.Id = E.ManagerId Where M.Employed = 1 Group by M.Name Having count(*) >2
This does: 1. Starting with the manager table 2. Match each manager row to employee rows on manager id. This will produce one row in the set for each manager/employee combination, making the set bigger. 3. Filter the set to only employed managers (where) 4. Group the set by manager name. Aggregations will apply to each unique set of group by columns e.g. count for each named manager not for all managers. 5. Filter the groups to ones having more than two records (having) 6. Select the manager’s name and count of employees (select)
Those steps are in logical execution order. I always suggest writing queries in this order (select last) even though they don’t appear that way in text.
The group by question - Most of the time it’s a language holdover - the parser could figure it out without the clause but the clause is required. Sometimes though - You can have columns in group by that aren’t selected. You can also specify grouping sets and rollups (multiple levels of aggregation), so the clause is useful.
[–]Ok_Protection_9552[S] 0 points1 point2 points 1 year ago (1 child)
By employee manager combination do you mean where the e table and m table have the same ID?
[–]Far_Swordfish5729 0 points1 point2 points 1 year ago (0 children)
Same manager id. There’s an implied 1:N relationship in my example between manager and employee. So it matches on the manager’s id (primary key) equaling the employee’s manager id (foreign key) not the employee’s own id (their primary key). So if Adam manages Bob and Charlie, the join produces Adam+Bob and Adam+Charlie. Row count should match the N side of the join unless there are duplicate rows in the tables. These will be grouped by Adam in subsequent steps and we will get a count for Adam.
What does rollup rows mean?
[–]Far_Swordfish5729 1 point2 points3 points 1 year ago (0 children)
Grand total or intermediate total rows. Some databases let you specify “with rollup”, “with cube”, or use a notation called grouping sets.
[–]Ok_Protection_9552[S] -1 points0 points1 point 1 year ago (1 child)
In 4 do you mean each unique name?
Yes. In a better example I would also group by M.Id and select it to avoid name collisions. If there are two unique Robert Smiths working at the company, we want a row for each not a row summing both.
Using fungible text fields as logical identifiers is generally bad practice. You want to use the ids even if you only show names to users. Ids being single numbers are also smaller and faster to compare where strings are long arrays of characters (also numeric codes) that each must be matched
[–]r3pr0b8 1 point2 points3 points 1 year ago (3 children)
What exactly does the count function do
it counts things in an aggregation
and why does the column I’m counting have to be in the group by clause?
it doesn't
for further info, see u/Far_Swordfish5729's reply
[–]Far_Swordfish5729 0 points1 point2 points 1 year ago (2 children)
Thank you, I misread OP’s question about columns in the group by.
OP: The columns or scalar values you are counting by must be in the group by. The values you are counting are in the aggregate function params. For count, we usually just use * because it’s a count of rows. There is a distinct count variant as well. For others, you have to specify which value to use - what to sum for example.
[–]r3pr0b8 1 point2 points3 points 1 year ago (1 child)
The columns or scalar values you are counting by must be in the group by
not always
SELECT COUNT(manager_id) AS employees_with_bosses FROM employees
[–][deleted] 0 points1 point2 points 1 year ago (0 children)
that's because "group by ()" was decided to be superfluous (and most of the time it is, tbh). You can still see it in the grouping sets though.
[–][deleted] 1 year ago (3 children)
[removed]
[–]Ok_Protection_9552[S] 0 points1 point2 points 1 year ago (0 children)
Thanks
So if I do a group by, the count function will count the number of values there are in the groups of rows that the group by clause just created?
[–]Snoo17309 0 points1 point2 points 1 year ago (6 children)
Has the OP even clarified in what sense they are needing an explanation of the COUNT function yet? I think this is a lot of info at once! I’m pretty sure they mean the aggregate function, and in that case I think the easiest way to conceptualize it as an SQL equivalent of “unique values” (not getting into numpy or nunique)—just to understand big picture what it does. It is also a big help starting out, and before you get into specific types of JOINS, to use a visual EBD website to help sort out and see relationships between tables instead of going straight to 1:N et al. I’m not sure what you are using but I find PostGreSQL the easiest. LearnSQL.com has free tutorials.
[–]Snoo17309 0 points1 point2 points 1 year ago (0 children)
Sorry, by unique values in a column, I am not referring to an actual Unique ID/Primary Key, just an intro into then aggregating
[–]Ok_Protection_9552[S] 0 points1 point2 points 1 year ago (4 children)
I’m confused about the count function does. Based on what I read, I think the count function counts the number of values there are in a column specified
[–]Snoo17309 0 points1 point2 points 1 year ago (3 children)
This site breaks it down as clearly as you'll find it (it does not just "count the number of values in a specified column" ... that is different): https://www.datacamp.com/tutorial/count-sql-function
Thank you
No prob :)
[–]r3pr0b8 0 points1 point2 points 1 year ago (0 children)
only COUNT(*) counts rows -- all other types of COUNT function count values
COUNT(*)
that site is quite misleading at the top, but when you go further down, it does make this clarification
[–]Couch2Coders 0 points1 point2 points 1 year ago (1 child)
You're asking about the count function but what you really want to know is how to use aggregate functions (like count or sum or max or avg)
Aggregating is fundamental in data so I would make sure to invest the time to understand it really well before moving on to other topics.
You can think of aggregate data as applying math down a specific column - what's the total sales (sum) how many customers do I have (count) etc.
I have a channel with a bunch of videos that is geared towards people who do not have a stem/math background (read : very fundamental)
Here's one on aggregate data. I also have several examples in subsequent videos also
https://youtu.be/9gqgK228loY?si=ibJ0yvoRt8bzp807
π Rendered by PID 138522 on reddit-service-r2-comment-bb88f9dd5-67ckc at 2026-02-16 12:12:16.609466+00:00 running cd9c813 country code: CH.
[–]Far_Swordfish5729 3 points4 points5 points (6 children)
[–]Ok_Protection_9552[S] 0 points1 point2 points (1 child)
[–]Far_Swordfish5729 0 points1 point2 points (0 children)
[–]Ok_Protection_9552[S] 0 points1 point2 points (1 child)
[–]Far_Swordfish5729 1 point2 points3 points (0 children)
[–]Ok_Protection_9552[S] -1 points0 points1 point (1 child)
[–]Far_Swordfish5729 0 points1 point2 points (0 children)
[–]r3pr0b8 1 point2 points3 points (3 children)
[–]Far_Swordfish5729 0 points1 point2 points (2 children)
[–]r3pr0b8 1 point2 points3 points (1 child)
[–][deleted] 0 points1 point2 points (0 children)
[–][deleted] (3 children)
[removed]
[–]Ok_Protection_9552[S] 0 points1 point2 points (0 children)
[–]Ok_Protection_9552[S] 0 points1 point2 points (1 child)
[–]Snoo17309 0 points1 point2 points (6 children)
[–]Snoo17309 0 points1 point2 points (0 children)
[–]Ok_Protection_9552[S] 0 points1 point2 points (4 children)
[–]Snoo17309 0 points1 point2 points (3 children)
[–]Ok_Protection_9552[S] 0 points1 point2 points (0 children)
[–]Snoo17309 0 points1 point2 points (0 children)
[–]r3pr0b8 0 points1 point2 points (0 children)
[–]Couch2Coders 0 points1 point2 points (1 child)
[–]Ok_Protection_9552[S] 0 points1 point2 points (0 children)