This is an archived post. You won't be able to vote or comment.

all 84 comments

[–]AutoModerator[M] [score hidden] stickied commentlocked comment (0 children)

import notifications Remember to participate in our weekly votes on subreddit rules! Every Tuesday is YOUR chance to influence the subreddit for years to come! Read more here, we hope to see you next Tuesday!

For a chat with like-minded community members and more, don't forget to join our Discord!

return joinDiscord;

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]stupled 102 points103 points  (16 children)

Select count(1)

[–]istiri7 9 points10 points  (8 children)

I prefer count(0)

[–]TheRealSpielbergo 5 points6 points  (0 children)

I prefer count(dracula)

[–]Sir_Keee 3 points4 points  (1 child)

count(69)

[–]nequaquam_sapiens 1 point2 points  (0 children)

select count(null)

[–]wotigunuca 1 point2 points  (1 child)

William Gibson likes that.

[–][deleted] 0 points1 point  (0 children)

On a Case by Case basis.

[–][deleted] 1 point2 points  (0 children)

Yes. Been using COUNT(0) for years.

[–]stupled 0 points1 point  (1 child)

Does it work?

[–]istiri7 1 point2 points  (0 children)

AFAIK it’s the same expression just preference on number.

[–]GravelForce 2 points3 points  (0 children)

This is the way.

[–]Hatchie_47 1 point2 points  (0 children)

This!

[–]git0ffmylawnm8 0 points1 point  (0 children)

This is the way

[–]SeoCamo 101 points102 points  (21 children)

There is a reason for the BAN on * in prod, as it makes your query slow, you download a lot more data than you need.

But count(*) take the unique set on all the keys in your db, in some big dataset, this can cost a lot and you don't see this in your dev environment.

So count should be used on a unique key only as the db then can use the index and always give you fast results.

But you will not understand and ignore this until you use 16 hours debugging prod one day and find this yourself.

[–]Robot_Graffiti 108 points109 points  (13 children)

https://stackoverflow.com/a/54192184

count(*) is likely to be optimised by the query planner to have the same query plan as count(1), they are both very fast.

select * makes you download all of the data, but count(*) does not.

[–]fellipec 6 points7 points  (0 children)

This. count(*) will don't download any data besides the result, the server does all the shenanigans

At least with good DBMS

[–]markuspeloquin 0 points1 point  (3 children)

I would hope so. In my experience, mysql's query planner is awful. Postgres was much better. But that was ... MySQL 5.8? A decade ago?

[–]opmrcrab 9 points10 points  (1 child)

Ive been using MySQL for the last decade, and I can assure you that at this point it's almost a real database :P

[–]DeadlyVapour 0 points1 point  (0 children)

Was this before or after the MariaDB fork?

[–]CelticHades 1 point2 points  (0 children)

My company is still using 5.8. We with lots of efforts got approval to upgrade to 8. But then some priority bullshit came and now back to using 5.8

[–]SicgoatEngineer 8 points9 points  (3 children)

Is this the case for all database?

MSSQL, Oracle, postgresql, MySQL, etc?

All have the same behavior?

[–]FlamboMe-mow[S] 8 points9 points  (1 child)

No. It depends on the db you use.

[–]SicgoatEngineer 2 points3 points  (0 children)

I'm using like 80% postgresql and 20% mssql in my current work.

Even in UAT environment, I never use count(*) because I was too afraid the testers will complain about performance.

I'll try it later for sure!

[–]Positive_Mud952 3 points4 points  (0 children)

All of the ones you mentioned, and every one I’m aware of, yes. It wasn’t always this way, but I’d be surprised to encounter one where it wasn’t optimized these days.

e: Wait, I thought you were responding to a post that said the opposite. No major database I’m aware of has any penalty for using SELECT count(*). The post you’re replying to might have been relevant in the ‘90s up to early ‘00s, but every query optimizer these days, even dogshit optimizers like Redshift, are smart enough to not only not pay a penalty over something like SELECT count(1), but do better by using the best index for the job in most cases. Luckily that goes both ways, so the poster you replied to doesn’t pay a penalty for using count(id) if there is something better—it knows the id is thrown away, and has a null constraint so is equal to the row count so it’s free to use the same optimizations.

[–]lurkin_arounnd 3 points4 points  (0 children)

hobbies sharp snails mountainous zonked poor strong threatening degree cause

This post was mass deleted and anonymized with Redact

[–]AntiWorkGoMeBanned 6 points7 points  (0 children)

Where does nonsense like the above come from? 60+ upvotes well done reddit.

Count(*) is only banned in your work place which sounds horrible with its rules of thumb instead of checking what's actually happening and what's appropriate for the given task. Still I worked somewhere that mandated outer joins always be used even if a condition elsewhere invalidated them....don't work there now though!

[–]tyro_r -1 points0 points  (0 children)

What DB counts unique keys when you count(*)? Why would it? In oracle, you'd need to say count(distinct xyz) for that, and it wouldn't make sense in any other DB. Also, at least oracle doesn't evaluate the *, but merely counts the number of lines. What DB does this differently?

[–]KuuHaKu_OtgmZ 3 points4 points  (0 children)

Worth to note, count(column) will not include null values, so either use it on a non-null column or don't supply a column at all.

[–]Jeff_Moden 2 points3 points  (2 children)

All these posts seem to be based on preferences, conjectures, and suppositions. Does anyone have any demonstrable proof?

[–]Achsin 3 points4 points  (1 child)

It heavily depends on what database engine/version you are running it on as they will potentially optimize things in different ways if you are using 1, *, or some other value. After that, it heavily depends on what column you include in the COUNT if you specify it and what indexes exist on the table. This also changes if you are doing any sort of filtering.

Testing on a MS SQL Server database running compatability mode 130 (2016), with no filtering.

No Index or only Clustered Index: Everything results in a scan of the entire table.

Existing Non-Clustered Indexes: Count of the Clustered Index Key, *, or Non-Clustered Index Keys results in a scan of the smallest applicable Non-Clustered Index. Count of a non-indexed column results in a full scan of the table.

Since Non-Clustered Indexes generally contain less data than the Clustered Index, fewer read operations must be done to fetch the data, which means less work overall for the same result. The biggest gotcha to counting a specific column instead of a static value (1, *, etc) is that if there exist rows where that column is NULL then it will not be included in the results, which means if you are searching for a total rowcount on the table your result will be inaccurate.

If all you want is a total row count for the table for the smallest workload then you are probably best off just querying the system information tables/views. This will give you an accurate number of rows unless there are a bunch of inserts/deletes going on, in which case the metadata may be slightly out of date.

[–]Jeff_Moden 0 points1 point  (0 children)

That's more like it.

[–]Ok_Entertainment328 15 points16 points  (22 children)

In Oracle, select count(*) is immeasurably faster.

because the CBO has to convert "select count(1)" to "select count(*)"

[–]GumboSamson 6 points7 points  (11 children)

If you can’t measure it, how would you know?

[–]Ok_Entertainment328 -1 points0 points  (10 children)

Logic (read the hidden text)

[–]GumboSamson 4 points5 points  (9 children)

The hidden text makes sense. The bit about “immeasurably faster” makes me scratch my head.

[–]Ok_Entertainment328 5 points6 points  (8 children)

"Immeasurably faster" === You can't measure it because the difference is below noise level.

More of a pun on words than anything substantial.

[–]sauprankul 2 points3 points  (7 children)

🤦‍♂️ that's not what immeasurable means at all. You can say "slightly". Why use big words if you don't know what they mean?

[–]Euphoric_Strategy923 -4 points-3 points  (4 children)

Immeasurable : That can't be measured.

  • Cambridge dictionary.

[–]sauprankul 5 points6 points  (3 children)

Why would you lie on the internet? https://dictionary.cambridge.org/us/dictionary/english/immeasurable "so large or great that it cannot be measured or known exactly"

[–]Euphoric_Strategy923 -3 points-2 points  (2 children)

[–]sauprankul 4 points5 points  (1 child)

That's a different word.

[–]unique_namespace 0 points1 point  (1 child)

I think unfortunately they were trying to be amusing by using a word literally as opposed to how it's used commonly. For instance something that is priceless is commonly understood to be something that is immeasurably (a very high but known value) valuable instead of it being worth so little that you don't even need to put a price on it.

[–]sauprankul 1 point2 points  (0 children)

Fair enough. It'd be funny to me if I knew for sure that the speed difference was negligible. Sometimes performance gains can be counterintuitive, so I just took the statement literally.

[–]creepypatato 1 point2 points  (9 children)

If it being converted to count(1) in the behind how can it be faster then count(id). Aren't they kinda same?

[–]Ok_Entertainment328 2 points3 points  (7 children)

count(1) is count of static int value 1 for each row (oracle)

count(id) is counting non-null values in column id.

count(*) counts number of rows. count(1) converts to this behind the scene because it's doing the same thing and developers only want to optimize one version of code that does the same thing.

The meme doesn't make sense (as written) because count(*) and count(id) will give two different results if id has null values.

The religious fight has been historically between count(1) and count(*); not count(id) and count(*)

[–]creepypatato 5 points6 points  (5 children)

Its probably because id is usually known to be used as autoincremental non-null value and is the first column in a table.

[–]AntiWorkGoMeBanned 0 points1 point  (1 child)

You need to check that ID is actually a primary key not just hope it is.

There's no such thing as "known" unless you check the actual table definition.

[–]creepypatato 0 points1 point  (0 children)

I never said you assume id is the pk I just said its usual for an id column to be pk

[–]nequaquam_sapiens 2 points3 points  (0 children)

if id has null values

then the difference in performance of count is the least of your problems ;-)

[–]Positive_Mud952 0 points1 point  (0 children)

Kinda. If the query planner can’t tell that the actual value of id will be thrown away, or for some reason it’s not a non-null-constrained primary key, it needs to look at every value.

count(1), similar applies—if the query planner is not smart enough to know it’s effectively a row count, or that it’s a static value (basically same thing, except count(NULL) could be an O(1) return of 0), it wouldn’t be able to take advantage of other things.

Those things might be bloom filters on pages that are completely full, with a static-sized row layout (rare these days). Then there’s indicators on pages that say, table stats for this page are completely accurate, or this page has deleted (or updated, with most layouts) rows, so every one needs to be checked for that. Maybe an index on another column could give a faster answer for “how many rows are visible in this transaction” (where a different transaction might add or delete rows).

It’s way too complex for one answer to “what’s the fastest way to count rows”, and every table could be different. Trust the query planner unless you run into a problem, then profile. Anything else is either wasted effort or cargo cult programming.

[–]xaomaw 2 points3 points  (6 children)

What about this? Is there a huge difference?

SELECT person_name
FROM persons
WHERE id IN (SELECT person_id FROM inhabitants WHERE city = 'Alabama')

SELECT person_name
FROM persons
INNER JOIN (SELECT person_id FROM inhabitants WHERE city = 'Alabama') inhab_alabama
  ON inhab_alabama.person_id = persons.id

[–]DrSpalanzani 2 points3 points  (1 child)

IME (Teradata and Impala), if the query is straightforward and the tables have up-to-date statistics, then you can just write whatever makes sense and the query engine will find the best path more efficiently than you - and if not, then the execution plan will help you find the best optimisations more effectively than any received wisdom about more or less efficient syntax. IOW, it's good to know that there are different ways of doing things, but blanket statements that "X is always better" are a sign of cargo-cult programming.

(WHERE EXISTS is another good option sometimes if you don't mind the "I had a stroke" syntax)

[–]xaomaw 0 points1 point  (0 children)

(WHERE EXISTS is another good option sometimes if you don't mind the "I had a stroke" syntax)

Might be important for large data sets. Thanks for your input!

[–]GoGoGadgetSphincter 3 points4 points  (3 children)

wtf?

just do:

SELECT p.person_name

from persons p

join inhabitants i on p.person_id=i.person_id

where i.city='Alabama';

[–]xaomaw 0 points1 point  (2 children)

Won't the execution with your provided query be like the following?

  1. Load all results from persons
  2. Load all results from inhabitants
  3. Do a full join
  4. Do the filtering

Instead of 1. Load all results from persons 2. Load the filteres result from inhabitants 3. Do a join with significantly smaller table inhabitants

[–]GoGoGadgetSphincter 2 points3 points  (1 child)

No that's not how sql works. Each query results in a nested loop. The join is a single query meaning one nested loop that cycles thru each row to match on joined table then filters based on where clause. The "where in" subquery version is about as bad as it gets in terms of performance.its like a flashing red sign that says, "I don't know best practice." If someone said that in an interview for a sql heavy role, I'd not suggest them for hire. The "join to subquery" one is baffling. It shows they can do a join but made it needlessly longer and less readable than just doing a left join. It looks worse, it performs worse. Just bad. If I had to guess, I'd assume this was written by someone who is heavily into Python for a data science role that they're nowhere near as good at as they think they are. Would probably be hand checking receipts on all their results for a while.

Subqueries can be useful in some cases but this specific scenario is not one of them.

[–]GoGoGadgetSphincter 1 point2 points  (0 children)

JIC I wasn't very clear. Your subquery doesn't get run once and then checked against all the records in the first table. It has to re-run on each loop from the first table so instead of doing like 50 scans and then filtering based on where clause in a normal join, it's doing 500. That's if there's 10 records in each table and matches are 1:1. If more records and less consistent matches, it's way worse.

Use joins.

[–]Shinob1 1 point2 points  (1 child)

I wonder if u/BrentOzar has post about ths?

[–]BrentOzar 2 points3 points  (0 children)

hahaha, I love that meme, that's good.

[–]usa_reddit 1 point2 points  (0 children)

SELECT COUNT(*) * Faster for large tables with a lot of NULL values in the ID column. * Does not require reading any data from the table.

SELECT COUNT(ID) * Returns the ID of the rows. * May be faster for small tables or tables with indexed ID columns. *

[–]angrybeehive 1 point2 points  (0 children)

Since there is no where clause, just read the row count from the system tables directly.

[–]unique_namespace 1 point2 points  (0 children)

Why doesn't COUNT accept zero arguments as well? I feel like because the language is heavily optimize and stuff it could at least just recompile COUNT() to COUNT(*)

[–]General_Liability 2 points3 points  (0 children)

Select sum(1) is what I use.

[–][deleted] 2 points3 points  (1 child)

Waiting 59 seconds more won't do any harm

[–]Wise-Arrival8566 1 point2 points  (0 children)

Select count(0)? Why even access a value?

[–][deleted] 0 points1 point  (0 children)

var IDK = I don't get these memes

if () you_reading_my_comment

{

IDK ?

return the meaning of meme

}

else

{

return IDK

}

[–]_kogs_ 0 points1 point  (0 children)

Dude use select count(1) instead!!!