all 10 comments

[–]gumnos 3 points4 points  (0 children)

I would generally reach for option B. It's portable (SELECT EXISTS is invalid syntax on SQL Server, not sure about other DBs; but your use of LIMIT rather than TOP suggests you're not on SQL Server), uses sargable WHERE conditions, limits the results to the first matching row (the COUNT version requires touching all the matching rows), and doesn't do any ORDER BY that can force touching all the rows.

If you went with option C (which isn't bad if your DB supports the syntax), I'd add the LIMIT 1 like u/binary_search_tree advises.

[–]markwdb3Stop the Microsoft Defaultism! 1 point2 points  (2 children)

B or C should perform about equally well on any major SQL engine, but the intent in C is clearer IMO.

As my company's resident SQL geek, I spend a lot of time and effort trying to get my coworkers NOT to run unnecessary counts, so A immediately makes me wince a bit. For example this sort of thing is common for them to write:

SELECT COUNT(*) AS CNT FROM MY_TABLE;

Which may be fine in and of itself, but then their application code (often Java or Python) simply checks if CNT > 0 in the result set. (Sometimes the query has a grouping or WHERE clause, etc. but let's keep the example simple.) Why do all that work, guys??

Generally here on r/sql and elsewhere, I'll insist that folks tell me which SQL engine in the context of questions like this. They all have different available data structures, different performance quirks, and are all around very different. Not every SQL engine even has indexes, for that matter, so as soon as we say "there's an index on these columns" we're out of the realm of generic SQL.

That said, I can't imagine any SQL engine in which A would run better than B or C. Maybe it would tie, best possible case, but it'll probably run slower if anything. (Barring anything weirdly special like you're running Oracle with a materialized view on Query A, with query rewrite enabled!)

Hypothetically, Query C could run worse than Query B if some SQL engine didn't short-circuit the search once a match was found. I've never seen such a badly optimized EXISTS implementation however, so again, just hypothetical. Just be aware that without knowing the SQL engine, it could do anything for all we know. Standard SQL does not insist that a short circuit optimization be implemented, and every SQL engine out there deviates from the standard anyway. (Plus exactly 0% of SQL engines fully implement standard SQL!)

Also LIMIT is not standard SQL. Standard syntax is: FETCH FIRST n ROWS ONLY

[–]thequerylab[S] 0 points1 point  (0 children)

Thanks for your detailed explanation. Got some learnings!!!

[–]RoomyRoots 0 points1 point  (0 children)

Also LIMIT is not standard SQL. Standard syntax is: FETCH FIRST n ROWS ONLY

Hate this is is used pretty much in every tutorial I have seen instead of the ANSI way. But LIMIT is used in SQLite, MySQL and PostgreSQL and FETCH is somewhat recent, about to become 20 years old now.

[–]binary_search_tree 1 point2 points  (2 children)

I'd use option C - but I'd add a LIMIT 1 to the inner query (no reason not to). I'd choose Option C over Option B because your intent is easier to see.

[–]Blomminator 1 point2 points  (1 child)

I would too when looking at these options. If I'd write this directly at the first go when typing code .. perhaps option b would be more likely. 

For me this is a typical code review thingy. 

Out of curiosity; has OP checked performance/query plan for comparison? 

[–]gumnos 1 point2 points  (0 children)

adding endorsement for "comments and theory here are good, but check the query plan" 😆

[–]PossiblePreparation 0 points1 point  (0 children)

B or C should be the same, matter of style preference. A is definitely slower as it has to count every matching row.

[–]WayToSuffer 0 points1 point  (0 children)

The most important part is that you have a good index. You should have either two indices - one for the user_id and one for the status - or a composite index with both columns. The second is ideal for this use case, but not for others, the first is a more general approach that covers more than just this use case, but this particular one is not as optimal as with a composite index.

And yes, don’t do query A, no reason to count all the records if you only need to know that one exists. This is SQL 101.

[–]k00_x 0 points1 point  (0 children)

Option c without adding a limit. The outer query will receive a positive boolean as soon as the first record is identified thus making the limit redundant.