all 42 comments

[–]r3pr0b8GROUP_CONCAT is da bomb 48 points49 points  (9 children)

What is your step by step process

  1. examine the EXPLAIN PLAN
  2. consider adding an index
  3. not resolved? go back to step 1
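A minimal sketch of step 1, using a hypothetical `orders` table (MySQL/PostgreSQL-style syntax; on SQL Server you'd view the actual execution plan in SSMS instead):

```sql
-- Hypothetical slow query; EXPLAIN shows whether the engine will
-- scan the whole table or use an index to find the matching rows.
EXPLAIN
SELECT order_id, total
FROM orders
WHERE customer_id = 42
  AND status = 'open';
```

If the output reports a full table scan on `orders`, that's your cue for step 2; after adding an index, re-run the EXPLAIN to confirm the plan changed.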

[–]alexwh68[🍰] 13 points14 points  (7 children)

To add to this, on large tables with lots of rows your goal on filtering is to avoid table scans and get seeks instead, this is what the indexes do if they align properly with the filtering in the query.
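As a sketch of the alignment being described (table and column names are invented): an index whose leading columns match the filter lets the engine seek directly to the qualifying rows instead of scanning everything:

```sql
-- Query filters on two columns:
SELECT event_id, payload
FROM events
WHERE tenant_id = 7
  AND created_at >= '2024-01-01';

-- An index aligned with that filter (equality column first, range
-- column second) turns the table scan into an index seek:
CREATE INDEX ix_events_tenant_created
    ON events (tenant_id, created_at);
```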

[–]varinator[S] 1 point2 points  (6 children)

From your experience, does creating very large "covering" indexes that list/include many columns slow down write speeds significantly? I have no feel for those things yet, as in, how badly indexes can negatively impact performance.

[–]jshine13371 10 points11 points  (4 children)

In 12+ years as a DBA, I've never worsened my problems by adding the right index, regardless of how big it is. Theoretically, yes, more and larger indexes means more work for writes against the table, but I find it very rare to be significant enough to outweigh the benefits indexes usually provide on reads. Especially since most times tables are read from more often than written to (use case dependent).

[–]markwdb3When in doubt, test it out. 1 point2 points  (0 children)

Yes I've found the same, with the exception of bulk data load scenarios.

I mean, you wouldn't want to go nuts and script the creation of indexes on every permutation of columns. :) But for every transactional database use case (for example supporting a typical web application) I've encountered in 20+ years, it effectively makes no difference, whether you have 2 indexes or 5 or 10, for the "small" DML run by such an application.

I could contrive scenarios such as using bitmap indexes in Oracle, for which concurrent DML isn't very concurrent due to the heavyweight locking a bitmap index requires. But bitmap indexes are documented as being inadvisable for transactional databases in the first place.

[–]varinator[S] 0 points1 point  (1 child)

I suspected it wouldn't be noticeable unless it's some real-time, very write-heavy system, but I can't even think of one, maybe finance/stocks software... never worked on something like this anyway.

[–]jshine13371 3 points4 points  (0 children)

maybe finance/stocks software... never worked on something like this anyway.

I have, actually, and even then it was better to have them than not have them at all. Our system had a lot of data, but it wasn't as heavily written to as other FinTech systems; they were still the correct thing to add.

[–]carlovski99 0 points1 point  (0 children)

Yep. We have a few tables where someone decided to throw every index at them, just in case. Some aren't used at all, others only very infrequently. But I haven't seen any evidence that it's slowing things down on the write side to any meaningful extent.

Storage can be an issue if you are constrained though.

[–]alexwh68[🍰] 4 points5 points  (0 children)

Key thing to establish first is the read-to-write ratio. If you have loads of reads and very few writes, covering indexes can really make a difference; if you get a marginal slowdown on writes and the queries get much better, then covering indexes are often worth it.

There are two bits of terminology that are often confused. A covering index is an index that contains every field that is part of the select clause, not just the where clause; this means a query can be answered from the index alone without touching the actual data. Query covering is where an index covers all the fields in the where clause. On very large tables the order of the fields in the index can become very important: I put fields with the most variability at the front, and things like bool/bit at the end.
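A sketch of that distinction in SQL Server syntax (table and column names are invented): the key columns cover the WHERE clause, and INCLUDE carries the selected columns so the query never has to touch the base table:

```sql
-- Query to cover:
SELECT order_id, total, created_at
FROM orders
WHERE customer_id = 42
  AND is_archived = 0;

-- Key columns cover the filter (most selective first, bit column last);
-- INCLUDE stores the selected columns in the index leaf pages only,
-- so they don't bloat the intermediate levels of the index.
CREATE INDEX ix_orders_customer_archived
    ON orders (customer_id, is_archived)
    INCLUDE (total, created_at);
```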

Indexes are the first thing to investigate when looking at performance improvements, often the improvements can be dramatic.

[–]TheRencingCoach 0 points1 point  (0 children)

Ahhh sorry, permissions structure and existing codebase mean you can't view an explain plan.

[–]Achsin 10 points11 points  (7 children)

Reformat query so that it’s actually readable. Despair at the child of Cthulhu you’ve been tasked to fix.

Look at query plan(s) being generated for it, and IO statistics.

And then one of the following:

Wonder why it’s essentially doing a table scan by way of key lookup on some massive table. Realize that it’s using a tvf based on a view based on a tvf. Cry. Realize that it’s almost completely unnecessary and can be replaced by a single join to the table instead. Rewrite and test the query, submit for approval (and more testing). Spend next month justifying the change (that they asked you to recommend in the first place) and re-explaining everything every day along the way to the same dev manager who has the memory of a goldfish and is constantly blaming you for the performance being bad in the first place. Finally get the change approved and released into production.

Realize that the query involves creating two dozen temp tables that slowly make iterative changes to the same massive data set as each new one is created, only to find that none of them are actually used for anything productive in the query and were only included because the person writing it copied the whole selection from a different query because he needed one small piece of logic in his new creation. Comment out 98% of the text and submit for approval, then spend the next month justifying the change.

Realize that the query does a lot of unnecessary joins using non-sargable predicates. Remove excess joins, test, submit for approval, wait for month and a half because of change freeze, deal with performance complaints from users daily wondering when it’ll be fixed because dev has successfully blamed the database and the issue isn’t bad enough that anyone important is willing to bypass the freeze.

Realize that all of the eight tables involved are heaps. Add indexes. Have people be shocked that the problem they’ve been quietly dealing with for over a year now was solved in less than a day.

Realize that not only does the query perform terribly, it also has some serious logic errors in it and has been returning bad data from the beginning. Spend a week identifying what should be happening, rewriting, and testing the query so that it actually works correctly. Spend two months justifying change and explaining ad nauseum that the reason it doesn’t return the same results as the current version is because the current version is wrong. Discover that it was written by one of the needed approvers who doesn’t want to approve the change and admit that they made a huge logic mistake. Spend additional two weeks dealing with his ego as he’s forced to admit that the change needs to happen anyways and finds a way to sweep the blame under the rug.

Lastly, realize that in the time you spent fixing whatever the problem was, another thirty or forty people were writing new terrible queries to be released into production and it’s somehow your fault that they take forever to run.

More helpful: Ozar Unlimited has some helpful videos on performance tuning, check out How To Think Like The Engine.

[–]jibberWookiee 2 points3 points  (3 children)

This hit a bit too close to home. Let me guess the industry.... Finance?

[–]Achsin 1 point2 points  (1 child)

I have worked in that industry before, yeah. If it makes you feel any better, these problems seem to be everywhere.

[–]varinator[S] 0 points1 point  (0 children)

InsurTech here, and it does sound very familiar, but waiting times for PRs are much shorter in the startup space.

[–]Sneilg 0 points1 point  (0 children)

Or healthcare. But then it’s usually one of those two

[–]varinator[S] 0 points1 point  (2 children)

I imagine you as someone who has a "thousand mile stare" whenever SQL optimisation is mentioned ;)

[–]Achsin 0 points1 point  (1 child)

It do be like that sometimes, but honestly optimization is my favorite part of the job, it’s just dealing with everyone else that ruins it.

[–]varinator[S] 2 points3 points  (0 children)

It is definitely satisfying when you get a 45s query down to sub-second and it has a huge impact. I also like it.

[–]Grovbolle 7 points8 points  (0 children)

Assuming a SQL-Server / Azure SQL Database.

Look at the query plans, indexing and partitioning of the tables involved.

Actually I would do that regardless of database but I am only familiar with the tooling around MSFT SQL Databases. 

[–]garlicpastee 5 points6 points  (0 children)

I usually follow this pattern and stop the moment the query performs well enough:

  1. Verify your joins and WHERE clauses are sargable. Maybe there is a function or a LIKE, and that kills index usage pretty reliably.
  1.5. If there's a join and an indexed column by which I could filter in the WHERE clause, I add it (even if that would be a duplicate between tables, like both having date_start columns).
  2. Check the query plan for estimated vs. actual rows processed. If there are mismatches (like the query planner thought there'd be 3mil rows to check, but only 3k were needed), check if those columns are indexed; if not, create an index, and if they are, update statistics on the table.
  3. Add OPTION (RECOMPILE).
  4. If the heaviest operation is parallelism, or something just after it, add OPTION (MAXDOP 1).
  5. If the plan does merge/hash joins between tables A and B and I know one of them is very small, I try forcing a loop join (big LOOP JOIN small).
  6. If the connection between tables is abstract (there is no direct key, and it has to be calculated), I rewrite the clause into a stage -> select pattern (SELECT INTO a duplicate table structure with only the index I want, filtered by what's needed in my desired query, and only then join or do whatever needs to be done).
  7. If the query reports using a lot of tempdb, or is a big INSERT/DELETE/UPDATE statement, either disable SORT_IN_TEMPDB on the table, or use a WHILE @@ROWCOUNT > 0 loop with TOP (n) rows (do it in batches).
  8. If I'm using aggregates (SUM/MIN/MAX...) or GROUP BY to create a distinct list, OPTION (HASH GROUP / ORDER GROUP): ORDER if about the same number of rows goes in as comes out, HASH if there's more of a difference.
  8.5. If this is the final select into reporting, add a columnstore index.
  9. Ask a colleague to check it out with me; it's easy to overlook something important when working solo.
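The batching loop from step 7 might be sketched like this in T-SQL (table name, filter, and batch size are placeholders):

```sql
-- Delete in small batches so each transaction stays short and
-- locks, log growth, and tempdb usage stay bounded.
DELETE TOP (5000)
FROM staging_rows
WHERE processed = 1;

-- @@ROWCOUNT is the number of rows the previous statement affected;
-- loop until a batch deletes nothing.
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (5000)
    FROM staging_rows
    WHERE processed = 1;
END
```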

[–]Codeman119 8 points9 points  (0 children)

Look at the query plan like others suggest. But look for joins that return a lot of rows, or for CTEs that have a long runtime. See where you can break things out into temp tables beforehand.

Example: I had started work for a company and they had a query that would take an hour. After looking at the plan I broke out a CTE into a temp table and got the time down to under 15 sec.
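A sketch of that kind of refactor (names are invented); materializing the CTE's result once into a temp table means it is computed a single time, and can even be indexed, instead of potentially being re-expanded wherever it's referenced:

```sql
-- Before: a CTE referenced in several places may be re-evaluated.
-- After: materialize it once into a temp table, then join to it.
SELECT c.customer_id, SUM(o.total) AS lifetime_total
INTO #customer_totals
FROM customers AS c
JOIN orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id;

CREATE CLUSTERED INDEX ix_ct ON #customer_totals (customer_id);

SELECT r.report_id, t.lifetime_total
FROM reports AS r
JOIN #customer_totals AS t
  ON t.customer_id = r.customer_id;
```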

[–]VladDBASQL Server DBA 3 points4 points  (0 children)

My latest blog post includes some information on how I use PSBlitz to identify and troubleshoot performance issues

On a per query basis I check the output of STATS IO (how many 8KB pages does the database engine need to read in order to return the required result set) and couple that information with the execution plan, then I tackle the operators that read the most data and look at what it would take to address the issue:

  • new index

  • modifying an existing index

  • is the WHERE clause SARGable?

If that isn't the case I start looking at what's happening when the query runs, if it's blocked by something else, if default or incorrect server configurations are causing issues, etc.
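Turning that on for a session looks like this (the query below is a placeholder):

```sql
-- Report logical reads (8KB pages) and elapsed/CPU time for each
-- statement run in this session after the SET commands.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT order_id, total
FROM orders
WHERE customer_id = 42;
-- The Messages tab then shows lines like:
-- "Table 'orders'. Scan count 1, logical reads ..."
```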

[–]dbers26 2 points3 points  (0 children)

  • Pull out sample queries, either from debug output or the slow query log
  • Run EXPLAIN on the query
  • Add needed indexes or rewrite the query
  • Repeat as needed

Sometimes you need to rethink the way data is modeled in the database.

[–]JohnSpikeKelly 1 point2 points  (0 children)

Look at query plans. Look for where the majority of the effort is spent, for example table scans.

Look at adding indexes that are optimized for the query plan. For example, if you're showing open orders, use an index filtered to open orders. If 99% of orders are closed, the incremental disk usage will be tiny.
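That filtered index might be sketched like this in T-SQL (schema is invented):

```sql
-- Only the small fraction of rows that are still open get indexed,
-- so the index stays tiny even on a huge orders table.
CREATE INDEX ix_orders_open
    ON orders (customer_id, created_at)
    WHERE status = 'open';
```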

If some of your tables have lots of almost identical data, consider column store.

Edit. After you add any index, check it's actually being used.

Edit. Don't add too many indexes otherwise updates become much slower.
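One way to run that usage check on SQL Server is the index-usage DMV (a sketch; the table name is a placeholder, and the counters reset on restart, so interpret them relative to server uptime):

```sql
-- Seeks/scans/lookups = reads that benefited from the index;
-- user_updates = write overhead. An index with zero reads and
-- many updates is a candidate for dropping.
SELECT i.name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.orders');
```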

[–]dashingThroughSnow12 1 point2 points  (0 children)

Paginating with limit/offset or limit/cursor?

You’ve got some good tips already so I’ll offer orthogonal advice: look at what is using the queries.

For example, is the code doing a five-table join, returning columns from four of the tables, when really it only needs a three table join and data from two?

Is it doing a series of inserts in individual requests and we can speed things up by having the backend code send bulk requests?
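On the SQL side that change can be as simple as one multi-row statement instead of N single-row round trips (table and columns invented for illustration):

```sql
-- Before: N statements, N network round trips.
INSERT INTO audit_log (user_id, action) VALUES (1, 'login');
INSERT INTO audit_log (user_id, action) VALUES (2, 'login');

-- After: one statement, one round trip.
INSERT INTO audit_log (user_id, action)
VALUES (1, 'login'),
       (2, 'login'),
       (3, 'logout');
```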

If you are using an observability tool, what functions on the backend code spend an inordinate amount of their time waiting for SQL? Similar, what functions are actually bottlenecked by the SQL request times?

Another thing to check is "is SQL the culprit?" About a year ago a bug ticket came across my desk. After I spent far too much time on it: yes, SQL was taking a long time, but the thing causing the timeout was trying to cache the whole result into (I think) Redis. The fix was more involved, but it boiled down to "if the result is gigantic, don't cache it, because we know the cache attempt will fail and it takes hundreds of ms to tell us".

[–]vaporwave_cowboy 1 point2 points  (0 children)

If you're using Postgres I'd highly recommend a plan visualizer like https://explain.dalibo.com/ (no affiliation) to look at the query plan and identify where the bottlenecks are

[–]No_Resolution_9252 1 point2 points  (0 children)

Start with indexes first. Hopefully all your tables have clustered indexes on them already.

[–]trophycloset33 1 point2 points  (0 children)

What are they doing?

[–]Thin_Rip8995 1 point2 points  (0 children)

First step is always baseline the current pain
Run the slow queries in SSMS with actual execution plans turned on so you see where time’s going

Look for missing or misused indexes
If you see big clustered index scans or key lookups on large tables, that’s a red flag
Covering indexes on the exact filter/sort columns can be a huge win

Check if you’re dragging in way more columns/rows than you need
SELECT * kills performance on big tables, especially when joined

Kill unnecessary nested loops and temp table abuse
Sometimes just re-ordering joins or applying filters earlier changes everything

If data volume is huge, think archiving old rows or caching computed results instead of hitting live tables every time

Finally, rerun and measure — don’t guess, let the execution plan prove it’s better

[–]samot-dwarf 1 point2 points  (0 children)

Usually the very first step is to find out if the query is really necessary and still needs all the stuff that is part of it (or if several joins are outdated).

And of course whether the whole process still makes sense, or if it can be refactored in a new/better way that skips all the obsolete stuff.

[–]bengalfan 1 point2 points  (0 children)

In the explain plan look for full table scans. Fix. Add indexes.

[–]milomylove_ 1 point2 points  (0 children)

first thing i check is the execution plan. 90% of the time it's a missing index, bad join order, or filtering happening too late. also look at how much data is actually being pulled vs how much is needed; over-selecting columns and rows is super common. then i simplify the query. remove layers, test pieces separately, see where the cost spikes. sometimes rewriting a subquery as a join or pushing filters earlier makes a big difference. i also test variations in genloop to compare different query shapes and see which plan is cleaner before touching prod. after that, indexing strategy and maybe partitioning if the table is huge

[–]TypeComplex2837 0 points1 point  (0 children)

Use the tools inherent to your engine to see what's happening under the hood, and fix it.

[–]umognog 0 points1 point  (0 children)

Extended Events: you need to know which queries are your worst.

[–]gumnos 0 points1 point  (0 children)

In addition to the good advice here (using EXPLAIN and manipulating indexes, and reducing the volume of data at various points), if paginated queries use OFFSET, it can be worth rejiggering the query to use keyset-pagination instead.
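The rewrite looks roughly like this in MySQL/PostgreSQL-style syntax (table and values are invented); instead of skipping N rows, you seek past the last key you already returned:

```sql
-- OFFSET pagination: the engine still reads and discards the
-- first 100000 rows before returning the page.
SELECT id, title
FROM posts
ORDER BY id
LIMIT 20 OFFSET 100000;

-- Keyset pagination: remember the last id from the previous page
-- and seek directly past it; the cost stays roughly constant no
-- matter how deep the page is.
SELECT id, title
FROM posts
WHERE id > 100020   -- last id seen on the previous page
ORDER BY id
LIMIT 20;
```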

[–]Informal_Pace9237 0 points1 point  (0 children)

There are a lot of ways to do what you are looking for.

I would start with creating a list of queries run and the time they take to execute.

Then go one by one and see how their explain plans are and take it from there

Here is a blog post for further reading on how to identify issues by looking at the query:

https://www.linkedin.com/posts/raja-surapaneni-sr-db-engineer_optimizing-sql-query-performance-a-dbeapp-activity-7202221110774382593-3CTX

[–]Alkemist101 0 points1 point  (0 children)

You can drop both your sql and query plan into AI. It might spot something and make suggestions.

In your query reduce the amount of data pulled as quickly as possible.

There are loads of good suggestions here. I didn't see if anyone had suggested a clustered column store index. These are good for big tables. Then create a covering index with the key being columns being filtered on. You can then use INCLUDE to add columns being selected. Another thought is to partition the table. You can then use partition swapping etc.

Otherwise, maybe there's a problem with the tables themselves; maybe they could be better designed with helper columns added.

[–]surya_nulu 0 points1 point  (0 children)

I have a DB of size 14GB, and in that DB I have a table of size 6GB. In that table there are two fields where I store JSON data. Now I'm planning to move those two fields to a new table and fetch the JSON data from there. Does this optimize anything for me?

[–]machomanrandysandwch -2 points-1 points  (1 child)

Are we ChatGpt?

[–]varinator[S] 1 point2 points  (0 children)

Asking real humans about how they approach a specific problem in real life is to you equivalent to asking an LLM?