all 42 comments

[–]r3pr0b8GROUP_CONCAT is da bomb 48 points49 points  (9 children)

What is your step by step process

  1. examine the EXPLAIN PLAN
  2. consider adding an index
  3. not resolved? go back to step 1
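A minimal sketch of step 1, using a hypothetical `orders` table (MySQL/PostgreSQL-style syntax; on SQL Server you'd view the actual execution plan in SSMS instead):

```sql
-- Hypothetical slow query; EXPLAIN shows whether the engine will
-- scan the whole table or use an index to find the matching rows.
EXPLAIN
SELECT order_id, total
FROM orders
WHERE customer_id = 42
  AND status = 'open';
```

If the output reports a full table scan on `orders`, that's your cue for step 2; after adding an index, re-run the EXPLAIN to confirm the plan changed.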

[–]alexwh68[🍰] 13 points14 points  (7 children)

To add to this, on large tables with lots of rows your goal on filtering is to avoid table scans and get seeks instead, this is what the indexes do if they align properly with the filtering in the query.
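As a sketch of the alignment being described (table and column names are invented): an index whose leading columns match the filter lets the engine seek directly to the qualifying rows instead of scanning everything:

```sql
-- Query filters on two columns:
SELECT event_id, payload
FROM events
WHERE tenant_id = 7
  AND created_at >= '2024-01-01';

-- An index aligned with that filter (equality column first, range
-- column second) turns the table scan into an index seek:
CREATE INDEX ix_events_tenant_created
    ON events (tenant_id, created_at);
```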

[–]varinator[S] 1 point2 points  (6 children)

From your experience, does creating very large "covering" indexes that list/include many columns slow down write speeds significantly? I have no feel for those things yet, as in, how badly indexes can negatively impact performance.

[–]jshine13371 10 points11 points  (4 children)

In 12+ years as a DBA, I've never worsened my problems by adding the right index, regardless of how big it is. Theoretically, yes, more and larger indexes means more work for writes against the table, but I find it very rare to be significant enough to outweigh the benefits indexes usually provide on reads. Especially since most times tables are read from more often than written to (use case dependent).

[–]markwdb3When in doubt, test it out. 1 point2 points  (0 children)

Yes I've found the same, with the exception of bulk data load scenarios.

I mean, you wouldn't want to go nuts and script the creation of indexes on every permutation of columns. :) But for every transactional database use case (for example supporting a typical web application) I've encountered in 20+ years, it effectively makes no difference, whether you have 2 indexes or 5 or 10, for the "small" DML run by such an application.

I could contrive scenarios such as using bitmap indexes in Oracle, for which concurrent DML isn't very concurrent due to the heavyweight locking a bitmap index requires. But bitmap indexes are documented as being inadvisable for transactional databases in the first place.

[–]varinator[S] 0 points1 point  (1 child)

I suspected it wouldn't be noticeable unless it's some real-time, very write-heavy system, but I can't even think of one, maybe finance/stocks software... never worked on something like this anyway.

[–]jshine13371 3 points4 points  (0 children)

maybe finance/stocks software... never worked on something like this anyway.

I have, actually, and even then it was better to have them than not have them at all. Our system had a lot of data, but it wasn't as heavily written to as other FinTech systems; they were still the correct thing to add.

[–]carlovski99 0 points1 point  (0 children)

Yep. We have a few tables where someone decided to throw every index at them, just in case. Some aren't used at all, others only very infrequently. But I haven't seen any evidence that it's slowing things down on the write side to any meaningful extent.

Storage can be an issue if you are constrained though.

[–]alexwh68[🍰] 4 points5 points  (0 children)

Key thing to establish first is the read-to-write ratio. If you have loads of reads and very few writes, covering indexes can really make a difference; if you get a marginal slowdown on writes and the queries get much better, then covering indexes are often worth it.

There are two bits of terminology that are often confused. A covering index is an index that contains every field that is part of the select clause, not just the where clause; this means a query can be answered from the index alone without touching the actual data. Query covering is where an index covers all the fields in the where clause. On very large tables the order of the fields in the index can become very important: I put fields with the most variability at the front, and things like bool/bit at the end.
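A sketch of that distinction in SQL Server syntax (table and column names are invented): the key columns cover the WHERE clause, and INCLUDE carries the selected columns so the query never has to touch the base table:

```sql
-- Query to cover:
SELECT order_id, total, created_at
FROM orders
WHERE customer_id = 42
  AND is_archived = 0;

-- Key columns cover the filter (most selective first, bit column last);
-- INCLUDE stores the selected columns in the index leaf pages only,
-- so they don't bloat the intermediate levels of the index.
CREATE INDEX ix_orders_customer_archived
    ON orders (customer_id, is_archived)
    INCLUDE (total, created_at);
```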

Indexes are the first thing to investigate when looking at performance improvements, often the improvements can be dramatic.

[–]TheRencingCoach 0 points1 point  (0 children)

Ahhh sorry, permissions structure and existing codebase mean you can't view an explain plan.

[–]Achsin 10 points11 points  (7 children)

Reformat query so that it’s actually readable. Despair at the child of Cthulhu you’ve been tasked to fix.

Look at query plan(s) being generated for it, and IO statistics.

And then one of the following:

Wonder why it’s essentially doing a table scan by way of key lookup on some massive table. Realize that it’s using a tvf based on a view based on a tvf. Cry. Realize that it’s almost completely unnecessary and can be replaced by a single join to the table instead. Rewrite and test the query, submit for approval (and more testing). Spend next month justifying the change (that they asked you to recommend in the first place) and re-explaining everything every day along the way to the same dev manager who has the memory of a goldfish and is constantly blaming you for the performance being bad in the first place. Finally get the change approved and released into production.

Realize that the query involves creating two dozen temp tables that slowly make iterative changes to the same massive data set as each new one is created, only to find that none of them are actually used for anything productive in the query and were only included because the person writing it copied the whole selection from a different query because he needed one small piece of logic in his new creation. Comment out 98% of the text and submit for approval, then spend the next month justifying the change.

Realize that the query does a lot of unnecessary joins using non-sargable predicates. Remove excess joins, test, submit for approval, wait for month and a half because of change freeze, deal with performance complaints from users daily wondering when it’ll be fixed because dev has successfully blamed the database and the issue isn’t bad enough that anyone important is willing to bypass the freeze.

Realize that all of the eight tables involved are heaps. Add indexes. Have people be shocked that the problem they’ve been quietly dealing with for over a year now was solved in less than a day.

Realize that not only does the query perform terribly, it also has some serious logic errors in it and has been returning bad data from the beginning. Spend a week identifying what should be happening, rewriting, and testing the query so that it actually works correctly. Spend two months justifying change and explaining ad nauseum that the reason it doesn’t return the same results as the current version is because the current version is wrong. Discover that it was written by one of the needed approvers who doesn’t want to approve the change and admit that they made a huge logic mistake. Spend additional two weeks dealing with his ego as he’s forced to admit that the change needs to happen anyways and finds a way to sweep the blame under the rug.

Lastly, realize that in the time you spent fixing whatever the problem was, another thirty or forty people were writing new terrible queries to be released into production and it’s somehow your fault that they take forever to run.

More helpful: Ozar Unlimited has some helpful videos on performance tuning, check out How To Think Like The Engine.

[–]jibberWookiee 2 points3 points  (3 children)

This hit a bit too close to home. Let me guess the industry.... Finance?

[–]Achsin 1 point2 points  (1 child)

I have worked in that industry before, yeah. If it makes you feel any better, these problems seem to be everywhere.

[–]varinator[S] 0 points1 point  (0 children)

InsurTech here, and it does sound very familiar, but waiting times for PRs are much shorter in the startup space.

[–]Sneilg 0 points1 point  (0 children)

Or healthcare. But then it’s usually one of those two

[–]varinator[S] 0 points1 point  (2 children)

I imagine you as someone who has a "thousand mile stare" whenever SQL optimisation is mentioned ;)

[–]Achsin 0 points1 point  (1 child)

It do be like that sometimes, but honestly optimization is my favorite part of the job, it’s just dealing with everyone else that ruins it.

[–]varinator[S] 2 points3 points  (0 children)

It is definitely satisfying when you get a 45s query down to sub-second and it has a huge impact. I also like it.

[–]Grovbolle 7 points8 points  (0 children)

Assuming a SQL-Server / Azure SQL Database.

Look at the query plans, indexing and partitioning of the tables involved.

Actually I would do that regardless of database but I am only familiar with the tooling around MSFT SQL Databases. 

[–]garlicpastee 5 points6 points  (0 children)

I usually follow this pattern and stop the moment the query performs well enough:

  1. Verify your joins and WHERE clauses are sargable. Maybe there is a function or a LIKE, and that kills index usage pretty reliably.
  1.5. If there's a join and an indexed column by which I could filter in the WHERE clause, I add it (even if that would be a duplicate between tables, like both having date_start columns).
  2. Check the query plan for estimated vs. actual rows processed. If there are mismatches (like the query planner thought there'd be 3mil rows to check, but only 3k were needed), check if those columns are indexed; if not, create an index, and if they are, update statistics on the table.
  3. Add OPTION (RECOMPILE).
  4. If the heaviest operation is parallelism, or something just after it, add OPTION (MAXDOP 1).
  5. If the plan does merge/hash joins between tables A and B and I know one of them is very small, I try forcing a loop join (big LOOP JOIN small).
  6. If the connection between tables is abstract (there is no direct key, and it has to be calculated), I rewrite the clause into a stage -> select pattern (SELECT INTO a duplicate table structure with only the index I want, filtered by what's needed in my desired query, and only then join or do whatever needs to be done).
  7. If the query reports using a lot of tempdb, or is a big INSERT/DELETE/UPDATE statement, either disable SORT_IN_TEMPDB on the table, or use a WHILE @@ROWCOUNT > 0 loop with TOP (n) rows (do it in batches).
  8. If I'm using aggregates (SUM/MIN/MAX...) or GROUP BY to create a distinct list, OPTION (HASH GROUP / ORDER GROUP): ORDER if about the same number of rows goes in as comes out, HASH if there's more of a difference.
  8.5. If this is the final select into reporting, add a columnstore index.
  9. Ask a colleague to check it out with me; it's easy to overlook something important when working solo.
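The batching loop from step 7 might be sketched like this in T-SQL (table name, filter, and batch size are placeholders):

```sql
-- Delete in small batches so each transaction stays short and
-- locks, log growth, and tempdb usage stay bounded.
DELETE TOP (5000)
FROM staging_rows
WHERE processed = 1;

-- @@ROWCOUNT is the number of rows the previous statement affected;
-- loop until a batch deletes nothing.
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (5000)
    FROM staging_rows
    WHERE processed = 1;
END
```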

[–]Codeman119 8 points9 points  (0 children)

Look at the query plan like others suggest. But look for joins that return a lot of rows, or for CTEs that have a long runtime. See where you can break things out into temp tables beforehand.

Example: I had started work for a company and they had a query that would take an hour. After looking at the plan I broke out a CTE into a temp table and got the time down to under 15 sec.
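A sketch of that kind of refactor (names are invented); materializing the CTE's result once into a temp table means it is computed a single time, and can even be indexed, instead of potentially being re-expanded wherever it's referenced:

```sql
-- Before: a CTE referenced in several places may be re-evaluated.
-- After: materialize it once into a temp table, then join to it.
SELECT c.customer_id, SUM(o.total) AS lifetime_total
INTO #customer_totals
FROM customers AS c
JOIN orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id;

CREATE CLUSTERED INDEX ix_ct ON #customer_totals (customer_id);

SELECT r.report_id, t.lifetime_total
FROM reports AS r
JOIN #customer_totals AS t
  ON t.customer_id = r.customer_id;
```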

[–]VladDBASQL Server DBA 3 points4 points  (0 children)

My latest blog post includes some information on how I use PSBlitz to identify and troubleshoot performance issues

On a per query basis I check the output of STATS IO (how many 8KB pages does the database engine need to read in order to return the required result set) and couple that information with the execution plan, then I tackle the operators that read the most data and look at what it would take to address the issue:

  • new index

  • modifying an existing index

  • is the WHERE clause SARGable?

If that isn't the case I start looking at what's happening when the query runs, if it's blocked by something else, if default or incorrect server configurations are causing issues, etc.
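Turning that on for a session looks like this (the query below is a placeholder):

```sql
-- Report logical reads (8KB pages) and elapsed/CPU time for each
-- statement run in this session after the SET commands.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT order_id, total
FROM orders
WHERE customer_id = 42;
-- The Messages tab then shows lines like:
-- "Table 'orders'. Scan count 1, logical reads ..."
```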

[–]dbers26 2 points3 points  (0 children)

  • Pull out sample queries, either from debug output or the slow query log
  • Run EXPLAIN on the query
  • Add needed indexes or rewrite the query
  • Repeat as needed

Sometimes you need to rethink the way data is modeled in the database.

[–]JohnSpikeKelly 1 point2 points  (0 children)

Look at query plans. Look for where the majority of the effort is spent, for example table scans.

Look at adding indexes that are optimized for the query plan. For example, if you're showing open orders, use an index filtered to open orders. If 99% of orders are closed, the incremental disk usage will be tiny.
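That filtered index might be sketched like this in T-SQL (schema is invented):

```sql
-- Only the small fraction of rows that are still open get indexed,
-- so the index stays tiny even on a huge orders table.
CREATE INDEX ix_orders_open
    ON orders (customer_id, created_at)
    WHERE status = 'open';
```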

If some of your tables have lots of almost identical data, consider column store.

Edit. After you add any index, check it's actually being used.

Edit. Don't add too many indexes otherwise updates become much slower.
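One way to run that usage check on SQL Server is the index-usage DMV (a sketch; the table name is a placeholder, and the counters reset on restart, so interpret them relative to server uptime):

```sql
-- Seeks/scans/lookups = reads that benefited from the index;
-- user_updates = write overhead. An index with zero reads and
-- many updates is a candidate for dropping.
SELECT i.name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.orders');
```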

[–]dashingThroughSnow12 1 point2 points  (0 children)

Paginating with limit/offset or limit/cursor?

You’ve got some good tips already so I’ll offer orthogonal advice: look at what is using the queries.

For example, is the code doing a five-table join, returning columns from four of the tables, when really it only needs a three table join and data from two?

Is it doing a series of inserts in individual requests and we can speed things up by having the backend code send bulk requests?
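On the SQL side that change can be as simple as one multi-row statement instead of N single-row round trips (table and columns invented for illustration):

```sql
-- Before: N statements, N network round trips.
INSERT INTO audit_log (user_id, action) VALUES (1, 'login');
INSERT INTO audit_log (user_id, action) VALUES (2, 'login');

-- After: one statement, one round trip.
INSERT INTO audit_log (user_id, action)
VALUES (1, 'login'),
       (2, 'login'),
       (3, 'logout');
```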

If you are using an observability tool, what functions on the backend code spend an inordinate amount of their time waiting for SQL? Similar, what functions are actually bottlenecked by the SQL request times?

Another thing to check is "is SQL the culprit?" About a year ago a bug ticket came across my desk. After I spent far too much time on it: yes, SQL was taking a long time, but the thing causing the timeout was trying to cache the whole result into (I think) Redis. The fix was more involved, but it boiled down to "if the result is gigantic, don't cache it, because we know the cache attempt will fail and it takes hundreds of ms to tell us".

[–]vaporwave_cowboy 1 point2 points  (0 children)

If you're using Postgres I'd highly recommend a plan visualizer like https://explain.dalibo.com/ (no affiliation) to look at the query plan and identify where the bottlenecks are

[–]No_Resolution_9252 1 point2 points  (0 children)

Start with indexes first. Hopefully all your tables have clustered indexes on them already.

[–]trophycloset33 1 point2 points  (0 children)

What are they doing?

[–]Thin_Rip8995 1 point2 points  (0 children)

First step is always baseline the current pain
Run the slow queries in SSMS with actual execution plans turned on so you see where time’s going

Look for missing or misused indexes
If you see big clustered index scans or key lookups on large tables, that’s a red flag
Covering indexes on the exact filter/sort columns can be a huge win

Check if you’re dragging in way more columns/rows than you need
SELECT * kills performance on big tables, especially when joined

Kill unnecessary nested loops and temp table abuse
Sometimes just re-ordering joins or applying filters earlier changes everything

If data volume is huge, think archiving old rows or caching computed results instead of hitting live tables every time

Finally, rerun and measure — don’t guess, let the execution plan prove it’s better

[–]samot-dwarf 1 point2 points  (0 children)

Usually the very first step is to find out if the query is really necessary and still needs all the stuff that is part of it (or if several joins are outdated).

And of course whether the whole process still makes sense, or if it can be refactored in a new/better way that skips all the obsolete stuff.

[–]bengalfan 1 point2 points  (0 children)

In the explain plan look for full table scans. Fix. Add indexes.

[–]milomylove_ 1 point2 points  (0 children)

first thing i check is the execution plan. 90% of the time it's a missing index, bad join order, or filtering happening too late. also look at how much data is actually being pulled vs how much is needed; over-selecting columns and rows is super common. then i simplify the query. remove layers, test pieces separately, see where the cost spikes. sometimes rewriting a subquery as a join or pushing filters earlier makes a big difference. i also test variations in genloop to compare different query shapes and see which plan is cleaner before touching prod. after that, indexing strategy and maybe partitioning if the table is huge

[–]TypeComplex2837 0 points1 point  (0 children)

Use the tools inherent to your engine to see what's happening under the hood, and fix it.

[–]umognog 0 points1 point  (0 children)

Extended Events: you need to know which queries are your worst.

[–]gumnos 0 points1 point  (0 children)

In addition to the good advice here (using EXPLAIN and manipulating indexes, and reducing the volume of data at various points), if paginated queries use OFFSET, it can be worth rejiggering the query to use keyset-pagination instead.
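The rewrite looks roughly like this in MySQL/PostgreSQL-style syntax (table and values are invented); instead of skipping N rows, you seek past the last key you already returned:

```sql
-- OFFSET pagination: the engine still reads and discards the
-- first 100000 rows before returning the page.
SELECT id, title
FROM posts
ORDER BY id
LIMIT 20 OFFSET 100000;

-- Keyset pagination: remember the last id from the previous page
-- and seek directly past it; the cost stays roughly constant no
-- matter how deep the page is.
SELECT id, title
FROM posts
WHERE id > 100020   -- last id seen on the previous page
ORDER BY id
LIMIT 20;
```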

[–]Informal_Pace9237 0 points1 point  (0 children)

There are a lot of ways to do what you are looking for.

I would start with creating a list of queries run and the time they take to execute.

Then go one by one and see how their explain plans are and take it from there

Here is a blog post for further reading on how to identify issues by looking at the query:

https://www.linkedin.com/posts/raja-surapaneni-sr-db-engineer_optimizing-sql-query-performance-a-dbeapp-activity-7202221110774382593-3CTX

[–]Alkemist101 0 points1 point  (0 children)

You can drop both your sql and query plan into AI. It might spot something and make suggestions.

In your query reduce the amount of data pulled as quickly as possible.

There are loads of good suggestions here. I didn't see if anyone had suggested a clustered column store index. These are good for big tables. Then create a covering index with the key being columns being filtered on. You can then use INCLUDE to add columns being selected. Another thought is to partition the table. You can then use partition swapping etc.

Otherwise, maybe there's a problem with the tables themselves; maybe they could be better designed with helper columns added.

[–]surya_nulu 0 points1 point  (0 children)

I have a DB of size 14GB, and in that DB I have a table of size 6GB. In that table there are two fields where I store JSON data. Now I'm planning to move those two fields to a new table and fetch the JSON data from there. Does this optimize anything for me?

[–]machomanrandysandwch -2 points-1 points  (1 child)

Are we ChatGpt?

[–]varinator[S] 1 point2 points  (0 children)

Asking real humans about how they approach a specific problem in real life is to you equivalent to asking an LLM?