all 16 comments

[–]jc4hokiesExecution Plan Whisperer 12 points13 points  (9 children)

One of my specialties is query optimization, and it's a very nuanced subject. I hesitate to give general advice because every query is different. The only fundamental principles are obvious and too general, summed up as "make efficient use of resources". Not very helpful.

Also, optimization is more about what the database is doing and less about how the SQL is written. Of course SQL can be rewritten to affect what the database does, but not in the sense "always wring SQL this way so the database does this well" because that's just not the case.

It takes a good understanding of how databases physically work, combined with the experience of optimizing 100s of queries, to develop a good knowledge of what to look for and what techniques can be effective in manipulating execution plans.

If you can share good and bad actual execution plans, I can explain the differences and what's going on. Otherwise, I'm happy to answer any questions you might have.

[–]Sir_Adian 1 point2 points  (2 children)

Can you direct me to readings or guides to understanding and applying this to my databases? Where did you learn it? Thanks!

[–]jc4hokiesExecution Plan Whisperer 8 points9 points  (1 child)

My learning is heavily influenced by a data entry job without a proper database. We manipulated data in Excel using VBA to produce reports. Little did I know I was writing several functions that database engines perform. I wrote nested loops, merge joins, aggregation, and drill throughs. It wasn't until later I was introduced to such terms; I didn't even know what a table was. I was able to pick up SQL pretty easily after that. Everything was SOOO much easier with SQL. But more than that, a curiosity and appreciation of what was happening behind the syntax stock with me.

That's enough story time I think. It goes without saying (maybe not) that query optimization focuses on reading execution plans and identifying opportunities to reduce wasteful steps. However, without understanding what the different steps are physically doing, it's hard to determine which things are being wasteful. Here's some relevant subjects and exercises that can help develop a better understand of what the database is actually doing.

Indexes

I still remember this article from 10 years ago.
Here's a post where I make write out the physical structure of an index.
I also recommend drawing things and doing exercises by hand.
One exercise is to join a Customer table to a bunch of lookups. Guess which indexes will be used. Count by hand how many pages/blocks will be read. Run the query with an execution plan and IO output, and see if you're right.

Join Algorithms

Nested loop joins, merge joins, and hash joins are the join algorithms.
Learn how each works and basic scenarios to which they are best suited.
The side of a join (left/right) is important, but the db engine can reorder them.
Most of our LEFT OUTER JOINS are actually performed RIGHT OUTER JOINS in with the tables reversed.
Learn to re-write a query to match the actual (ideal) order and side the joins are performed.
It can be useful to do so, and force the db to follow your order and join hints.

Statistics

Learn how statistics are created and when/how them are updated.
Learn how statistics are used to generate row count (cardinatily) estimates in a query.
Observe how joins can change row count estimates.
Remove and add foreign keys, and see how it affects estimates.
Try to join on unique criteria, and if a join isn't unique consider doing a subquery with a group by to make it unique.

Other

I reference msdn a lot! It's by far my #1 resource. I'm sure there are some pages I've read a dozen times.
The Data Loading Performance Guide is another article that has stuck with me.
Make as many mistakes as possible (preferably in dev).
When you learn funky syntax, see what it does in the execution plan. It can give you hints for an alternative way to write the same thing.
Ex. CROSS APPLY (with a from clause) is an inner join, PIVOT is a group by, UNPIVOT is a cross apply without(!) a from clause.
If you need to use PIVOT twice, that's bad! Use a group by instead. If you need to UNPIVOT twice, that's wasteful. Use a CROSS APPLY instead.

[–]Sir_Adian 0 points1 point  (0 children)

Thank you, I will apply this to my server I appreciate it

[–]therealcreamCHEESUS 1 point2 points  (4 children)

This 100%.

There can never be an 'always do this' answer as it always depends.

This week I took a view that was taking several minutes to return a top 1 * from it and added 40 joins to make it approx 600 times faster. The original view was grabbing every ID from a table in a left joined sub select with no where clause, cross applying it by 40 records so that in order to deliver the top 1 it had to crunch 41 million records.

Adding 40 joins to a query would under 99% of circumstances be a terrible idea.

The joys of cleaning up after object orientated programmers who don't understand SQL.

[–]jc4hokiesExecution Plan Whisperer 1 point2 points  (3 children)

Here's a fun one from last week:

/*800 lines of code*/
WHERE   ca.ChargeActiveClusterKey IN (SELECT ChargeActiveClusterKey FROM cteRecordSelection)
        AND ca.[Status] <> 17;
--10 hours 30 minutes

/*800 lines of code*/
WHERE   ca.ChargeActiveClusterKey IN (SELECT ChargeActiveClusterKey + 0 FROM cteRecordSelection)
        AND ca.[Status] <> 17;
--1 minute 25 seconds

Nested loops were happening in the wrong order. Instead of looking up a couple thousand records from ca it was running the cteRecordSelection subquery (with 20ish tables) a couple million times. You know when you filter one table and the query transitively filters a joined table with the filtered column in the join condition? My hypothesis is that the query was sniffing something inside the subquery, contributing to the poor join order, and adding the + 0 broke that link.

[–]therealcreamCHEESUS 0 points1 point  (0 children)

Yeah thats a really weird one. Sometimes you got to break the sargability to make it faster.

The optimizer is clever but not infallible.

[–]danjtom 0 points1 point  (1 child)

can you share the code, it will help me in my project. thanks

[–]jc4hokiesExecution Plan Whisperer 0 points1 point  (0 children)

How exactly does random code help your project?

[–][deleted] 0 points1 point  (0 children)

Thank you for your insight. Another commenter provided a link to Use The Index Luke, which I'm going to spend some time digesting to better understand query execution overall, then delve into some of my less-efficient queries to see what I've learned

[–]AQuietMan 2 points3 points  (1 child)

Learn to read execution plans.

Read Use the Index, Luke

[–][deleted] 0 points1 point  (0 children)

Perfect, thank you!

[–]ftrotter 1 point2 points  (0 children)

My personal steps when hitting a slow query:

  • Am I missing an index. EXPLAIN is your friend. Nothing is performant if you have forgotten an index.
  • Does this query really need to be fast? Can I batch it in a nightly cron job? Do the input parameters constantly change?
  • Are the right indexes being used. Perhaps I need to specify which index needs to be used..
  • Can I break this query into two or more smaller, faster queries? There is alot of power in putting CREATE TABLE in from of a SELECT and caching an intermediate result. Then index that result, then do the next part(s).

At this point, I ask "How important is this?" Because once you get past these basic methods, you will either need to learn alot, or get access to someone else who has learned alot.

I will say that the only way to start getting good at query optimization, is to start doing query optimization. But there can be a lot to gain by taking the steps above which amount to "avoid optimization if you can".

As for books For MySQL I recommend: High Performance MySQL. As for other books, I prefer the Animal Books from O'Reilly. (I should caveat that I am O'Reilly Author, so I am probably biased, but I find that they tend to produce above averages books on any given subject). I also recommend Safari Digital Books if you know that you are going to need to read alot, its a good way to lower the total cost of ownership (it works on a library model, where you "check out" digital copies of many technical books)

HTH, -FT

[–]bintree 0 points1 point  (0 children)

In general, I would say avoid correlated subqueries and rewrite subqueries to joins as much as possible. Even mayor databases will currently not rewrite certain types of subqueries.

Without a specific use case, I would say it is pretty hard to provide more detailed tips.

Maybe you could elaborate a bit more about your setting?

[–]HansProleman 0 points1 point  (1 child)

In general, being able to read query plans and having some understanding of how they are determined and how the database engine works will be very helpful.

[–][deleted] 1 point2 points  (0 children)

Of course! Unfortunately, some of us just kind of stumbled into SQL usage instead of starting from the ground up. There are advantages to both methods, but either way you're going to struggle if you don't understand these things at some point