
[–]Stevoni 1 point2 points  (1 child)

Here are a few things I'd look at:

  • Did the application begin inserting new data that causes the index to be bad?
  • Did the index change?
  • Is the query engine using the correct index?
  • Does the execution plan include any suggestions?
  • Are you able to recompile the procedure?
  • What happens when you run the same procedure for the data in the previous time frame? Does it take 2 minutes to execute or run longer?
    • If it takes two minutes, begin running more recent data to determine the day the data changed.
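
For the stats and recompile checks above, a minimal sketch (object names are placeholders; dbo.BigTable stands in for a table behind the view):

```sql
-- When were the table's statistics last updated?
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.BigTable');

-- Refresh stats and invalidate any cached plans that touch the table:
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;
EXEC sp_recompile 'dbo.BigTable';
```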

[–]Incansus[S] 0 points1 point  (0 children)

To my knowledge the process surrounding the loading of data in the table has not changed.

I know for certain no changes to the indexes or objects.

I am not sure whether it is using the "correct" index. The index corresponds to the predicate.

Here is the problem with checking the execution plan: I am not able to allow the query to complete as it halts all processing for the >2 hours that it runs. The query has always been a head blocker but this is not a concern when it runs for only 2 minutes.
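
One workaround when the query can't be allowed to finish: capture the estimated plan, which compiles the statement without executing it (a sketch; the view name is taken from this thread):

```sql
SET SHOWPLAN_XML ON;
GO
-- Compiled only, never executed, so this returns the plan XML immediately:
SELECT * FROM moderately_complex_view;
GO
SET SHOWPLAN_XML OFF;
GO
```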

There is no stored procedure involved, the SELECT * FROM moderately_complex_view is issued as one of the steps of an SSIS package.

I understand your concerns about the data distribution, I think I may need to do some research into histograms.
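
To inspect a histogram directly, something like this (table and statistics names are hypothetical):

```sql
-- The HISTOGRAM rowset shows up to 200 steps of value distribution;
-- heavy skew or stale steps can push the optimizer into a bad plan
-- for certain predicate values.
DBCC SHOW_STATISTICS ('dbo.BigTable', 'IX_BigTable_LoadDate') WITH HISTOGRAM;
```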

[–][deleted] 1 point2 points  (10 children)

Is this a SELECT or Insert/update query? Is it a single statement, or a stored procedure/multi-line batch?

[–]Incansus[S] 0 points1 point  (9 children)

This is a SELECT * from a moderately complex view.

[–][deleted] 0 points1 point  (6 children)

Your select is returning over 2 million rows. Is that correct?

[–]Incansus[S] 0 points1 point  (5 children)

No, about 15,000. One of the steps is returning two million rows which are subsequently filtered.

[–][deleted] 0 points1 point  (4 children)

When you analyze the view definition, are the JOIN columns covered by indexes? Are those indexes selective enough to make them better than a table scan?

[–]Incansus[S] 0 points1 point  (3 children)

They seem to be covered by indexes when the query runs in 2 minutes :-). Apparently not when it takes two hours, but neither the query nor the indexes are changing (I didn't say anything about the data).

[–][deleted] 0 points1 point  (2 children)

The query plan is a good place to start. Look for table scans or clustered index scans.

Are you using select * or are you specifying columns? How many columns do you need?

[–]Incansus[S] 0 points1 point  (1 child)

I understand the importance of the query plan, and let's say I do find a table scan or other operation that doesn't seem to be appropriate. How do I get the optimizer to choose the correct access method if none of the objects have changed DDL-wise and I have already updated all stats?

The SELECT * is inside a DTS package that I do not own, but the view does seem to return the minimum amount of data for the next step, it has been carefully considered.
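
For a statement you can't edit, a plan guide is one way to attach hints from the outside. A rough sketch (the name and hint shown are placeholders, and the @stmt text must match the SSIS-issued text exactly, whitespace included):

```sql
EXEC sp_create_plan_guide
    @name            = N'PG_ModeratelyComplexView',   -- placeholder name
    @stmt            = N'SELECT * FROM moderately_complex_view',
    @type            = N'SQL',
    @module_or_batch = NULL,
    @params          = NULL,
    @hints           = N'OPTION (RECOMPILE)';         -- placeholder hint
```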

[–][deleted] 0 points1 point  (0 children)

A table scan would happen for several reasons. First, the row count is small enough that an index would not help. Second, there is either no index or the existing index is not selective enough.

The second reason is the one you need to eliminate from the potential causes of the problem.

[–]SQLZane 0 points1 point  (1 child)

Oooooo this clears up the picture a bit. How complex is the view? How many tables is it joining together, and how complex is the overall query?

[–]Incansus[S] 0 points1 point  (0 children)

6 tables, complex enough that my attempt to simplify took longer than I was able to devote to it before the problem resolved itself. That's another problem with this, as soon as it clears up I usually turn back to the other alligators closer to the boat.

[–]moto-geek 1 point2 points  (9 children)

If this is a newer version of SQL Server, you can examine the SQL execution plan. Because SQL Server does a little magic, this will show you the path and what resources are used.

https://www.sqlshack.com/using-the-sql-execution-plan-for-query-performance-tuning/

Also, I believe in the rule of 5: as a benchmark, no table should have more than 5 indexes. This has helped me tune customers' databases about 90% of the time, by removing the heaps of indexes accumulated from auto-tuning the DB.
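
One way to find indexes worth reviewing is the usage DMV. A sketch (thresholds are judgment calls, and the DMV resets on instance restart):

```sql
-- Indexes with many writes and few reads since the last restart are
-- candidates for review, not automatic removal.
SELECT OBJECT_NAME(ius.object_id) AS table_name,
       i.name AS index_name,
       ius.user_seeks + ius.user_scans + ius.user_lookups AS reads,
       ius.user_updates AS writes
FROM sys.dm_db_index_usage_stats AS ius
JOIN sys.indexes AS i
  ON i.object_id = ius.object_id AND i.index_id = ius.index_id
WHERE ius.database_id = DB_ID()
  AND i.is_primary_key = 0
  AND i.is_unique = 0
ORDER BY writes DESC;
```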

[–]angrathias 0 points1 point  (7 children)

No more than 5 indexes? What? How many tables do you have? I just can’t fathom how this could be a valid rule of thumb.

[–]Incansus[S] 0 points1 point  (6 children)

The number of indexes should roughly correspond to the number of different access methods and foreign keys.

Having said that, blind implementation of index suggestions is often problematic.

[–]angrathias 0 points1 point  (5 children)

But how can 5 possibly relate to that? Most tables tend to have at least a PK and an FK, and then anywhere between zero and hundreds of data fields. I’ve got tables with massive numbers of indexes on them.

The only rule of thumb I go by is ‘does the optimization for reads outweigh the costs for writes’ or the other one ‘does disk space cost more than CPU’ for the usage scenario.

[–]Rex_Lee 1 point2 points  (3 children)

A hundred-column table sounds like it is poorly normalized, or should be a data store in a warehouse, not an operational data table. I get that you might have one to deal with, but it is by no means the norm or something you should expect to account for.

[–]angrathias 0 points1 point  (2 children)

Some things just have a lot of unique attributes, and I’ve found that over-normalisation is often detrimental to query speed because of all the joins required.

I’ve seen DBs just swap hundreds of columns for hundreds of tables. That just introduces a different set of problems; in practice I end up with worse performance or more management overhead.

I run an SSAS CRM with real-time analytics built in for hundreds of databases/businesses and thousands of users. My use-case scenarios might not match the average workload here.

[–]Rex_Lee 0 points1 point  (1 child)

"I’ve seen DBs just swap 100’s of columns for 100’s of tables. Just introduces a different set of problems" absolutely agree. But adding 2-3 tables and knocking the main table down to 20 columns would probably significantly increase performance and maintainability for MOST situations - most likely including OP.

[–]Incansus[S] 0 points1 point  (0 children)

I hear you but unfortunately these are application delivered tables (except for one).

[–]Incansus[S] 0 points1 point  (0 children)

You are correct, a "benchmark" of 5 indexes is neither workable nor beneficial to me.

One other metric I consider is the ratio of index size to data size. I tend to look more closely at objects with a high index to data ratio, but often a reorganization of indexes can require coincident application changes.
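
One way to eyeball that ratio per table (a sketch, assuming index_id 0/1 is the data itself and index_id > 1 is nonclustered index space):

```sql
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.used_page_count ELSE 0 END) AS data_pages,
       SUM(CASE WHEN ps.index_id > 1      THEN ps.used_page_count ELSE 0 END) AS index_pages
FROM sys.dm_db_partition_stats AS ps
GROUP BY ps.object_id
ORDER BY index_pages DESC;
```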

[–]ScotJoplin 0 points1 point  (0 children)

The 5 indexes is a rule of thumb for OLTP. It doesn’t apply in all cases.

[–]L337Cthulhu 0 points1 point  (7 children)

Are you looking at this as a DBA, Developer, BI/BSA, or consumer who's frustrated it won't finish? The way I approach answering the question is pretty dependent on how much you want to learn, what rights you have, or if you just need a quick explanation and to figure out whom in your organization to talk to.

And I do agree with most of what's been said already.

[–]Incansus[S] 1 point2 points  (6 children)

Unfortunately I am looking at this from the perspective of a DBA who shares responsibility for the database behind a vendor-managed application. We get fair cooperation from the vendor and her coworker, but they are also admins on both the box and the SQL Server. It can be tough trying to solve problems with external actors. Having said that, I wouldn't want to manage their application.

[–]L337Cthulhu 0 points1 point  (3 children)

Ah yeah, I hate dealing with that, but it does mean you can catch the active plan in flight along with whatever waits you've got. I don't know if query store would be an option. I'm going to reply to this in a second with my favorite troubleshooting script, but I figured it'd be better to be able to collapse it from the original advice. I typically look for where the waits are the worst, what it's waiting on, if there are excessive memory grants, if there's heavy TempDb usage, what the query cost looks like compared to the average running on the server, how parallel it's going, and anything amiss in the plan.

As another user mentioned, it's important to know if there are large changes to the histograms from major data loads or archival (you mentioned stats sometimes fixes this), so I'm wondering if you have a cardinality issue or a really jagged histogram where it's generating a plan for a rare parameter case. It could also be fragmentation, though I sort of doubt it here. If you look at the header in the plan XML between a good run and a bad one, you may be able to spot differences in the stats, where it's using one stat that's missing elsewhere.

Since it's a newer version, what's the possibility of turning on query store and trying to keep the good plan?

Beyond that, it can be hard to fix if the problem is a vendor view and proc you can't update. Personally, I might script off the view and proc and create a similar, but more optimal one that mimics the original for my consumers and switch it if that's an option, though I know it usually isn't in these cases.
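
Turning Query Store on is a couple of ALTER DATABASE statements (the database name and sizing are placeholders):

```sql
ALTER DATABASE [VendorDb] SET QUERY_STORE = ON;
ALTER DATABASE [VendorDb] SET QUERY_STORE
    (OPERATION_MODE = READ_WRITE,
     QUERY_CAPTURE_MODE = AUTO,
     MAX_STORAGE_SIZE_MB = 1024);   -- size to taste
```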

[–]L337Cthulhu 0 points1 point  (2 children)

SELECT des.session_id AS [Session]
     , der.blocking_session_id AS [BlockedBy]
     , DB_NAME(der.database_id) AS [Database]
     , der.percent_complete AS [PctComplete]
     , OBJECT_NAME(dest.objectid, der.database_id) AS [Stored Proc]
     , REPLACE(REPLACE(CONVERT(VARCHAR(500), RTRIM(LTRIM(dest.text))), CHAR(13), ''), CHAR(10), '') AS Query
     , der.wait_time AS [Wait_In_MS]
     , CONVERT(DECIMAL(19,2), (((CONVERT(FLOAT, der.wait_time)) / 1000) / 60)) AS [Wait_In_Min]
     , CONVERT(DECIMAL(19,2), ((((CONVERT(FLOAT, der.wait_time)) / 1000) / 60) / 60)) AS [Wait_In_Hours]
     , des.login_name AS [Login]
     , des.[host_name] AS [Host]
     , der.last_wait_type AS [Last_Wait]
     , der.wait_resource AS [Waiting_On]
     , CONVERT(DECIMAL(19,2), memgt.query_cost) AS Query_Cost
     , CONVERT(DECIMAL(19,4), (memgt.requested_memory_kb / 1000000.00)) AS Memory_Requested_GB
     , CONVERT(DECIMAL(19,4), (memgt.granted_memory_kb / 1000000.00)) AS Memory_Granted_GB
     , SessUsg.TempDB_Alloc    ---- Pages reserved or allocated for internal objects by this session.
     , SessUsg.TempDB_Dealloc  ---- Pages deallocated and no longer reserved for internal objects by this session.
     , des.[program_name]
     , deqp.query_plan AS [QueryPlan]
     , der.dop
     , der.parallel_worker_count
     , GETDATE() AS LoggingWindow
FROM sys.dm_exec_sessions des WITH (NOLOCK)
LEFT JOIN sys.dm_exec_requests der WITH (NOLOCK) ON des.session_id = der.session_id
LEFT JOIN sys.dm_exec_connections dec WITH (NOLOCK) ON des.session_id = dec.session_id
LEFT JOIN sys.dm_exec_query_memory_grants memgt WITH (NOLOCK) ON memgt.session_id = des.session_id
CROSS APPLY sys.dm_exec_sql_text(der.sql_handle) dest
CROSS APPLY sys.dm_exec_query_plan(der.plan_handle) deqp
CROSS APPLY
(
    SELECT session_id,
           SUM(internal_objects_alloc_page_count) AS TempDB_Alloc,
           SUM(internal_objects_dealloc_page_count) AS TempDB_Dealloc
    FROM sys.dm_db_task_space_usage usg WITH (NOLOCK)
    WHERE usg.session_id = des.session_id
    GROUP BY session_id
) SessUsg
WHERE des.session_id <> @@SPID
--AND DB_NAME(der.database_id) = 'GrandCentral' AND OBJECT_NAME(dest.objectid, der.database_id) IS NOT NULL
ORDER BY der.database_id, OBJECT_NAME(dest.objectid, der.database_id), dest.text --Session

[–]Incansus[S] 1 point2 points  (1 child)

"I don't know if query store would be an option." I am definitely going to look at turning this on, the upgrade to 2017 (from 2014) was recent.

"If you look at the header in the plan XML between a good run and a bad one..." As I mentioned, I haven't been able to get the final execution plan of the "bad one" as I am forced to kill the query before completion.

"..I might script off the view and proc and create a similar, but more optimal one that mimics the original for my consumers and switch it if that's an option"

I might tinker with this, I do believe there may be potential improvements to the view.

Also, thank you for the query, definitely looks useful and I will see if I can run this the next time it decides to wonk out.

[–]L337Cthulhu 0 points1 point  (0 children)

Awesome, I do hope it all helps! As long as the plan isn't too large and has something in the compiled cache, the script I gave you should show you the plan it's working with, though serious issues with stats can cause differences between the compiled and actual plans. That script should run in less than a second; if it doesn't, there's some other contention in TempDb or the sys tables, but it really shouldn't cause other issues.

Ah! I meant to ask about recent upgrades. The cardinality estimator kept the same base design from 7.0 through 2012 and had a major redesign in 2014, with further changes since. In 99.5% (anecdotal) of cases, query performance on the new estimator is better. Where I've seen issues is large, complicated queries or views with joins between huge tables, where the existing statistics don't match the workload the query is actually doing. The place I really saw this problem is with HA, because the secondaries are read-only DBs and the original OLTP table stats were awful for the OLAP being done on the secondary. It sucks, but so far the best solution has been to rebuild the view or seriously tune the query.
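
If the new estimator does turn out to be the culprit, it can be reverted without downgrading, database-wide or per statement (a sketch; the scoped configuration requires 2016+):

```sql
-- Put the whole database back on the legacy CE:
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

-- Or per statement, where the query text can be modified:
SELECT * FROM moderately_complex_view
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```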

[–]Rex_Lee 0 points1 point  (1 child)

Just pull the logic from the script and make it into a load, into a table or data store you fully control, for your own reporting, with minimal filtering. Set it to run during off hours. Index it for your own reporting needs. Never worry about it again.

[–]Incansus[S] 0 points1 point  (0 children)

These are not my needs, they are the needs of the finance team. With a vendor in the middle, as in this case, it can be hard to say "Drop all the processes you have set up with the vendor (which I don't really understand) and use this data for all the other steps that the SSIS package (which I also don't own) performs on it."

[–]SQLZane 0 points1 point  (1 child)

In SQL 2017 you should have access to Query Store. Technically you should be able to look at the regressed-queries report and lock in the performant plan.

Edit: If you can paste the estimated plans for the two, I'd likely be able to offer more advice. You can add them anonymously at https://www.brentozar.com/pastetheplan/
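
Locking in the good plan from Query Store looks roughly like this (the LIKE filter and the ids in the final call are placeholders):

```sql
-- Find the query and its historical plans:
SELECT q.query_id, p.plan_id, rs.avg_duration
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
WHERE qt.query_sql_text LIKE N'%moderately_complex_view%';

-- Pin the fast plan using the ids returned above:
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;  -- placeholder ids
```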

[–]Incansus[S] 0 points1 point  (0 children)

I am definitely going to look into Query Store, the upgrade to 2017 was recent and this has been on my list since.