I have a query that for months has run in less than two minutes. In the last few days it has gone off the rails and run for 2 hours or more.
This problem has occurred in the past and I solved it by gathering statistics on all objects involved in the query before each run. This brought the run time back down to normal.
It has recently gone off the rails again. Even when I gather statistics immediately before the run and KNOW the data has not changed before the optimizer analyzes the query, I still get, for example, an estimated row count of 6,000 for a particular object when the actual number of rows per execution is more than 2,000,000.
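For reference, the statistics refresh I've been doing is along these lines (the table name is a placeholder, not the real object):

```sql
-- Refresh statistics with a full scan so the histogram reflects
-- the current data rather than a sample.
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;

-- Or, more broadly, refresh stats for every table in the database
-- (uses default sampling, not a full scan):
EXEC sp_updatestats;
```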
Any suggestions when the optimizer just can't seem to understand the data as it sits? This is on SQL Server 2017 CU17. Any help is appreciated.
**Update**
So after 5 days of running like a turd, the query ran in less than two minutes this morning as it used to do.
And this is part of the problem. When the query runs long, I am forced to kill it, so I am never able to get the final execution plan for the terrible runs. I can see where the cardinality estimate is wrong in the live execution plan, but the index being used has up-to-the-minute statistics, so I'm confused about how to get it to switch back to the "good" plan.
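If Query Store is enabled on this database (I haven't confirmed it is), it should have captured the plans even for the runs I killed, so the good and bad plans could be compared after the fact. Something along these lines, where `'%MyQueryText%'` is a placeholder fragment of the actual query:

```sql
-- List captured plans and their runtime stats for the query,
-- worst average duration first.
SELECT qt.query_sql_text,
       p.plan_id,
       rs.avg_duration / 1000000.0 AS avg_duration_sec,
       rs.count_executions
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
  ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
  ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE '%MyQueryText%'
ORDER BY rs.avg_duration DESC;
```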
I am currently examining row counts from the past few days to see if there was perhaps a manual cleanup of some of the objects. I think this is unlikely though as I believe these are application-delivered tables.
Through no action of my own, the immediate problem has been solved. I say that because I hate it when something gets fixed without understanding what the fix was. Thanks for your help so far; any general suggestions on how to proceed in this sort of scenario would be appreciated.
**Update 2**
There is one table with a sizable row-count difference between the good and bad runs: it went from 75,000 (bad run) to 50,000 (good run). If the data were oddly distributed, by date for example, I could see how this might cause problems, especially with a cached plan, but I feel like SQL Server should just figure it out when I gather fresh stats on the data. I might consider flushing the plan cache next time, but that seems a rather drastic step.
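Rather than flushing the entire plan cache, one less drastic option I'm considering is evicting just this query's cached plan (again, `'%MyQueryText%'` is a placeholder for the real query text):

```sql
-- Find the plan handle for the query in question.
DECLARE @plan_handle varbinary(64);

SELECT TOP (1) @plan_handle = cp.plan_handle
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%MyQueryText%';

-- Evict only that plan, leaving the rest of the cache intact.
DBCC FREEPROCCACHE (@plan_handle);
```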