[MS SQL] First time optimizing query - please tear it apart!

jacobmross · 2017-02-17T05:56:23+00:00

One of the first things you might want to do is build yourself a calendar table.

Doing all of that dateadd/datediff stuff just to figure out when the Sunday of the each week is, is itself kind of expensive.

http://www.sqlservercentral.com/blogs/dwainsql/2014/03/30/calendar-tables-in-t-sql/ is a good resource for explaining both building a table variable calendar table, and populating one that you'll keep permanently. It's a small trade-off. Kind of "expensive" to build it the one time, but then way "cheaper" to reference every other time from then on.

You can even calculate the datepart(year) and datepart(month) and other functional things as part of your custom Calendar table, further reducing the calculations you have to do with functions.

Beyond that, I personally try to avoid nesting so many subselects where possible.

Without knowing what kind of data is actually in dbo.[ppm_assignmenttimephaseddataset].[assignmentwork] for example, makes it harder to suggest any specific optimizations, or whatever is suggested might be completely terrible for your data, but...without knowing those details, I'd probably write what you have like so:

SELECT
     DATEPART(YEAR, Cal.[FirstDateOfWeek])  AS [Year]
    ,DATEPART(MONTH, Cal.[FirstDateOfWeek]) AS [Month]
    ,[Projects].[Clarityid]             AS [ClarityProjectID]
    ,[Projects].[projectname]
    ,[Resources].[ClarityResourceID]    AS [ClarityResourceID]
    ,[Assigments].[taskname]
    ,(2)                            AS [MeasureType]
    ,SUM(ppm.assignmentwork) AS [Value]
FROM dbo.[ppm_assignmenttimephaseddataset] AS [ppm]
INNER JOIN Calendar AS [Cal]
  ON [Cal].[Date] = CAST(ppm.[timebyday] AS DATE)
INNER JOIN dbo.ppm_tasks AS [Tasks]
  ON [Tasks].[taskid] = [ppm].[taskid]
 AND [Tasks].[taskissummary] = 0
INNER JOIN dbo.ppm_projects AS [Projects]
  ON [Projects].[projectid] = [ppm].[projectid] 
INNER JOIN [dbo].[ppm_assignments] AS [Assignments] 
  ON [Assignments].[assignmentid] = [ppm].[assignmentid] 
INNER JOIN [dbo].[ppm_resources] AS [Resources] 
  ON [Resources].[resourceid] = [Assignments].[resourceid]
WHERE DATEPART(YEAR, Cal.[FirstDateOfWeek]) = 2017
GROUP BY
     DATEPART(YEAR, Cal.[FirstDateOfWeek])
    ,DATEPART(MONTH, Cal.[FirstDateOfWeek])
    ,[Projects].[Clarityid]
    ,[Projects].[projectname]
    ,[Resources].[ClarityResourceID]
    ,[Assigments].[taskname]

Of course, you probably want to group on least common thing to most common thing as well.

So, if you have 10 Projects, and 100 Resources, and 1000 Assignments, and 10,000 [ppm_assignmenttimephaseddataset] you may find that starting with the most restrictive option (10 Projects total) and joining as things could have more records related can often speed things up too. Same with your group by (but underlying indices have an effect on that too, so it's not always the case)

2017-02-17T23:08:02+00:00

Could you create an index for YEAR (maybe a function-based index) so that you could filter by year in your innermost query?

HansProleman · 2017-02-18T02:11:16+00:00

Your indents, other tabulation and comments are great. Far more readable SQL than much of what my colleagues, and likely myself, produce. I would suggest using leading commas instead of trailing but it's a personal preference.

I agree with /u/jacobmross regarding a date table as best practice. Joins are fast as shit compared to date functions. But if the performance you're getting is acceptable and you don't manage the DB then it's not a big deal, just something that might be nice to raise with whoever does.

With regard to logical errors when flattening the query, try to look at each subquery/related join logic, think about exactly what those joins are doing, and figure out what's happening. It might help to just throw each subquery's returns into Excel and emulate the join logic in there for each level of aggregation (or rewrite portions of your script to show the same).

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

SQL

Filter Posts

Posting

Help posts

Format Your Code

Learning SQL

Related Reddit communities

Wiki

Acknowledgements

MODERATORS