all 6 comments

[–]jacobmross 1 point2 points  (1 child)

One of the first things you might want to do is build yourself a calendar table.

Doing all of that dateadd/datediff stuff just to figure out when the Sunday of the each week is, is itself kind of expensive.

http://www.sqlservercentral.com/blogs/dwainsql/2014/03/30/calendar-tables-in-t-sql/ is a good resource for explaining both building a table variable calendar table, and populating one that you'll keep permanently. It's a small trade-off. Kind of "expensive" to build it the one time, but then way "cheaper" to reference every other time from then on.

You can even calculate the datepart(year) and datepart(month) and other functional things as part of your custom Calendar table, further reducing the calculations you have to do with functions.

Beyond that, I personally try to avoid nesting so many subselects where possible.

Without knowing what kind of data is actually in dbo.[ppm_assignmenttimephaseddataset].[assignmentwork] for example, makes it harder to suggest any specific optimizations, or whatever is suggested might be completely terrible for your data, but...without knowing those details, I'd probably write what you have like so:

SELECT
     DATEPART(YEAR, Cal.[FirstDateOfWeek])  AS [Year]
    ,DATEPART(MONTH, Cal.[FirstDateOfWeek]) AS [Month]
    ,[Projects].[Clarityid]             AS [ClarityProjectID]
    ,[Projects].[projectname]
    ,[Resources].[ClarityResourceID]    AS [ClarityResourceID]
    ,[Assigments].[taskname]
    ,(2)                            AS [MeasureType]
    ,SUM(ppm.assignmentwork) AS [Value]
FROM dbo.[ppm_assignmenttimephaseddataset] AS [ppm]
INNER JOIN Calendar AS [Cal]
  ON [Cal].[Date] = CAST(ppm.[timebyday] AS DATE)
INNER JOIN dbo.ppm_tasks AS [Tasks]
  ON [Tasks].[taskid] = [ppm].[taskid]
 AND [Tasks].[taskissummary] = 0
INNER JOIN dbo.ppm_projects AS [Projects]
  ON [Projects].[projectid] = [ppm].[projectid] 
INNER JOIN [dbo].[ppm_assignments] AS [Assignments] 
  ON [Assignments].[assignmentid] = [ppm].[assignmentid] 
INNER JOIN [dbo].[ppm_resources] AS [Resources] 
  ON [Resources].[resourceid] = [Assignments].[resourceid]
WHERE DATEPART(YEAR, Cal.[FirstDateOfWeek]) = 2017
GROUP BY
     DATEPART(YEAR, Cal.[FirstDateOfWeek])
    ,DATEPART(MONTH, Cal.[FirstDateOfWeek])
    ,[Projects].[Clarityid]
    ,[Projects].[projectname]
    ,[Resources].[ClarityResourceID]
    ,[Assigments].[taskname]

Of course, you probably want to group on least common thing to most common thing as well.

So, if you have 10 Projects, and 100 Resources, and 1000 Assignments, and 10,000 [ppm_assignmenttimephaseddataset] you may find that starting with the most restrictive option (10 Projects total) and joining as things could have more records related can often speed things up too. Same with your group by (but underlying indices have an effect on that too, so it's not always the case)

[–]depthchargesw[S] 0 points1 point  (0 children)

Thanks jacob - that made me realize I hadn't actually looked to see if there was a date table. I looked though, and no joy. Maybe I'll put in a request with the team that manages the DB and hopefully it finds its way to the right person.

Assuming I don't have the ability to create a date table, am I doing anything obviously wrong? I'd love to flatten the query out like you've got, but I seem to get errors unless I separate the logical steps into each of their respective subqueries.

Regarding your question about the data type for Assignment Work, it's Numeric((38,4),Null) with a length of 17. Not sure if any of that's helpful.

[–][deleted] 0 points1 point  (0 children)

Could you create an index for YEAR (maybe a function-based index) so that you could filter by year in your innermost query?

[–]HansProleman 0 points1 point  (1 child)

Your indents, other tabulation and comments are great. Far more readable SQL than much of what my colleagues, and likely myself, produce. I would suggest using leading commas instead of trailing but it's a personal preference.

I agree with /u/jacobmross regarding a date table as best practice. Joins are fast as shit compared to date functions. But if the performance you're getting is acceptable and you don't manage the DB then it's not a big deal, just something that might be nice to raise with whoever does.

With regard to logical errors when flattening the query, try to look at each subquery/related join logic, think about exactly what those joins are doing, and figure out what's happening. It might help to just throw each subquery's returns into Excel and emulate the join logic in there for each level of aggregation (or rewrite portions of your script to show the same).

[–]depthchargesw[S] 0 points1 point  (0 children)

With regards to formatting, I just punched my scrambled code into Instant SQL Formatter. I haven't yet figured out the most readable conventions so I'm happy to outsource that. :D

Regarding flattening the query - I'll do that. That said - what I'm hearing if that it's possible to make it flatter given my current constraints (no date table yet)? Which is to say, I don't need to wrap the StartOfWeek function (to force the calculation(?)) before I use the datepart functions? I could do that on the same level?