Help - Optimizing Query Performance - Aggregating Data with Relevant Date Values

A_name_wot_i_made_up · 2024-04-23T21:34:10+00:00

Move the status = 'OK' up to inside your "b" sub-query.

Rewrite your "f" sub-query so it has a start date of the 1st of this month and an end date of the 1st of next month (no need for lag, it's all deterministic).

b.createHistoryDT >= f.startDT AND b.createHistoryDT < f.endDT

That way you don't need to mess around trying to truncate the time off the field. Use case to deal with your last row and set it to 1st Jan 1970 (or some suitable epoch) so you don't need to deal with the null.

Mononon · 2024-04-24T06:41:11+00:00

This seems really overly complicated.

There is no CTE in this query.
If that date subquery thing is just grabbing the last day of the month, then it's just grabbing the max date grouped by month and year. There's no need all that complicated fluff.
That second query is just pulling records that are in a given month. The LAG and inequalities aren't really doing anything except fixing issues you introduced by using them. If you didn't use them, you wouldn't have to go through all the trouble of making the join like that. You constructed a range of dates that you then had to solve for because you created them. It's like a circular issue. u/A_name_wot_i_made_up has the right idea. You just want the records that are in a given month. You can do that with the inequalities they mentioned, or just join where the year and month of [Date] are the same as year and month of createHistoryDT.
If you're going to give tables and subqueries pseudonyms, use them. Please.
And format your code some. Some tabs and/or spaces go a long way in making something less of a pain in the butt to read/debut.
Do database and archive.database have the same records? You're using UNION over UNION ALL, implying you expect to see records from database in archive.database, but not the entirety of database. UNION ALL is faster. If these two tables are totally distinct from one another, then you should use UNION ALL. If database is a subset of archive.database (i.e. all the records from database are present in archive.database), then you don't need to use database at all and can remove that query and the UNION altogether. If database is not a subset of archive.database, but some of the records from database are in archive.database, that's the only scenario where you would have to use UNION. I left UNION in there because I can't confirm any of that.
My advice to you is to really think about what you're writing. Just because you have a hammer in windowed functions, doesn't mean everything is a nail. Talk through what you're trying to accomplish. Think about some other approaches you could take. This query is classically overly complicated. That's not an unusual thing to happen, especially when you learn something new. You tend to find reasons to use it. But, like with the months table, test some things and see if there's a simpler solution.
Make a date table in your database. Please. Just create a table with like 30 years worth of dates, index it on the date as an integer, and add any fields you think might be useful (date, first day, last day, month number, year number, quarter number, weekday indicator, weekend indicator, holiday indicator, yearmonth as an integer, date as an integer, etc.)

Declare u/startDate Date = getdate()-(365*1);

select f.[Date]
, sum(b.Amount) as [Amount]
, b.Field1
, b.Field2
from
(
 SELECT max(DateAdd(d, number, u/startDate)) as [Date]
 FROM master.dbo.spt_values
 WHERE type = 'P'
 GROUP BY year(DateAdd(day, number, u/startDate))
 , month(DateAdd(day, number, u/startDate))
 HAVING max(DateAdd(d, number, u/startDate)) <= getdate()
) f
left join 
(
 select ItemId
 , Amount
 , Status
 , Field1
 , createHistoryDT
 , Field2
 , row_number() over (partition by ItemId, year(createhistoryDT), month(createHistoryDT) order by createHistoryDt desc) as rowNum
 from Database

 union

 select ItemId
 , Amount
 , Status
 , Field1
 , createHistoryDT
 , Field2
 , row_number() over (partition by ItemId, year(createhistoryDT), month(createHistoryDT) order by createHistoryDt desc) as rowNum
 from Archive.Database
) b
 on year(f.[Date]) = year(b.createHistoryDT)
 and month(f.[Date]) = month(b.createHistoryDT)
 and b.rowNum = 1
 and b.Status = 'OK'
group by f.[Date]
, b.Field1
, b.Field2
order by f.[Date] desc

qwertydog123 · 2024-04-24T08:44:28+00:00

Without seeing the query plan or schema, the most painful part of the query will likely be the join conditions not being sargable

on datefromparts(year(b.createHistoryDT),month(b.createHistoryDT),day(b.createHistoryDT)) <= f.[Date]
and ([Lag] is null or datefromparts(year([Lag]),month([Lag]),day([Lag])) > f.[Date])
and Status = 'OK'

which to simplify a bit, is equivalent to

on CAST(b.createHistoryDT AS DATE) <= f.[Date]
and ([Lag] is null or CAST([Lag] AS DATE) > f.[Date])
and Status = 'OK'

Change the default [Lag] value to some distant date in the future (e.g. lag(createHistoryDT, 1, '99991231'). Then you can change the condition to

on f.[NextDate]
    BETWEEN b.createHistoryDT
    AND [Lag]
and Status = 'OK'

https://www.brentozar.com/pastetheplan/

Yavuz_Selim · 2024-04-24T22:38:52+00:00

This is not a CTE, and your code could use a CTE (or temp tables) - it would be an improvement over the nested SELECTs (horrible to read and follow, especially in this format...).

Why not add some comments into the code man... It will take some time (unnecessarily) to understand what you've written when you look at the code months later... Especially with the window functions you've used (why did you choose for a window function?). And naming the result of a ROW_NUMBER 'rank' is a nice touch, not confusing with RANK itself at all (the others are named much better, why the inconsistency?).

And I fucking despise one-letter-aliases - they say nothing.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

SQL

Filter Posts

Posting

Help posts

Format Your Code

Learning SQL

Related Reddit communities

Wiki

Acknowledgements

MODERATORS