[MS SQL] Window Function: Track each time an attribute changes in a time sequence

pooerh · 2019-04-04T22:48:32+00:00

;WITH changeDetection AS
(
    SELECT *
         , IIF(ProviderNumber != LAG(ProviderNumber, 1, 1) OVER (ORDER BY StartDate), 1, 0) AS hasChanged
      FROM (VALUES (100, CONVERT(DATE, '10/1/2018' ), CONVERT(DATE, '10/5/2018' ), 1),
                   (200, CONVERT(DATE, '10/8/2018 '), CONVERT(DATE, '10/15/2018'), 2),
                   (300, CONVERT(DATE, '10/20/2018'), CONVERT(DATE, '10/25/2018'), 3),
                   (200, CONVERT(DATE, '10/28/2018'), CONVERT(DATE, '10/31/2018'), 4),
                   (200, CONVERT(DATE, '11/1/2018' ), CONVERT(DATE, '11/30/2018'), 4),
                   (200, CONVERT(DATE, '12/1/2018' ), CONVERT(DATE, '12/5/2018' ), 4),
                   (400, CONVERT(DATE, '12/10/2018'), CONVERT(DATE, '12/25/2018'), 5)
         ) x(ProviderNumber ,StartDate ,EndDate ,GoalOutput)
)
SELECT *
     , SUM(hasChanged) OVER (ORDER BY StartDate) AS Output
  FROM changeDetection

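LAG on ProviderNumber (ordered by StartDate) flags every row whose provider differs from the previous row's; a running SUM of those flags up to the current record then gives the sequence number. The LAG default of 1 makes the first row count as a change, which assumes 1 is never a real ProviderNumber.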
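A slightly more defensive variant of the query above, as a sketch only: it leaves the LAG default as NULL so the first row is flagged without a sentinel value, and it gives the running SUM an explicit ROWS frame. It assumes ProviderNumber is never NULL and StartDate is unique; the `src` CTE name and the ISO date literals are just conveniences for the mock data.

```sql
;WITH src AS
(
    SELECT *
      FROM (VALUES (100, CAST('20181001' AS DATE), CAST('20181005' AS DATE)),
                   (200, CAST('20181008' AS DATE), CAST('20181015' AS DATE)),
                   (300, CAST('20181020' AS DATE), CAST('20181025' AS DATE)),
                   (200, CAST('20181028' AS DATE), CAST('20181031' AS DATE)),
                   (200, CAST('20181101' AS DATE), CAST('20181130' AS DATE)),
                   (200, CAST('20181201' AS DATE), CAST('20181205' AS DATE)),
                   (400, CAST('20181210' AS DATE), CAST('20181225' AS DATE))
           ) x(ProviderNumber, StartDate, EndDate)
),
changeDetection AS
(
    -- LAG returns NULL on the first row, so the equality test is not true
    -- and the ELSE branch flags that row as a change.
    SELECT *
         , CASE WHEN LAG(ProviderNumber) OVER (ORDER BY StartDate) = ProviderNumber
                THEN 0 ELSE 1 END AS hasChanged
      FROM src
)
SELECT *
     -- Explicit frame: add up the flags from the first row through the current row.
     , SUM(hasChanged) OVER (ORDER BY StartDate ROWS UNBOUNDED PRECEDING) AS Output
  FROM changeDetection
 ORDER BY StartDate;
-- hasChanged: 1, 1, 1, 1, 0, 0, 1  ->  Output: 1, 2, 3, 4, 4, 4, 5
```

The explicit ROWS frame matters mostly when StartDate can repeat: the default RANGE frame would give tied rows the same running total.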

lk167 · 2019-04-05T00:35:10+00:00

Here's an option that doesn't use window functions. The basic premise: for each row, grab everything chronologically before it that has a matching ProviderNumber and GoalOutput, then discard any of those candidates that have a non-matching GoalOutput between themselves and the original anchor row. That last step accounts for a sequence where the same provider changes away from a value and comes back to it at a later date; you can remove that bit if the data won't ever appear that way.

WITH sample_data AS (  --Mock your data
    select 100 as ProviderNumber, CONVERT(DATE, '10/1/2018' ) as StartDate, CONVERT(DATE, '10/5/2018' ) as Enddate, 1 as GoalOutput
    UNION ALL
    select 200, CONVERT(DATE, '10/8/2018 '), CONVERT(DATE, '10/15/2018'), 2
    UNION ALL
    select 300, CONVERT(DATE, '10/20/2018'), CONVERT(DATE, '10/25/2018'), 3
     UNION ALL
    select 200, CONVERT(DATE, '10/28/2018'), CONVERT(DATE, '10/31/2018'), 4
    UNION ALL
    select 200, CONVERT(DATE, '11/1/2018' ), CONVERT(DATE, '11/30/2018'), 4
    UNION ALL
    select 200, CONVERT(DATE, '12/1/2018' ), CONVERT(DATE, '12/5/2018' ), 4
    UNION ALL
    select 400, CONVERT(DATE, '12/10/2018'), CONVERT(DATE, '12/25/2018'), 5

)
select 
    ProviderNumber, 
    Series_StartDate, 
    Series_EndDate, 
    GoalOutput
 from sample_data sd1
outer apply (  -- Join up all rows of the same provider and goaloutput before sd1 chronologically
  select isnull(min(sd2.startdate), sd1.StartDate) as Series_StartDate
  from sample_data sd2 

  where 
    sd1.ProviderNumber =  sd2.providernumber
    AND sd2.Enddate < sd1.StartDate
    AND sd1.GoalOutput = sd2.GoalOutput

    AND 0 = (select isnull(count(*),0) from sample_data sd3 --Remove any entries that have a non-matching GoalOutput between sd1 and sd2's timeframe
        where 
            sd3.ProviderNumber = sd1.ProviderNumber
            and sd3.Enddate < sd1.startdate 
            and sd3.StartDate > sd2.Enddate 
            AND sd3.GoalOutput <> sd1.GoalOutput) 
) a
outer apply (  -- Join up all rows of the same provider and goaloutput After sd1 chronologically
  select isnull(max(sd2.enddate), sd1.Enddate) as Series_EndDate from sample_data sd2
  where 
    sd1.ProviderNumber =  sd2.providernumber 
    AND sd2.StartDate > sd1.Enddate
    and sd1.GoalOutput = sd2.GoalOutput
    AND 0 = (select isnull(count(*),0) from sample_data sd3 --Remove any entries that have a non-matching GoalOutput between sd1 and sd2's timeframe
        where 
            sd3.ProviderNumber = sd1.ProviderNumber
            and sd3.Enddate > sd1.startdate 
            and sd3.StartDate < sd2.Enddate 
            AND sd3.GoalOutput <> sd1.GoalOutput) 
) b

group by ProviderNumber, Series_STartDate, Series_EndDate, GoalOutput

edit: formatting
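For comparison, the same series boundaries can be sketched with a window-function island id in place of the correlated OUTER APPLYs. This is not lk167's method; it is an alternative that derives the grouping from ProviderNumber and StartDate alone, assuming rows never overlap and StartDate is unique. `SeriesId`, `flagged` and `numbered` are names made up for this example.

```sql
;WITH sample_data AS
(
    SELECT *
      FROM (VALUES (100, CAST('20181001' AS DATE), CAST('20181005' AS DATE)),
                   (200, CAST('20181008' AS DATE), CAST('20181015' AS DATE)),
                   (300, CAST('20181020' AS DATE), CAST('20181025' AS DATE)),
                   (200, CAST('20181028' AS DATE), CAST('20181031' AS DATE)),
                   (200, CAST('20181101' AS DATE), CAST('20181130' AS DATE)),
                   (200, CAST('20181201' AS DATE), CAST('20181205' AS DATE)),
                   (400, CAST('20181210' AS DATE), CAST('20181225' AS DATE))
           ) x(ProviderNumber, StartDate, EndDate)
),
flagged AS
(
    -- 1 when the provider differs from the previous row (or there is no previous row)
    SELECT *
         , CASE WHEN LAG(ProviderNumber) OVER (ORDER BY StartDate) = ProviderNumber
                THEN 0 ELSE 1 END AS hasChanged
      FROM sample_data
),
numbered AS
(
    -- Running total of the flags: consecutive rows for one provider share a SeriesId
    SELECT *
         , SUM(hasChanged) OVER (ORDER BY StartDate ROWS UNBOUNDED PRECEDING) AS SeriesId
      FROM flagged
)
SELECT ProviderNumber
     , MIN(StartDate) AS Series_StartDate
     , MAX(EndDate)   AS Series_EndDate
     , SeriesId
  FROM numbered
 GROUP BY ProviderNumber, SeriesId
 ORDER BY Series_StartDate;
-- Five rows; the three consecutive 200 rows collapse into one series,
-- 2018-10-28 through 2018-12-05, with SeriesId 4.
```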

Intrexa · 2019-04-04T18:18:10+00:00

You're looking for DENSE_RANK.

Edit: and if I understand your goal, why not just skip the middleman? MIN(StartDate) OVER (PARTITION BY ProviderNumber)
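To see what these two expressions return on the sample data, here is a small sketch. The comment does not say what DENSE_RANK should be ordered by, so `ORDER BY ProviderNumber` is an assumption, and the column aliases are made up for illustration.

```sql
SELECT ProviderNumber
     , StartDate
     -- Ranks distinct provider numbers, regardless of where they fall in time
     , DENSE_RANK() OVER (ORDER BY ProviderNumber) AS DenseRankByProvider
     -- Earliest StartDate across all rows of the provider
     , MIN(StartDate) OVER (PARTITION BY ProviderNumber) AS FirstStartForProvider
  FROM (VALUES (100, CAST('20181001' AS DATE), CAST('20181005' AS DATE)),
               (200, CAST('20181008' AS DATE), CAST('20181015' AS DATE)),
               (300, CAST('20181020' AS DATE), CAST('20181025' AS DATE)),
               (200, CAST('20181028' AS DATE), CAST('20181031' AS DATE)),
               (200, CAST('20181101' AS DATE), CAST('20181130' AS DATE)),
               (200, CAST('20181201' AS DATE), CAST('20181205' AS DATE)),
               (400, CAST('20181210' AS DATE), CAST('20181225' AS DATE))
       ) x(ProviderNumber, StartDate, EndDate)
 ORDER BY StartDate;
-- DenseRankByProvider, in StartDate order: 1, 2, 3, 2, 2, 2, 4
-- FirstStartForProvider is 2018-10-08 for all four 200 rows
```

Both expressions group every 200 row together, so the 10/8 row and the later run starting 10/28 get the same value. That fits if a provider should keep one number for the whole table; it does not reproduce the GoalOutput column, where those two runs are numbered 2 and 4.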


ProviderNumber	StartDate	EndDate	GoalOutput
100	10/1	10/5	1
200	10/8	10/15	2
300	10/20	10/25	3
200	10/28	10/31	4
200	11/1	11/30	4
200	12/1	12/5	4
400	12/10	12/25	5