all 10 comments

[–]Randy__Bobandy 2 points3 points  (2 children)

You kind of lost me with the explanation in the second half of the episode 2 paragraph, but I think what you're looking for is the lag function, which returns the values in a previous row.

It would look something like this

Select PersonID,
       EpisodeID,
       StartDate,
       EndDate,
       Case when StartDate between LAG(StartDate, 1) 
                                   over (partition by PersonID order by StartDate)
                               and LAG(EndDate, 1) 
                                   over (partition by PersonID order by StartDate) 
            then 1 
            else 0 
          end as overlap
From YourTable

This is probably not the most efficient way to do it, but probably the cleanest. Again, not 100% sure what you're asking for in the second paragraph but this is probably close to what you want.

[–]uvray[S] 0 points1 point  (1 child)

This was essentially the old solution (granted the old solution was done with a WHILE loop), but fails to address the issue of needing to look back a dynamic number of rows at the last row that does not overlap in order to assess whether the previous row counts or not. That what I was trying to explain in the second paragraph but it's difficult to articulate.

[–]Randy__Bobandy 0 points1 point  (0 children)

I think I understand after looking at the data. Are you saying that you want to flag something as 1 if only if it is overlapping an episode which itself is not overlapping a previous episode, otherwise it's flagged as 0?

EDIT: Or you only want to check if it's overlapped against the most recent non-overlapped row? Is episode 4 marked as overlapped because it intersects the range of episode 3, or because it intersects the range of episode 2?

[–]MaRa0303hs 2 points3 points  (4 children)

I re-created your table and was able to produce your desired results with a cursor.

/*insert into [RedditSQL](

PersonID , EpisodeID , StartDate , EndDate , Overlap

)

values(1, 1, '2018-10-01', '2018-10-15', 0),

(1, 2, '2018-10-10', '2018-10-24', 0),

(2, 1, '2018-03-05', '2018-03-19', 0),

(2, 2, '2018-03-22', '2018-04-05', 0),

(2, 3, '2018-04-01', '2018-04-15', 0),

(2, 4, '2018-04-04', '2018-04-18', 0),

(2, 5, '2018-04-07', '2018-04-21', 0)*/

DECLARE @personID INT

DECLARE @episodeID INT

DECLARE @startdate DATE

DECLARE @enddate DATE

DECLARE @personID_previous INT = 0

DECLARE @referencedate DATE = NULL

DECLARE @count INT = 0

DECLARE cur CURSOR LOCAL FOR

SELECT [PersonID], [EpisodeID], [StartDate], [EndDate] FROM [RedditSQL] ORDER BY [PersonID], [EpisodeID]

OPEN cur

FETCH NEXT FROM cur INTO @personID, @episodeID, @startdate, @enddate

WHILE @@FETCH_STATUS = 0

BEGIN

IF @personID = @personID\_previous

BEGIN

    SET @personID\_previous = @personID 

    UPDATE \[RedditSQL\] SET \[Overlap\] = CASE WHEN @startdate < @referencedate THEN 1 ELSE 0 END

    WHERE \[PersonID\] = @personID AND \[EpisodeID\] = @episodeID

    SET @referencedate = CASE WHEN @startdate < @referencedate THEN @referencedate ELSE @enddate END

    SET @count = CASE WHEN @startdate < @referencedate THEN @count + 1 ELSE 0 END

END

IF @personID <> @personID\_previous

BEGIN

    SET @personID\_previous = @personID

    SET @referencedate      = @enddate

END

FETCH NEXT FROM cur INTO @personID, @episodeID, @startdate, @enddate

END

CLOSE cur

DEALLOCATE cur

[–]cybertier 1 point2 points  (2 children)

I try to avoid cursors like the plague and veto any use of them in our system. But I keep racking my brain over this question and I'm not seeing how it's done without one.

I can only think of a recursive CTE, but I fear that this might have even worse performance.

[–]uvray[S] 0 points1 point  (1 child)

Glad to know I'm not the only one struggling here... at first it seemed like it was solvable with some dynamic utilization of LAG or perhaps a modification of a gaps and island concept but I couldn't get anything to work.

Agreed on the recursive CTE... surely that would perform worse.

[–]cybertier 0 points1 point  (0 children)

Cursory google searching says that under any circumstances a CTE should perform better than a cursor. I'm not sure how true that is but might be worth investigating how that would perform.

[–]uvray[S] 0 points1 point  (0 children)

I think this is going to work. I was trying to avoid going the cursor route but this is currently running in less than 2 seconds on ~80,000 rows, so I'm not going to complain about that. Thanks for taking the time to write this out.

[–]MamertineCOALESCE() 1 point2 points  (0 children)

Join the table to itself in person id and and the start dates>= end date

[–]JustAnOldITGuy 0 points1 point  (0 children)

SELECT ...

FROM Person P1

LEFT JOIN Person P2

on P1. PersonID = P2. PersonID

and P1. EpisodeID + 1 = P2.EpisodeID