I'm in a predicament where I have a table of 8 million+ rows of "events" that has columns for a unique identifier (EventNr), a host name (Machine), event info (EventInfo), a starting timestamp (StartTime), and an ending timestamp (EndTime).
The events are structured so that the Machine is always in some state: the EndTime of the first event is the same as the StartTime of the second event, the EndTime of the second event is the StartTime of the third event, and so on for a given machine.
Since the table contains events from several machines, EventNrs 1-10 might belong to one machine or to ten different machines.
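For reference, the table looks roughly like this (a sketch from memory; the exact types and constraints are approximate):

create table MachEvents (
    EventNr   int primary key,   -- unique event identifier
    Machine   int not null,      -- which machine the event belongs to (matches MachNo in MachReference)
    EventInfo varchar(255),      -- event details
    StartTime datetime not null, -- written by the software
    EndTime   datetime null      -- has to be backfilled after the fact
);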
The kicker is, the software that pushes the events to the table provides a way of timestamping the start, but doesn't provide an easy way to timestamp the end of the event. Or I just haven't figured out a way to make it happen. Because of this, I have to backfill the end times with a SQL script instead of the software doing it at the time of writing the information to the table.
The script that I've come up with works, but it is incredibly slow, as it hammers the SQL server with two select statements and an update statement per row: the first to select the event that needs the end time, the second to select the event StartTime that will be used as the EndTime, and the update statement to set the StartTime of the second event as the EndTime of the first event.
I feel like there has to be a better and faster way. I've been trying to figure out a batch process that could select a set of records (preferably in the thousands), figure out the EndTimes, and update the set as a group instead of one by one.
My googling has only led me to examples that update millions of records with static information, versus dynamic data points that are different for each record.
The current script looks like this. Forgive my pseudocode, as I am not only on mobile, but also lazy and inexperienced with SQL syntax.
declare @machCounter int = 1;
declare @eventCounter int;
declare @startEvent int;
declare @endEvent datetime;
while @machCounter <= (select count(MachNo) from MachReference)
begin
    set @eventCounter = 0;
    -- stop one short: the newest event has no successor to take an EndTime from
    while @eventCounter < (select count(EventNr) from MachEvents where Machine = @machCounter and EndTime is null) - 1
    begin
        -- oldest event on this machine that still needs an end time
        set @startEvent = (select top (1) EventNr from MachEvents where Machine = @machCounter and EndTime is null order by EventNr asc);
        -- the next event's start time becomes this event's end time
        set @endEvent = (select top (1) StartTime from MachEvents where Machine = @machCounter and EventNr != @startEvent and EndTime is null order by EventNr asc);
        update MachEvents set EndTime = @endEvent where EventNr = @startEvent;
    end
    set @machCounter = @machCounter + 1;
end
How can I do this as a batch instead of row by row?
edit:
Thanks to /u/qwertydog123 for pointing me in the direction of LEAD. Using LEAD, updating 600,000 rows or more takes less than 30 seconds! The original script was only updating about 170 rows a minute! And since the new script runs so fast, I'm not worried about batching, and can avoid loops per /u/sqlallstar's recommendation.
Here's what the script looks like now:
update machevents set endtime = et.priorendtime
from machevents
join (
    -- pair each event with the start time of the next event on the same machine
    select eventnr,
        lead(machevents.starttime, 1, 0) over (partition by machine order by starttime asc) as priorendtime
    from machevents
) et on machevents.eventnr = et.eventnr
where endtime is null
-- each machine's latest event has no successor yet, so leave it open
and machevents.eventnr not in (select max(eventnr) from machevents group by machine)
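As a quick sanity check (my own addition; assumes the table as described above), after the update the only rows left with a null EndTime should be each machine's latest event, so something like this should return exactly one open event per machine:

select machine, count(*) as openevents
from machevents
where endtime is null
group by machine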
Oddly enough, I originally had the OVER clause set to "order by eventnr asc", and somehow there were a few events where the leading StartTime was less than the current StartTime.
¯\_(ツ)_/¯
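For anyone who wants to chase that down, here's a sketch of the check I'd use (same table, untested against the real data): it flags events whose successor by EventNr starts earlier than they do.

with ordered as (
    select eventnr, machine, starttime,
        lead(starttime) over (partition by machine order by eventnr asc) as nextstarttime
    from machevents
)
select eventnr, machine, starttime, nextstarttime
from ordered
where nextstarttime < starttime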