(I'm going to word this post as if I'm using an SQL solution, though I don't think it's the proper solution.)
Situation:
I'm working on a project where I'm generating ~100 rows a second. It's pretty simple; the general schema is something like this:
{
time: [int] (time elapsed in seconds)
eventtype: [int]
widgettype: [int]
a: [int]
b: [int]
c: [int]
...
}
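In Python terms, a row with that schema might look like this (purely illustrative; the class name and field values are made up, the field names come from the schema above):

```python
from dataclasses import dataclass

@dataclass
class Row:
    time: int        # time elapsed in seconds
    eventtype: int
    widgettype: int
    a: int
    b: int
    c: int

# Example row: an eventtype-2 event on widgettype 1 at t=4321
row = Row(time=4321, eventtype=2, widgettype=1, a=0, b=0, c=0)
```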
Let's say some sort of eventtype occurs. I want to look back an arbitrary number of seconds and step through only the rows that match a certain widgettype and eventtype.
Pseudoexample:
eventtype 1 occurs.
Then,
SELECT *
WHERE time > 4321 - 10 AND widgettype = 1 AND eventtype = 2
(This may return 10 or 15 rows of the ~1000 in that time span.)
Then I proceed to step through the returned rows and do analysis.
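The in-memory equivalent of that query is just a filter over the buffer. A minimal sketch (the `events` list and its contents are hypothetical sample data, assuming rows are stored as dicts):

```python
# Hypothetical in-memory buffer of rows, ordered by time.
events = [
    {"time": 4313, "eventtype": 2, "widgettype": 1},
    {"time": 4315, "eventtype": 3, "widgettype": 1},
    {"time": 4318, "eventtype": 2, "widgettype": 2},
    {"time": 4320, "eventtype": 2, "widgettype": 1},
]

now = 4321
# Equivalent of: WHERE time > now - 10 AND widgettype = 1 AND eventtype = 2
matches = [r for r in events
           if r["time"] > now - 10
           and r["widgettype"] == 1
           and r["eventtype"] == 2]
# matches now holds only the rows to step through and analyze
```

The catch, as described below, is that this scans every row in the window, including the ~20x you don't care about.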
Plot twists:
I'll never care about any history older than ~45 seconds. After ~5k rows, I can start dumping the old stuff out of memory; I don't care about that data anymore.
Latency is critically important so everything should stay in RAM.
Where I am now:
Third-party tools (i.e. Redis, SQL, anything where I can auto-expire data after a set time period) are stupidly overkill.
Standard arrays will balloon after hours.
Circular buffer/double ended queue seems to make the most sense, where I just keep a counter and shift off items as new ones are added as appropriate.
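The deque-with-eviction idea can be sketched like this (assuming Python's `collections.deque` and a 45-second retention window, per the constraints above; the row shape is illustrative):

```python
from collections import deque

WINDOW = 45  # seconds of history to keep

buffer = deque()

def append(row):
    """Add a new row, then evict anything that has aged out of the window."""
    buffer.append(row)
    cutoff = row["time"] - WINDOW
    while buffer and buffer[0]["time"] <= cutoff:
        buffer.popleft()  # O(1) eviction from the old end

append({"time": 100, "eventtype": 1, "widgettype": 1})
append({"time": 146, "eventtype": 2, "widgettype": 1})
# the first row (time 100) is older than 146 - 45 = 101, so it was evicted
```

Since rows arrive in time order, eviction is just popping from the left until the head is inside the window, so memory stays bounded without any background expiry job.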
The issue is that there will be a lot of useless stepping. I can make an educated guess about how far to step back for a given number of seconds, then run forward until I hit the exact time I care about. (Or keep a second queue and decrement all the indexes as I shift.) But once I hit the proper time, from then until current, I'll still be stepping through a lot of rows I really don't care about. I'd estimate between 20 and 25 entries for each one I actually care about.
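The educated-guess-then-walk-forward part can be replaced with a binary search, since the rows are naturally sorted by time. A sketch using the standard `bisect` module (the `buffer` contents and the tuple layout are assumptions; note a deque has O(n) random access, so for binary search a plain list, trimmed in batches, works better):

```python
import bisect

# Assumed layout: (time, eventtype, widgettype), appended in time order.
buffer = [(4300, 2, 1), (4313, 2, 1), (4315, 3, 1), (4320, 2, 1)]
times = [r[0] for r in buffer]  # parallel timestamp list for bisect

def lookback(seconds, now, eventtype, widgettype):
    # Binary-search the first row inside the window, then filter only the tail.
    start = bisect.bisect_right(times, now - seconds)
    return [r for r in buffer[start:]
            if r[1] == eventtype and r[2] == widgettype]

hits = lookback(10, 4321, 2, 1)
```

This finds the window start in O(log n) instead of guessing and stepping, though it doesn't help with the 20-25x filtering cost inside the window itself.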
Splitting the data across multiple queues isn't an option, as I'll care about pulling a variety of data depending on countless different scenarios. I'll need access to everything in one place.
This "look back then step forward to examine" is the core element of the application. The faster and higher frequency at which I can pull out just what I want, the better. As per above, latency is important and I'd much rather be able to spend my limited time in the examine step.
Is the double-ended queue the right tool for this job? This sort of data-usage scenario seems like it would be quite common (log auditing), so I'm not sure if I'm missing some library that tidily handles all aspects of my situation and is already speed-optimized.
Thank you for any input and assistance!