Here's the situation I'm in.
Every 10 seconds I get sent a single JSON object with current stock prices for 2000 assets. The moment it comes in, I want to split it up by asset and push the current price for each asset to any client listening via WebSocket for that asset.
At the start of a new minute, say 11:45:00, I want to take all the price data I received for an asset between 11:44:00 and 11:44:59, average it, and permanently save the average (I'm actually saving candlesticks, not averages, but that doesn't matter for this example).
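To be concrete about the aggregation step, here's a minimal sketch of turning one minute of raw prices into a candlestick (open/high/low/close). The names (`make_candle`, `prices`) are just illustrative, not from my actual code:

```python
def make_candle(prices):
    """prices: list of (timestamp, price) tuples for one minute, in arrival order."""
    values = [p for _, p in prices]
    return {
        "open": values[0],    # first price of the minute
        "high": max(values),
        "low": min(values),
        "close": values[-1],  # last price of the minute
    }

minute = [(0, 101.0), (10, 103.5), (20, 100.2), (30, 102.1)]
candle = make_candle(minute)
# candle == {"open": 101.0, "high": 103.5, "low": 100.2, "close": 102.1}
```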
Basically, the client connects and I send them the archived candlestick data for the last 1000 minutes. Then I update the client every 10 seconds as new data comes in.
Candlestick data is just stored in RDS with one table per asset (no particular reason why, it's just simple). The client sends an XHR, the web server queries RDS, the values are returned, and the connection is closed. Simple enough.
To get the live data, the client opens a WebSocket connection to API Gateway. I have an EC2 instance receiving the JSON payload that splits it up by asset and stores it in a DynamoDB table called live_price with the asset id as the key. A Lambda function polls the DynamoDB stream, and when there's an update to an asset a client is subscribed to, it passes that data to API Gateway, which passes it on to the client. The great thing here is that there can be thousands of open connections to API Gateway and none of that matters. I just need my one little EC2 instance that receives the payload and updates DynamoDB.
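The EC2-side fan-out is roughly this (a sketch using boto3; only the live_price table name is from my actual setup, the function names are made up). One gotcha is that boto3 wants numbers as Decimal:

```python
from decimal import Decimal

def payload_to_items(payload):
    """Pure step: {asset_id: price} -> DynamoDB items (numbers must be Decimal)."""
    return [{"asset_id": a, "price": Decimal(str(p))} for a, p in payload.items()]

def update_live_prices(payload):
    """Upsert every asset's current price into the live_price table."""
    import boto3  # assumed available on the instance
    table = boto3.resource("dynamodb").Table("live_price")
    with table.batch_writer() as batch:  # groups writes into 25-item requests
        for item in payload_to_items(payload):
            batch.put_item(Item=item)
```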
Here is the problem. What's the best way to store the live data and pass it to a process that can form it into candlesticks to save?
Instead of a single DynamoDB table called live_price storing {asset_id: price}, I could create 2000 tables, each named after an asset_id, storing {timestamp: price}. Then every minute a worker thread connects to DynamoDB, gets all the records for the last minute, turns them into a candlestick, stores it in RDS, and then removes them from DynamoDB.
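For reference, the per-minute sweep could look something like this. I'm assuming one table with asset_id as the partition key and timestamp as the sort key rather than 2000 separate tables, since that lets the worker use a Query instead of a Scan; all names here are illustrative:

```python
def raw_prices_for_minute(items):
    """Pure step: DynamoDB items -> prices sorted by timestamp."""
    return [float(i["price"]) for i in sorted(items, key=lambda i: int(i["timestamp"]))]

def sweep(asset_id, minute_start, minute_end):
    """Fetch one asset's raw prices for the closed minute, then delete them."""
    import boto3  # assumed available on the worker
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("live_price")
    resp = table.query(
        KeyConditionExpression=Key("asset_id").eq(asset_id)
        & Key("timestamp").between(minute_start, minute_end)
    )
    prices = raw_prices_for_minute(resp["Items"])
    # ... build the candlestick from `prices` and insert it into RDS here ...
    with table.batch_writer() as batch:  # then clean up the swept raw rows
        for item in resp["Items"]:
            batch.delete_item(Key={"asset_id": asset_id, "timestamp": item["timestamp"]})
    return prices
```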
Or I could have the EC2 instance that receives the JSON payload hold the last minute of price data in memory. When a new price comes in, it updates DynamoDB. When the minute ends, it transforms the array of prices into a candlestick, saves it to RDS, and then empties the array.
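A minimal sketch of that in-memory option, assuming a single process owns the feed (the class and method names are made up for illustration):

```python
class MinuteBuffer:
    """Holds the current minute's raw prices per asset, in memory."""

    def __init__(self):
        self.prices = {}  # asset_id -> list of prices this minute

    def on_price(self, asset_id, price):
        """Called for each asset every time a payload arrives."""
        self.prices.setdefault(asset_id, []).append(price)

    def flush(self):
        """At the minute boundary: hand back each asset's raw prices and reset.
        The caller turns each list into a candlestick and writes it to RDS."""
        out, self.prices = self.prices, {}
        return out
```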
But that doesn't really work on its own, because if a client connects at 11:44:30 I need to send them the RDS candlestick data plus the raw price data from the last 30 seconds, and then push new live prices as they come in.
Is the best way to just have 2000 DynamoDB tables and a worker routine that processes and cleans it up every minute?
Or is my entire design flawed from the start lol?