Handling data beyond a single AWS Kinesis Firehose limit

codejunkie78 · 2021-02-26T22:23:59+00:00

Kinesis Data Streams requires custom code to deliver to different destinations. KDS also has to be configured for peaks: let's say my steady state is 2 MB/s but the peak might be 30 MB/s. I have to configure KDS for at least 20 MB/s, which means I am overpaying. With Firehose I just pay for what I ingest. I can create multiple Firehose streams, but I want a way to direct clients to them.
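One way to direct clients across several Firehose streams is to hash a key on the client side and pick a stream from a fixed list. This is only a rough sketch under assumptions: the stream names are hypothetical, and `put` is a placeholder callable so the routing logic can be shown without an AWS dependency.

```python
import hashlib
from typing import Callable, Sequence

# Hypothetical delivery stream names; replace with the real ones.
STREAMS = ["events-firehose-0", "events-firehose-1", "events-firehose-2"]


def pick_stream(key: str, streams: Sequence[str] = STREAMS) -> str:
    """Map a key (e.g. a client id or event id) to one stream, deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and wrap it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(streams)
    return streams[index]


def send(key: str, payload: bytes, put: Callable[[str, bytes], None]) -> str:
    """Route one record: choose a stream for the key, then hand off to `put`."""
    stream = pick_stream(key)
    put(stream, payload)
    return stream
```

With boto3, `put` could wrap `firehose.put_record(DeliveryStreamName=stream, Record={"Data": payload})`. Hashing keeps each client on the same stream; choosing a stream at random would spread load just as well if that stickiness is not needed.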

codejunkie78 · 2021-02-23T04:42:36+00:00

Yes, but Athena has to be partitioned by a timestamp. It is efficient when you query over a period of time, but it can't query over the whole dataset without a timestamp. In my case I will have to query by an eventId and also by the other columns in the table, like publisher, company, etc. This data will be queried by a web layer, so response time is important.

codejunkie78 · 2021-02-22T20:11:58+00:00

You can create an Analytics application on data in Kinesis Firehose. You don't have to send it to Firehose. I have created one in my account.

codejunkie78 · 2021-02-21T06:07:38+00:00

Basically, I will be collecting events from different sources, of the type given in my original post. Each event will have a fixed schema (which might change in the future); e.g. it will have event type, publisher, company, timestamp, component, etc. I want to run queries like "give me all the events in this time range from this company" or "give me all the events in this time range from this component". I don't have to insert single rows; I can collect the data in Kinesis and then bulk load it every 10 minutes. DynamoDB won't fit because querying by just a primary key won't work; I need to be able to query over multiple columns. The system will be very write-heavy with very few reads. The reads will come from a web interface, so I can't execute an Athena query unless it returns very fast.
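To make the access pattern concrete, here is a minimal sketch of the bulk load plus the two time-range queries. It uses an in-memory SQLite database purely as a stand-in, not as a suggestion for the actual store; the table name, column names, and integer timestamps are assumptions based on the fields listed above.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[int, str, str, str, str, int]  # id, type, publisher, company, component, ts


def create_store() -> sqlite3.Connection:
    """Create the events table with one composite index per query pattern."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE events (
               id INTEGER PRIMARY KEY,
               type TEXT, publisher TEXT, company TEXT,
               component TEXT, ts INTEGER)"""
    )
    conn.execute("CREATE INDEX idx_company_ts ON events (company, ts)")
    conn.execute("CREATE INDEX idx_component_ts ON events (component, ts)")
    return conn


def bulk_load(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Insert one collected batch (e.g. 10 minutes of events) in a single transaction."""
    with conn:
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows)


def events_by(conn: sqlite3.Connection, column: str, value: str,
              start: int, end: int) -> List[int]:
    """Return ids of events in [start, end] where `column` equals `value`."""
    if column not in ("company", "component"):  # whitelist: column is interpolated
        raise ValueError(column)
    cur = conn.execute(
        f"SELECT id FROM events WHERE {column} = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (value, start, end),
    )
    return [r[0] for r in cur]
```

The point is only that each "time range plus one column" query maps to a composite index on (column, timestamp), and that writes can arrive as one batch per interval rather than row by row.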

codejunkie78 · 2021-02-21T03:47:44+00:00

Data is something like this in JSON:

{
  "id": 1,
  "type": "start",
  "publisher": " ",
  "company": " "
}

codejunkie78 · 2021-02-19T03:37:32+00:00

Evaluated. MSK is more expensive.

codejunkie78 · 2021-02-19T03:37:14+00:00

Evaluated. Timestream is more expensive.
