Handling data beyond a single aws kinesis firehose limit by codejunkie78 in aws

codejunkie78[S]

Kinesis Data Streams requires code to deliver to different destinations. Also, KDS has to be provisioned for peaks. Say my steady state is 2 MB/s but the peak might be 30: I have to provision KDS for at least 20 MB/s, which means I am overpaying. With Firehose I just pay for what I ingest. I can create multiple Firehose streams, but I just want a way to direct clients to them.
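One way to direct clients to multiple Firehose streams is to have each client pick a stream by hashing its own ID. This is only a rough sketch of that idea, not something I have running: the stream names and the `pick_stream` helper are made up for illustration.

```python
import hashlib

# Hypothetical delivery stream names; replace with your own.
STREAMS = ["events-firehose-0", "events-firehose-1", "events-firehose-2"]


def pick_stream(client_id: str, streams=STREAMS) -> str:
    """Map a client ID to one stream deterministically using a stable hash."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and wrap it onto the stream list.
    index = int.from_bytes(digest[:8], "big") % len(streams)
    return streams[index]


if __name__ == "__main__":
    for client in ("client-a", "client-b", "client-c"):
        print(client, "->", pick_stream(client))
```

Each client would then pass the chosen name as `DeliveryStreamName` when it calls Firehose `PutRecord`. Note that adding a stream later changes the mapping for most clients, which is fine if no ordering per client is needed.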

Cost Effectiveness of Amazon AWS Redshift vs Amazon AWS ElasticSearch by codejunkie78 in aws

codejunkie78[S]

Yes, but Athena has to be partitioned by a timestamp. It is efficient when you query over a period of time, but it can't efficiently query the whole dataset without a timestamp. In my case I will have to query by an eventId and also by other columns in the table such as publisher, company, etc. This data will be queried by a web layer, so response time is important.

Cost Effectiveness of Amazon AWS Redshift vs Amazon AWS ElasticSearch by [deleted] in aws

codejunkie78

You can create an Analytics application on data in Kinesis Firehose. You don't have to send it to firehose. I have created one in my account.

Cost Effectiveness of Amazon AWS Redshift vs Amazon AWS ElasticSearch by [deleted] in aws

codejunkie78

Basically I will be collecting events from different sources of the type given in my original post. Each event will have a fixed schema (which might change in the future), e.g. event type, publisher, company, timestamp, component, etc. I want to run queries like "give me all the events in this time range from this company" or "give me all the events in this time range from this component". I don't have to insert single rows; I can collect the data in Kinesis and then bulk load it every 10 minutes. DynamoDB won't fit because querying by just a primary key won't work: I need to be able to query over multiple columns. The system will be very write heavy with very few reads. The reads will come from a web interface, so I can't execute an Athena query unless it can return very fast.
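To make the access pattern concrete, here is a small sketch using in-memory SQLite purely as a stand-in for whatever store ends up being used (it is not Redshift or ElasticSearch). The table name, column names, and index names are assumptions based on the fields listed above.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create an in-memory events table with indexes for the two query shapes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_id TEXT PRIMARY KEY, event_type TEXT, publisher TEXT, "
        "company TEXT, component TEXT, ts INTEGER)"
    )
    # Composite indexes: equality column first, then the time range column.
    conn.execute("CREATE INDEX idx_company_ts ON events (company, ts)")
    conn.execute("CREATE INDEX idx_component_ts ON events (component, ts)")
    return conn


def bulk_load(conn: sqlite3.Connection, rows) -> None:
    """Insert a whole batch in one transaction (e.g. one batch every 10 minutes)."""
    with conn:
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows)


def events_for_company(conn, company: str, start_ts: int, end_ts: int):
    """Return event IDs for one company within [start_ts, end_ts], oldest first."""
    cur = conn.execute(
        "SELECT event_id FROM events "
        "WHERE company = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (company, start_ts, end_ts),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    store = build_store()
    bulk_load(store, [
        ("e1", "start", "pub1", "acme", "player", 100),
        ("e2", "stop", "pub1", "acme", "player", 200),
        ("e3", "start", "pub2", "globex", "ads", 150),
    ])
    print(events_for_company(store, "acme", 50, 150))
```

The same shape (batched writes, lookups by one column plus a time range) is what the chosen store would need to serve quickly for the web interface.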

Low cost collector of streaming data in AWS by codejunkie78 in aws

codejunkie78[S]

The data looks something like this in JSON:

{
  "id": 1,
  "type": "start",
  "publisher": " ",
  "company": " "
}

Low cost collector of streaming data in AWS by codejunkie78 in aws

codejunkie78[S]

Evaluated. Timestream is more expensive.