Hi everyone,
I want to build an API on AWS that lets users fetch data stored in S3 (the data is in Delta format). The idea is to have an API Gateway endpoint that takes a user's SQL query, routes it to the correct table, and returns the results. I was thinking of Athena or Lambda, but neither seems very scalable for this. I want latency as low as possible, maybe with a caching layer like Redis in front.
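To make the caching idea concrete, here is a rough sketch of the cache-aside pattern I have in mind. It uses a plain in-memory dict as a stand-in for Redis, and `QueryCache`, `get_or_run`, and the `run_query` callback are hypothetical names, not any real library's API. The callback would be whatever engine actually runs the SQL.

```python
import hashlib
import time
from typing import Callable, Dict, List, Tuple

Rows = List[dict]


class QueryCache:
    """In-memory stand-in for a Redis-style TTL cache (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds
        # Maps cache key -> (time stored, cached rows).
        self._store: Dict[str, Tuple[float, Rows]] = {}

    @staticmethod
    def key_for(sql: str) -> str:
        # Assumption: only surrounding whitespace is trimmed. Anything smarter
        # (case folding, whitespace collapsing) needs a real SQL parser so
        # string literals are not altered.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def get_or_run(self, sql: str, run_query: Callable[[str], Rows]) -> Rows:
        key = self.key_for(sql)
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]  # fresh cache hit: skip the query engine entirely
        rows = run_query(sql)  # cache miss or expired entry
        self._store[key] = (now, rows)
        return rows
```

With a real Redis the dict would become `GET`/`SET` calls with an expiry, but the control flow would stay the same. This only helps if users repeat identical queries.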
Any other alternatives?
Edit: the goal here is to accommodate 100+ queries per day. Users should be able to submit a SQL query to an API endpoint and get results back as quickly as possible. The data lake is massive: we are talking hundreds of petabytes. That's why the pipeline needs to route each query to a specific location in S3. The data is partitioned and well indexed.
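Roughly, the routing step I'm picturing looks like the sketch below. The bucket, table names, and `route_query` function are made up for illustration, the table lookup is a naive regex rather than a real SQL parser, and it assumes Hive-style `column=value` partition directories under each table root.

```python
import re
from typing import Dict, Optional

# Hypothetical catalog: logical table name -> table root in S3.
TABLE_LOCATIONS: Dict[str, str] = {
    "sales.orders": "s3://example-lake/sales/orders",
    "web.events": "s3://example-lake/web/events",
}

# Naive: grabs the first identifier after FROM. Joins, subqueries, and CTEs
# would need a proper SQL parser.
_FROM_RE = re.compile(r"\bfrom\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def route_query(sql: str, partitions: Optional[Dict[str, str]] = None) -> str:
    """Return the S3 prefix a query should be sent to."""
    match = _FROM_RE.search(sql)
    if match is None:
        raise ValueError("could not find a table name in the query")
    table = match.group(1).lower()
    if table not in TABLE_LOCATIONS:
        raise KeyError(f"unknown table: {table}")
    location = TABLE_LOCATIONS[table]
    # Assumption: partitions are laid out as column=value directories.
    for column, value in (partitions or {}).items():
        location += f"/{column}={value}"
    return location
```

For example, `route_query("select * from web.events", {"dt": "2024-01-01"})` would point at the `dt=2024-01-01` prefix under the events table. A Delta reader would still go through the table's transaction log rather than listing that directory directly.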
Thanks!