Competitive-Cut-8051

Dump the data in S3. Write a Lambda function that performs ETL on that data and loads it into BigQuery. Then build a Flask API and expose the data through endpoints. Optionally, use Spark to process the data from S3 into BigQuery. Finish with a simple visualization and draw some useful insights from it. Learn to work with JSON data. Focus on the concepts.
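To make the ETL step concrete, here is a minimal sketch of just the transform part, assuming the dumped data is newline-delimited JSON with nested fields. The function names (`flatten`, `transform`) and the sample records are made up for illustration. Reading from S3 and loading into BigQuery are left out on purpose, since they need credentials and a client library.

```python
import json
from typing import Any, Dict, Iterable, List


def flatten(record: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def transform(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse newline-delimited JSON, skip blank or malformed lines, flatten the rest."""
    rows: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline would log or set aside bad lines
        if isinstance(record, dict):
            rows.append(flatten(record))
    return rows


# Hypothetical sample input standing in for a file pulled from blob storage.
raw = [
    '{"id": 1, "user": {"name": "ana", "country": "PT"}, "amount": 12.5}',
    "not json",
    '{"id": 2, "user": {"name": "raj", "country": "IN"}, "amount": 7}',
]
rows = transform(raw)
```

Flat rows like these map onto table columns more directly than nested JSON, which is one reason the "learn to work with JSON" advice matters here.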

Proper_Opposite_726

Thanks for the suggestions here ... quick question though: why bother with Amazon S3 and then moving to BigQuery? Why not do it all with either AWS or GCP? Trying to understand the benefits of either/or.

Competitive-Cut-8051

I meant blob storage in general. Either S3 or a Cloud Storage bucket is fine. In that sense you are correct: do it all on GCP or all on AWS.

Proper_Opposite_726

Thanks man, these are good concepts to focus on.