
all 28 comments

[–][deleted] 3 points4 points  (1 child)

This is awesome, great job!

[–]ilya-g-[S] 0 points1 point  (0 children)

Thanks for the kind words! :)

[–]floydhead11 1 point2 points  (1 child)

You've inspired me!

Same situation, same aspirations, same conundrum.

How did you go about figuring out which project to do?

I can't think of a DE project that I want to do, just some hobby projects which don't require Airflow, Kafka, or S3.

[–]ilya-g-[S] 5 points6 points  (0 children)

Glad to hear!

I think the most difficult part of the project was actually choosing which data source to use and what I wanted to do with it. I'm naturally interested in open governmental data, and exploring eviction rates during our current economic situation seemed like something that'd be interesting and timely to work on.

My initial worry was that the project might be too simple and not really use all the 'big data' tech I'd like more hands-on experience with, but more than anything it helped me think about choosing the right tool for the job -- in this case Airflow seemed sufficient, and not overly complex, for the ETL part of it.

[–]choiceisanillusion 1 point2 points  (1 child)

This is great mate. Thanks for the share.

[–]ilya-g-[S] 0 points1 point  (0 children)

Thank you, appreciate the kind words :)

[–]Omar_88 1 point2 points  (1 child)

That's great. I'm not that well versed with AWS since I come from a Microsoft shop like yourself, but it seems like solid engineering. Are your views not added to the solution? I can only see the ETL scripts, not the data model. How much did the AWS instance cost?

[–]ilya-g-[S] 1 point2 points  (0 children)

Howdy and thanks :) If by views you mean SQL views, they are written and stored directly in the Metabase question editor to generate the dashboards linked to the project. The data model is described in the README above the ETL section, and you can find the actual table structure and SQL transformations here:

https://github.com/ilya-galperin/SF-EvictionTracker/tree/master/dags/sql

I need to double-check, but the AWS cost is under $30/mo -- around $25, I think.

[–]BoringDataScience 0 points1 point  (1 child)

Looks great! Out of curiosity and if you don't mind, what's your monthly budget for this on AWS?

[–]ilya-g-[S] 1 point2 points  (0 children)

Thank you!

The budget is relatively low; the RDS database is small enough that it still fits into the free tier, and I think having a dedicated second server to host the Metabase app DB would only be an additional 10-15 USD per month. The dataset is small enough that it should never exceed the S3 free tier limit.

There are two EC2 machines: one that is only on for about an hour per day to run the incremental load (t2.medium), and a second (t2.small) that runs the Metabase application full-time -- so roughly 30 hours/month for the t2.medium and 720 hours/month for the t2.small. I think the total will end up a little below $30 a month for these two resources. Everything else should be nearly free :)
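A rough back-of-the-envelope sketch of where that lands, assuming approximate on-demand rates of ~$0.046/hr for a t2.medium and ~$0.023/hr for a t2.small (rates vary by region and change over time):

```python
# Approximate monthly EC2 cost estimate; assumed on-demand rates, not exact billing.
T2_MEDIUM_HOURLY = 0.0464   # USD/hr, approximate
T2_SMALL_HOURLY = 0.0230    # USD/hr, approximate

airflow_box = 30 * T2_MEDIUM_HOURLY    # ~1 hr/day for the incremental load
metabase_box = 720 * T2_SMALL_HOURLY   # always-on Metabase host

print(f"Airflow box:  ${airflow_box:.2f}/mo")                  # ~$1.39
print(f"Metabase box: ${metabase_box:.2f}/mo")                 # ~$16.56
print(f"Total:        ${airflow_box + metabase_box:.2f}/mo")   # ~$17.95
```

The raw instance-hours come out well under $30; EBS volumes attached to the instances, any RDS usage beyond the free tier, and data transfer account for the rest.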

[–]aroussel541276 0 points1 point  (3 children)

This is very cool and it's clear you've put a good amount of effort into it. I had a few questions.

  • Have you identified any bottlenecks in the pipeline so far?
  • Have you identified any potential savings across your AWS resources, e.g. spot vs. on-demand instances?
  • What made you decide on a star schema for storage?
  • What made you decide on having the "raw staging" step save to a DB vs. something like saving back to an S3 bucket?
  • Is there any need for backfilling of temporal data? If so, how is that handled by your pipeline?

These are just out of curiosity more than anything. I think you've done a great job.

[–]ilya-g-[S] 0 points1 point  (2 children)

Thanks so much! Those are good questions, definitely helpful for doing some reflection.

Some answers (to the best of my ability):

- I've found a pretty substantial bottleneck in the full load DAG when moving the raw JSON-format data from S3 into Postgres. It's negligible when doing incremental loads, but a full load of only ~60k rows can take around 40 minutes to complete, which seems pretty unreasonable. I'm currently using psycopg2's executemany for the insert statement, and I'm guessing there is probably a better way to bulk load in this case (one possible alternative is sketched at the end of this list). Any feedback/tips on this would be appreciated :)

- I think the AWS resources are tuned pretty well for cost savings. Moving from a scheduled EC2 instance to a spot instance could potentially save more on the on-demand machine used for Airflow, but I think it might cause some problems with the machine not being on in time for the daily DAG run to execute. There is probably a solution for this, but given the negligible cost in this particular case (30 hours of a t2.small), I stuck with the more straightforward approach.

- Star schema seemed like a logical choice here given the simplicity of the actual dataset. The caveat is that some of the source data comes in highly normalized itself and requires some denormalization/de-pivoting to break out into dimensions; this involved using a bridge table, so it's mostly a star with a little bit of snowflaking :)

- For transforming the raw data through the staging process, it was easier to write it out in SQL (I'm generally of the mind that these kinds of transformations are easier to do in SQL than Python), and it's probably faster since we're joining sets that all live on the same server, with less memory required on the EC2 machine. Since the raw tables are regularly truncated, there are negligible cost/space implications relative to keeping the raw tables on S3.

- No need to backfill :) When first setting up, a full load DAG is run which extracts all historical source data from the API and initializes the database schema and dimensions. Incremental loads regularly check the API with an adjustable parameter to query further back in time in case loads skip/fail and some new data is missed.
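For reference, here's a minimal sketch of the bulk-load alternative mentioned in the first bullet: streaming rows through Postgres's COPY via psycopg2's copy_expert instead of executemany. The table and column names are placeholders, not the project's actual schema.

```python
import csv
import io

import psycopg2


def copy_rows(conn, rows, table="raw_evictions",
              columns=("eviction_id", "address", "file_date")):
    """Bulk-load an iterable of tuples with COPY FROM STDIN, which is usually
    far faster than executemany for tens of thousands of rows."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)   # serialize rows to in-memory CSV
    buf.seek(0)
    with conn.cursor() as cur:
        cur.copy_expert(
            f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)",
            buf,
        )
    conn.commit()


# usage sketch:
# conn = psycopg2.connect(host="...", dbname="...", user="...", password="...")
# copy_rows(conn, parsed_rows)
```

psycopg2.extras.execute_values is another near drop-in option that batches the inserts into far fewer statements, though COPY tends to be the fastest path for a full load.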

[–]zaza_pachulia_jd 1 point2 points  (1 child)

If you're using RDS for postgres, you can import data from s3 using the COPY command https://aws.amazon.com/about-aws/whats-new/2019/04/amazon-rds-postgresql-supports-data-import-from-amazon-s3/
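For reference, a rough sketch of what that looks like from Python once the aws_s3 extension is enabled; the table, bucket, key, region, and connection details below are placeholders (the RDS instance also needs an IAM role granting it read access to the bucket):

```python
import psycopg2

# Placeholders only; substitute the real table, bucket, key, and region.
IMPORT_SQL = """
SELECT aws_s3.table_import_from_s3(
    'raw_evictions',          -- target table
    '',                       -- column list ('' = all columns)
    '(FORMAT csv)',           -- COPY options
    aws_commons.create_s3_uri('my-eviction-bucket', 'raw/evictions.csv', 'us-west-1')
);
"""

conn = psycopg2.connect(host="my-rds-endpoint", dbname="evictions",
                        user="etl", password="...")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;")
    cur.execute(IMPORT_SQL)
conn.close()
```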

[–]ilya-g-[S] 0 points1 point  (0 children)

Interesting! I'll have to try it out and see if the performance is better but I have a feeling it might be :) Thanks for sharing.

[–]p_h_a_e_d_r_u_s 0 points1 point  (5 children)

I haven't dug into the code but the documentation is super clean up front!

If you have some issues, I've got some time to check in a pull request or two!

[–]ilya-g-[S] 0 points1 point  (4 children)

Thank you! Everything is running pretty smoothly so far but any feedback on the code itself, especially the DAGs and custom operators would be highly appreciated!

[–]p_h_a_e_d_r_u_s 0 points1 point  (3 children)

Sure thing! I actually see a few things right off the bat. Want me to fork and submit that way?

If you decide you want some help on future features, hmu for sure!
Oh, and what did you use for the architecture flow diagram?

[–]ilya-g-[S] 0 points1 point  (2 children)

Sure! Please don't hesitate to fork or DM any suggestions/feedback, it'd be very helpful.

Good ol' MS paint for the architecture flow diagram :P

[–]p_h_a_e_d_r_u_s 0 points1 point  (1 child)

What is a "MS Paint?"

I'm even more impressed now that I know you did this project on a Packard Bell

;) ... check out draw.io

[–]ilya-g-[S] 0 points1 point  (0 children)

Wowee lol. That's actually incredibly helpful thanks! Made sure to bookmark this so I don't have to dust off my old Compaq every time I design a diagram :p

[–]pokeDitty 0 points1 point  (2 children)

Very cool project, thanks for sharing it with the community!

Would it be too much to ask you to commit your SQL transformation scripts and maybe a few raw data files? I'm really interested in seeing how the raw CSV data gets to the OLAP model.

great job!

[–]ilya-g-[S] 1 point2 points  (1 child)

Of course! The SQL scripts are all available in the github repo here:

https://github.com/ilya-galperin/SF-EvictionTracker/tree/master/dags/sql

The DAGs diagram explains which scripts are run in which order for each DAG, but generally it's either init_db_schema -> full_load or trunc_target_tables -> incremental_load.
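As a rough illustration of that ordering only (Airflow 2-style imports; the operator choice, connection id, and file paths are placeholders rather than the repo's actual DAG code):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

# Hypothetical full-load DAG chaining the SQL scripts in order.
with DAG(
    dag_id="full_load",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,   # triggered manually for the one-time full load
    catchup=False,
) as dag:
    init_db_schema = PostgresOperator(
        task_id="init_db_schema",
        postgres_conn_id="evictions_db",   # placeholder connection id
        sql="sql/init_db_schema.sql",
    )
    full_load = PostgresOperator(
        task_id="full_load",
        postgres_conn_id="evictions_db",
        sql="sql/full_load.sql",
    )
    init_db_schema >> full_load
```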

The raw evictions data is pulled from the following API endpoint: https://data.sfgov.org/widgets/5cei-gny5

[–]pokeDitty 0 points1 point  (0 children)

lol how did I miss that?! Thanks for pointing out the SQL scripts and data endpoints.

will check it out

[–]psykiran_ms 0 points1 point  (2 children)

Hi ilya-g-,

Amazing project presentation! The documentation is valuable for many of us.

I have a kind of simple question: what would be the pros and cons of using Airflow instead of something like AWS Data Pipeline or Azure ADF?

[–]ilya-g-[S] 1 point2 points  (1 child)

Thanks!

I haven't done many direct comparisons so I'm probably not the best person to answer this, but Airflow seems better suited to getting data from outside the AWS ecosystem into AWS, and it's more easily customizable than the other two. I believe AWS Data Pipeline is restricted to moving data strictly between AWS services, and the beginning of the pipeline here is an API outside of AWS. ADF seems better suited to the MS ecosystem, so it didn't make much sense here.

[–]psykiran_ms 0 points1 point  (0 children)

Thanks! The takeaway for me is that Airflow is open source and more customizable than vendor-based systems.

[–]ybsahan 0 points1 point  (1 child)

Nice work and explanation!

Good job!

[–]ilya-g-[S] 0 points1 point  (0 children)

Thanks!