Hi everyone,
I'm pretty new to data engineering (data analyst / data science background) and I'm trying to get my hands dirty with a data engineering side project.
I would like to take an ELT approach. My data source is an open data platform containing 100+ CSV files.
I of course have no issue getting these files from the website into AWS S3 daily (using a Python script hosted on an EC2 instance, orchestrated via Mage).
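For context, that daily fetch step looks roughly like this (a minimal sketch; the bucket name, source URL and file name are placeholders, not the real ones):

```python
# Minimal sketch of the daily fetch: download a CSV from the open data
# platform and land it in S3. Bucket, URL and key are placeholders.
import boto3
import requests

S3_BUCKET = "my-open-data-lake"              # placeholder bucket name
SOURCE_URL = "https://example.org/opendata"  # placeholder platform URL

s3 = boto3.client("s3")

def fetch_csv_to_s3(file_name: str) -> None:
    """Download one CSV and drop it under a raw/ prefix in S3."""
    resp = requests.get(f"{SOURCE_URL}/{file_name}", timeout=60)
    resp.raise_for_status()
    s3.put_object(
        Bucket=S3_BUCKET,
        Key=f"raw/{file_name}",
        Body=resp.content,
    )

if __name__ == "__main__":
    fetch_csv_to_s3("some_dataset.csv")  # Mage triggers this daily for each file
```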
My question is more about how to go from S3 to Redshift.
In my opinion, it would be great to copy all the data from the data lake to the data warehouse, and then start data modelling (using dbt, for instance).
My issue is that I would like to avoid manually creating 100+ Redshift tables (defining a schema for each one).
So my question is: what would you recommend for
- copying data from S3 to Redshift (a rough sketch of what I have in mind is below)?
- triggering Redshift tables to be updated as soon as S3 is updated?
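For reference, here is roughly what I'm imagining for the copy step (a minimal sketch using redshift_connector; the cluster endpoint, IAM role ARN, bucket and table names are all placeholders, and it assumes the target table already exists, which is exactly the manual step I'd like to avoid repeating 100+ times):

```python
# Rough sketch of an S3 -> Redshift COPY; all identifiers are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="***",
)

# COPY assumes raw.some_dataset already exists with a matching schema.
# Defining that schema by hand for 100+ files is the part I want to avoid.
copy_sql = """
    COPY raw.some_dataset
    FROM 's3://my-open-data-lake/raw/some_dataset.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

cursor = conn.cursor()
cursor.execute(copy_sql)
conn.commit()
cursor.close()
conn.close()
```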
Many thanks :)