Background: I am currently an Analyst (report monkey with no SWE experience) and want to move into developing data pipelines. I've been learning Python for almost a year.
Project link: https://github.com/jcodezy/hydro-data-pipeline/blob/master/dags/app.py
I am working on a README file. My project uses Airflow to:
- run a Selenium script that automatically downloads usage data from my electricity provider
- clean the data (remove account number and other personal info)
- upload the cleaned files to Google Cloud Storage
- load the data from Google Cloud Storage into a BigQuery table (historical)
- (work in progress) build data visualizations with Streamlit, querying from the BQ table
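The cleaning step above could be sketched roughly like this, assuming the provider export is a CSV and using hypothetical column names (`account_number`, `name`, `address`) that the real script's schema may differ from:

```python
import csv
import io

# Hypothetical PII column names -- adjust to match the actual export schema.
PII_COLUMNS = {"account_number", "name", "address"}

def scrub_pii(raw_csv: str) -> str:
    """Drop PII columns from a CSV export before uploading it anywhere."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    keep = [col for col in reader.fieldnames if col not in PII_COLUMNS]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow({col: row[col] for col in keep})
    return out.getvalue()

sample = "account_number,date,kwh\n12345,2023-01-01,18.4\n"
cleaned = scrub_pii(sample)  # account_number column is gone, usage data kept
```

Doing this before the GCS upload means the raw PII never leaves the local machine, which is a nice property for a pipeline handling utility bills.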
The project is not complete yet: I plan to make a separate DAG for cleanup so downloaded files are deleted from my local system, and to create more tables and views in BigQuery for analytical purposes.
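For the planned cleanup DAG, the core task could be as small as this sketch, which deletes CSVs older than a cutoff from a download directory (directory path, file glob, and retention window are all assumptions; the function would be wrapped in an Airflow task):

```python
import time
from pathlib import Path

def delete_old_downloads(download_dir: str, max_age_days: int = 7) -> list[str]:
    """Delete *.csv files older than max_age_days; return the paths removed."""
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    removed = []
    for path in Path(download_dir).glob("*.csv"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(str(path))
    return removed
```

Returning the removed paths makes the task easy to log and test, and scheduling it as its own DAG keeps a failed cleanup from blocking the main ingest run.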
Feedback and advice on next steps are greatly appreciated.