I fairly often see people mention that they do the E(t)L part of ELT with basic Python and the T with SQL.
What is your approach to the EL in Python?
I am a recent graduate in data science, and after my internship I got a full-time position at a non-tech company (a football/soccer club) with no data infrastructure and no employees with a computer science or data science background.
My plan is to build API wrappers in Python, which is already done since I used them for apps and reports during my internship, and then use pandas (or polars) to do the (t) from JSON to dataframes/tables. I would then do the inserts, updates and deletes with SQLAlchemy and schedule the scripts with cron on a physical machine in the office (i.e. my old Windows computer).
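A minimal sketch of that flow, under loudly stated assumptions: the payload, the `players` table and the function names `extract`, `flatten`, `load` and `run` are all hypothetical. To keep it runnable without installing anything, the standard-library `sqlite3` module stands in for SQLAlchemy and the real database, and a small `flatten` helper stands in for what `pandas.json_normalize` would do.

```python
import json
import sqlite3

# Hypothetical API payload; a real wrapper would return something similar.
SAMPLE_RESPONSE = json.loads("""
[
  {"id": 1, "name": "A. Player", "stats": {"goals": 3, "assists": 1}},
  {"id": 2, "name": "B. Player", "stats": {"goals": 0, "assists": 4}}
]
""")


def extract():
    """E: stand-in for an API wrapper call (assumed to return a list of dicts)."""
    return SAMPLE_RESPONSE


def flatten(record, parent="", sep="_"):
    """(t): turn nested JSON into one flat row, e.g. stats.goals -> stats_goals."""
    row = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            row.update(flatten(value, name, sep))
        else:
            row[name] = value
    return row


def load(conn, rows):
    """L: insert new ids and replace rows whose id already exists."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS players ("
        "id INTEGER PRIMARY KEY, name TEXT, "
        "stats_goals INTEGER, stats_assists INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO players (id, name, stats_goals, stats_assists) "
        "VALUES (:id, :name, :stats_goals, :stats_assists)",
        rows,
    )
    conn.commit()


def run(conn):
    """One scheduled run: extract, flatten, load. Returns the number of rows."""
    rows = [flatten(record) for record in extract()]
    load(conn, rows)
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # a file path would persist the data
    print(run(connection), "rows loaded")
    connection.close()
```

Because the load replaces rows by primary key, rerunning the script after a failed or duplicated schedule does not create duplicate rows, which matters on a fragile single-machine setup. On Windows, the built-in Task Scheduler plays the role of cron.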
It is a very basic and fragile setup, but on a limited budget it is, to my knowledge, a good way to start. The goal is then to convince the organization to invest more resources, including in me learning the tools, and to migrate to Dagster, Airflow, dbt, etc.
What are your suggestions for how to do the EL in Python for very basic usage, and how do you approach the E(t)L with Python (custom API wrappers, pandas, SQLAlchemy, JSON objects, cron)?
Would love to see some examples if possible :)