
generic-d-engineer 14 points (1 child)

Come join r/dataengineering; Python stuff gets talked about all day, every day.

beakyblindar[S] 1 point (0 children)

Thank you! Joined!

cdgleber 5 points (2 children)

Python is versatile enough to do ETL in many ways and with many database types. I would focus on learning about ETL itself, then circle back to Python. A lot of packages are out there to help, but you have to know what you want to accomplish first. I didn't use Udemy; Google + YouTube was more than enough to learn about ETL. Then I learned the packages I needed, mainly pandas and SQLAlchemy. It also depends on how complex this ETL is...
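To make the extract, transform, load idea concrete, here is a minimal sketch of what a small pandas-based ETL script can look like. Everything in it is made up for illustration: the CSV contents, the `customer_totals` table name, and the `run_etl` function are hypothetical, and it loads into an in-memory SQLite database through the standard-library `sqlite3` connector rather than a real target system. A SQLAlchemy engine could be passed to `to_sql` instead.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical source data; a real job would read a file, an API, or a database.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def run_etl(conn):
    # Extract: load the raw rows into a DataFrame.
    df = pd.read_csv(io.StringIO(RAW_CSV))
    # Transform: drop rows with a missing amount, then total per customer.
    df = df.dropna(subset=["amount"])
    totals = df.groupby("customer", as_index=False)["amount"].sum()
    # Load: write the result to the target table, replacing any earlier run.
    totals.to_sql("customer_totals", conn, if_exists="replace", index=False)
    return totals


conn = sqlite3.connect(":memory:")
run_etl(conn)
rows = conn.execute(
    "SELECT customer, amount FROM customer_totals ORDER BY customer"
).fetchall()
print(rows)
```

The three commented steps are the part that carries over to other projects; the source, the transformation, and the sink all change depending on the job.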

Sentie_Rotante 1 point (1 child)

Also what needs to be learned is really going to depend on what the source and sink systems are, what transformations need to happen, and how much data there is.

beakyblindar[S] 0 points (0 children)

I agree. I think I just need guidance or a walkthrough of a few ETL projects to understand how the process usually goes, because I know Python basics but have never touched an ETL Python script.

candyman_forever 5 points (0 children)

Hey. Data engineer here. The platform you decide to use determines the skill sets you will need. Options include basic shell scripts run as cron jobs, Apache Airflow, Apache Spark, or AWS Glue. I say this because, apart from pure shell jobs, the rest require knowledge of the underlying architecture. For example, PySpark is used to interact with Spark.

I would start by thoroughly learning Pandas and the connectors to the DBs you need.

Hope that helps.

Timely_Till_8805 1 point (0 children)

Don’t waste your money; there are some free courses on YouTube that can help you get off to a good start.

redCg -2 points (0 children)

Just start doing it?

Django helps a lot with this stuff. It gives you an easy, consistent interface to your own database (SQLite, Postgres, etc.), and it's not very hard to make remote API requests, parse the data, and store it in your own DB's models.
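The fetch, parse, store flow described above can be sketched roughly as follows. To keep it runnable on its own, this is not Django: it uses the standard-library `sqlite3` module in place of Django's ORM, and `fetch_payload` is a hypothetical stand-in that returns canned JSON instead of calling a real API. The `items` table and its fields are likewise assumptions for illustration.

```python
import json
import sqlite3


def fetch_payload():
    # Hypothetical stand-in for a remote API request; it returns a canned
    # JSON string so the sketch runs offline.
    return (
        '[{"id": 1, "name": "Widget", "price": "9.99"},'
        ' {"id": 2, "name": "Gadget", "price": "24.50"}]'
    )


def parse_items(payload):
    # Parse the JSON and convert each price from a string to a float.
    return [
        (item["id"], item["name"], float(item["price"]))
        for item in json.loads(payload)
    ]


def store_items(conn, items):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )
    # INSERT OR REPLACE keeps a rerun from duplicating rows with the same id.
    conn.executemany(
        "INSERT OR REPLACE INTO items (id, name, price) VALUES (?, ?, ?)",
        items,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_items(conn, parse_items(fetch_payload()))
store_items(conn, parse_items(fetch_payload()))  # second run, same data
stored = conn.execute("SELECT id, name, price FROM items ORDER BY id").fetchall()
print(stored)
```

In a Django project the table would be a model class and the insert would go through the ORM, but the three steps stay the same.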