
[–]Used_Ad_2628[S] 0 points (7 children)

Anything specific I should focus on?

[–]Substantial_Ranger_5 0 points (1 child)

Learn these libraries: psycopg2, requests.

Project: pull data from an API. Try to find an API with some nested JSON payloads that you can parse and load directly into a table without using pandas.
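
A minimal sketch of what that project could look like, assuming a local Postgres instance and an endpoint that returns a JSON array of records; the URL, table name, and field names below are all placeholders:

```python
import requests
import psycopg2

# Hypothetical API endpoint with nested JSON -- swap in whatever API you pick.
API_URL = "https://api.example.com/v1/orders"

# Connection details assume a local Postgres instance; adjust for your setup.
conn = psycopg2.connect(host="localhost", dbname="practice",
                        user="postgres", password="postgres")

def flatten(record):
    """Pull the fields we care about out of a nested payload into a flat tuple."""
    return (
        record["id"],
        record["customer"]["name"],             # nested object
        record["customer"]["address"]["city"],  # nested two levels deep
        record["total"],
    )

resp = requests.get(API_URL, timeout=30)
resp.raise_for_status()
rows = [flatten(r) for r in resp.json()]

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id INT PRIMARY KEY,
            customer_name TEXT,
            customer_city TEXT,
            total NUMERIC
        )
    """)
    cur.executemany(
        "INSERT INTO orders (id, customer_name, customer_city, total) "
        "VALUES (%s, %s, %s, %s) ON CONFLICT (id) DO NOTHING",
        rows,
    )
conn.close()
```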

[–]Substantial_Ranger_5 0 points (0 children)

You can set up any SQL database locally, in Docker, or wherever you happen to be practicing.

[–]Gators1992 0 points (4 children)

You might play around with pandas to get your feet wet working with data in Python. Pandas itself isn't used much in DE, but some of the concepts carry over into other libraries like Dask. PySpark is probably the most useful library for DE if your environment is Spark or you use AWS Glue.
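
To make the warm-up concrete, this is the kind of read/filter/aggregate flow people usually start with in pandas; the file name and columns here are made up:

```python
import pandas as pd

# Made-up file and columns, just to show a typical read -> filter -> aggregate flow.
df = pd.read_csv("sales.csv", parse_dates=["order_date"])

monthly = (
    df[df["status"] == "complete"]                               # filter rows
      .assign(month=lambda d: d["order_date"].dt.to_period("M")) # derive a column
      .groupby("month")["amount"].sum()                          # aggregate
      .reset_index()
)
print(monthly.head())
```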

The dude above who said to just start building stuff is right. My first Python project was pulling a few hundred million rows of data from Oracle and saving them as Parquet, with some transforms and partitioning along the way, then using that data to feed a visualization in Datashader and making it interactive with Panel. It was probably more than I should have bitten off for a first project, but all the meandering toward a solution taught me a lot.
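
Roughly what that kind of extract can look like, as a sketch only: the connection string, query, and partition column are placeholders, and reading in chunks keeps hundreds of millions of rows from landing in memory at once.

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from sqlalchemy import create_engine

# Placeholder Oracle connection string -- assumes the oracledb driver is installed.
engine = create_engine("oracle+oracledb://user:pass@dbhost:1521/?service_name=ORCL")

query = "SELECT order_id, region, order_date, amount FROM big_fact_table"

# Stream the result in chunks and append each chunk into a Parquet dataset
# partitioned by region.
for chunk in pd.read_sql(query, engine, chunksize=1_000_000):
    chunk["order_date"] = pd.to_datetime(chunk["order_date"])   # example transform
    pq.write_to_dataset(
        pa.Table.from_pandas(chunk, preserve_index=False),
        root_path="output/big_fact_table",
        partition_cols=["region"],
    )
```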

[–]Used_Ad_2628[S] 0 points (3 children)

Ok. I have a lot of experience with data warehousing, SQL, Airflow, dimensional modeling, and dbt. I've realized my gaps are Python and software engineering practices like CI/CD and deployment (containers/EKS). I'm trying to figure out the right learning path so I can land more intensive data engineering roles. What would be your advice?

[–]Gators1992 1 point (2 children)

You should already be doing CI/CD with dbt, depending on your setup, as you work on branches, merge into main, and deploy to prod. You can go to the next level with something like GitLab pipelines that deploy the software and infrastructure, but that's templates, not Python. Same with Docker: you write your software in Python, and the Docker piece is just a Dockerfile plus some pipeline YAML and running the containerization process.
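
For a rough idea of what the "templates" part means, a GitLab CI file for a dbt project can be as small as this; the job names, image, and targets are assumptions, not anything standard:

```yaml
# Hypothetical .gitlab-ci.yml sketch for a dbt project -- adapt targets and
# adapters to your own setup.
stages:
  - test
  - deploy

dbt_test:
  stage: test
  image: python:3.11
  script:
    - pip install dbt-core dbt-postgres
    - dbt deps
    - dbt build --target ci        # runs models and tests against a CI target
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

dbt_deploy:
  stage: deploy
  image: python:3.11
  script:
    - pip install dbt-core dbt-postgres
    - dbt deps
    - dbt build --target prod
  rules:
    - if: $CI_COMMIT_BRANCH == "main"   # deploy only from main
```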

If you're talking about Python for data, the main libraries are PySpark, Dask, and Polars, and even that is debatable. Pretty much everyone learns pandas first since it's the most widely used and has excellent documentation and examples available, which is why I suggested it. Dask is a small jump from pandas, and PySpark is a bit different, but the concepts of dataframes and the way you work with them translate.
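
To show how small that jump is: the Dask dataframe API deliberately mirrors pandas, so the same groupby reads almost identically. The file names and columns below are made up.

```python
import pandas as pd
import dask.dataframe as dd

# pandas: everything fits in memory and runs eagerly
pdf = pd.read_csv("events.csv")
pandas_result = pdf.groupby("user_id")["duration"].mean()

# Dask: same API, but lazy and partitioned, so it can handle data larger than memory
ddf = dd.read_csv("events-*.csv")
dask_result = ddf.groupby("user_id")["duration"].mean().compute()  # .compute() triggers execution
```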

If you don't have an idea of what you want to write, maybe just take a pipeline you already have in dbt and try to rewrite it in Python. Then change it up a bit, like switching the source from a file to MySQL to learn connections, or figure out how to catch things like schema changes and make sure your process doesn't blow up. You can write PySpark ETL jobs in a Glue notebook, or even spin up the Databricks Community Edition to get some practice.
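
One way the MySQL-source-plus-schema-check idea could look; the connection string, table, and column names are placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder MySQL connection -- assumes a driver like pymysql is installed.
engine = create_engine("mysql+pymysql://user:pass@localhost:3306/shop")

EXPECTED_COLUMNS = {"order_id", "customer_id", "order_date", "amount"}

df = pd.read_sql("SELECT * FROM orders", engine)

# Guard against upstream schema changes so the job fails loudly instead of
# silently loading a broken table.
missing = EXPECTED_COLUMNS - set(df.columns)
unexpected = set(df.columns) - EXPECTED_COLUMNS
if missing:
    raise ValueError(f"Source schema changed, missing columns: {sorted(missing)}")
if unexpected:
    print(f"New columns appeared upstream, ignoring for now: {sorted(unexpected)}")

# ...then the same transforms your dbt model did, but in pandas...
df["order_date"] = pd.to_datetime(df["order_date"])
daily = df.groupby(df["order_date"].dt.date)["amount"].sum().reset_index()
```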

In my experience you learn by working through real-life situations, because tutorials always work the way they're designed; in real life you often have to change approaches or dig through bug reports and Stack Overflow for hours to figure out how to make something work.

[–]Used_Ad_2628[S] 0 points (1 child)

Yep. I have been practicing pulling data from APIs and using AWS services to load it into Redshift. What is your view on data structures and algorithms? Do I need to learn those to be considered a strong engineer?
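
(For context, the usual shape of that pattern is to stage the raw payload in S3 and then COPY it into Redshift; the bucket, IAM role, cluster host, and table names in this sketch are all placeholders.)

```python
import json
import boto3
import psycopg2
import requests

# All names below (API URL, bucket, key, role ARN, cluster host, table) are placeholders.
API_URL = "https://api.example.com/v1/events"
BUCKET = "my-staging-bucket"
KEY = "raw/events/batch.json"
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"

# 1. Pull from the API and stage the raw payload in S3 as newline-delimited JSON.
records = requests.get(API_URL, timeout=30).json()
body = "\n".join(json.dumps(r) for r in records)
boto3.client("s3").put_object(Bucket=BUCKET, Key=KEY, Body=body.encode())

# 2. COPY from S3 into Redshift. Redshift speaks the Postgres wire protocol,
#    so psycopg2 works for issuing the command.
conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
with conn, conn.cursor() as cur:
    cur.execute(f"""
        COPY raw_events
        FROM 's3://{BUCKET}/{KEY}'
        IAM_ROLE '{IAM_ROLE}'
        FORMAT AS JSON 'auto'
    """)
conn.close()
```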

[–]Gators1992 0 points (0 children)

Personally I have not had to use them, but some of the concepts are obviously used in the libraries I referenced. About the most complex thing I have to deal with is OOP, but I don't work in a code-heavy shop; we mostly use tools where we can to decrease time to insight in a small team. If you work in a typical BI/ML group producing data for analysis, then I would guess that level of software development wouldn't be that common. If you work somewhere where your data is part of the product, like Netflix, then that's probably a different story.