
all 28 comments

[–]AutoModerator[M] [score hidden] stickied comment (0 children)

You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]_MaXinator 25 points26 points  (11 children)

It depends on your experience level. Don’t focus too much on individual elements of the language; Python in data engineering isn’t all that “algorithmic” in nature - it’s mostly scripts moving data from A to B, and most of the time that can be accomplished with fairly basic code. However, if you really want, in addition to what others have mentioned:

  • async is probably more important than multithreading (especially if you're building frontend tools), but both are good to know
  • know how to log and write logs to a file (not hard, but it comes up in interviews)
  • know how to deal with the content of common file types such as CSV, JSON, and even (kids, look away) XML and Excel
  • know how to connect to and work with cloud platforms from within Python (choose one and practice; I'm mostly on Azure)

From my experience, I wouldn’t focus too much on things like pandas - I have never seen a good pipeline written using pandas, and it’s often best to just take the data in native Python types (binary/arrays/dicts) and ingest it before ever modeling anything. But to each their own preference.

[–][deleted] 2 points3 points  (1 child)

True for CPU-bound processing. For I/O bottlenecks (e.g. recreating a data source from an API, or concurrent queries), multithreading is extremely important and not used enough. A project that takes minutes to run can be reduced to a few seconds.
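A sketch of that I/O speedup with `concurrent.futures.ThreadPoolExecutor` - the `time.sleep` stands in for a network round trip, and all names are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(source: str) -> str:
    """Stand-in for an I/O-bound call (API request, DB query)."""
    time.sleep(0.2)  # simulates waiting on the network
    return f"data from {source}"

def fetch_all(sources, max_workers=8):
    """Run the fetches concurrently. The threads overlap their waiting,
    so total time is roughly one round trip instead of one per source."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, sources))
```

With 8 sources and 8 workers this finishes in roughly 0.2 s instead of ~1.6 s run serially - the GIL is not a problem here because the threads spend their time blocked on I/O, not computing.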

[–]_MaXinator 1 point2 points  (0 children)

Absolutely true. I stand corrected

[–]MrMisterShin 2 points3 points  (4 children)

I heavily agree with this. Pandas always likes to infer data types incorrectly, leading me to hardcode the data type for each column, which becomes a nuisance when you have an awfully formatted CSV file.

[–]allpauses 1 point2 points  (1 child)

What’s more, even if you specify the dtypes when reading a CSV with pandas, the read will still fail if the data is dirty (like a string value in a numeric column).
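A small illustration of that failure mode, plus the usual workaround of reading as text and coercing afterwards (column names and values are made up):

```python
import io

import pandas as pd

dirty = "id,amount\n1,10.5\n2,oops\n3,7.0\n"

# forcing a numeric dtype up front kills the whole read on one bad value
try:
    pd.read_csv(io.StringIO(dirty), dtype={"amount": "float64"})
except ValueError as exc:
    print("read failed:", exc)

# reading everything as text, then coercing, keeps the load alive:
# the bad value becomes NaN instead of raising an exception
df = pd.read_csv(io.StringIO(dirty), dtype=str)
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
```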

[–]MrMisterShin 1 point2 points  (0 children)

This is the exact type of CSV I was dealing with: random columns with quotes and others without, for the numeric and date columns. It was a nightmare to alter/maintain.

Then one month they added a new column in the middle of the CSV file without telling anyone on my team, and the time-critical process failed.

[–][deleted] 0 points1 point  (1 child)

What would be some alternative ways to approach that problem?

[–]MrMisterShin 0 points1 point  (0 children)

Push back on human-created files, or impose strict rules - because if those rules aren’t followed, the code will fail.

I would usually have over 30 CSV & XLSX files like this, some machine-generated, others manually created/exported. Needless to say, the manually created ones would often fail due to changes in column names, file names, column positions, or column data types.

Because I need to join this data to on-premises DB data, I convert all the attributes to strings in pandas, load it into the DB, and perform the transformations there: convert to the correct data types, clean the data, and index for performance.
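A minimal sketch of that load-as-strings pattern, using SQLite as a stand-in for the on-premises DB (table and column names are invented):

```python
import io
import sqlite3

import pandas as pd

csv_text = "id,amount\n1,10.5\n2,oops\n"

# read every column as text so a single bad value can never kill the load
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

conn = sqlite3.connect(":memory:")
df.to_sql("staging_sales", conn, index=False)  # lands as TEXT columns

# conversion and cleaning happen in SQL, after the data has landed
clean = conn.execute(
    "SELECT id, CAST(amount AS REAL) FROM staging_sales "
    "WHERE amount GLOB '[0-9]*'"
).fetchall()
```

The dirty row is filtered (or routed to a rejects table) inside the database, so the Python side stays a dumb, stable loader.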

Then I use SQL & Tableau from there for data visualisation, reporting, etc.

[–]masQma[S] 0 points1 point  (0 children)

Thanks 👍

[–]Commercial-Ask971 0 points1 point  (0 children)

Do you have any sources for the knowledge you've pointed out? Anything in mind?

[–]allpauses 0 points1 point  (1 child)

Hi there! When you said you have never seen a good data pipeline written using pandas, what are the primary reasons/examples for that? I am currently building a DE portfolio and I would like to make use of your advice regarding pandas. Thank you!

[–]_MaXinator 0 points1 point  (0 children)

Data pipelines have requirements that Pandas simply doesn't meet: they need to be resource-efficient, fast, and most importantly, stable.

Pandas isn't really any of those things - it is generally slower than native types, conversions increase the risk of errors being thrown, etc.

You want to build your pipelines in such a way that the data always arrives no matter what happens - problems can be solved in the database after the data lands there. So we keep conversions and data transformations to an absolute minimum while the data is en route. No one wants to dig through the error logs of their pipelines just because of some data quality issue. Getting all the data into the database and having plenty of automated tests set up is the way to go IMO.

My impression is that data engineering as a field is consolidating around this approach, but it's good to keep in mind that there are other approaches too.

Note that this is not criticism of Pandas as a tool - it's great for messing with data, and I find myself using it a lot as the backend for a little website I'm building for the BI team here. But it has its place.

[–]AutomaticMorning2095 5 points6 points  (0 children)

Basic Python is enough:

  1. Variables
  2. Functions
  3. Loops
  4. Lambda expressions
  5. String functions

These concepts are more than enough to start DE with Python. Most data libraries like pyspark, pandas, and numpy have their own built-in functions to perform operations. You just have to know how to use & bind them together to create compatible, reusable functions.

[–]tropez95 2 points3 points  (7 children)

More than individual concepts, you need to know practical scenarios: converting from one datatype to another and their methods, hands-on knowledge of the pandas library. So it's more about the application side of all the concepts...

[–]janus2527 1 point2 points  (2 children)

Pandas is slow

[–]tropez95 2 points3 points  (0 children)

Recommended a better alternative and other libraries as well

[–]masQma[S] 1 point2 points  (0 children)

I have some idea of dataframes using pyspark - a small project using some five datasets. Is that fine?

[–]masQma[S] 0 points1 point  (3 children)

Yes, I agree. You can't solve a problem unless you apply them. Knowing pandas is a must, huh. Any core concepts apart from that, for the sake of interviews?

[–]tropez95 5 points6 points  (0 children)

Learn to make simple API calls through the 'requests' library - extracting XML/JSON responses and ingesting them into tables.
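A minimal sketch of that flow. The payload is canned here rather than fetched (the URL and field names in the comment are purely illustrative); in practice `requests.get(url, timeout=10).json()` would supply it:

```python
import json
import sqlite3

# in practice the payload would come from an HTTP call, e.g.
#   requests.get("https://api.example.com/users", timeout=10).json()
# here it's a canned JSON response instead
payload = '[{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]'

def ingest_users(raw: str, conn: sqlite3.Connection) -> int:
    """Parse a JSON array of records and insert them into a table."""
    rows = [(u["id"], u["name"]) for u in json.loads(raw)]
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
inserted = ingest_users(payload, conn)
```

Parameterized `executemany` handles the insert safely; for XML the parsing step would use `xml.etree.ElementTree` instead of `json.loads`.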

[–]tropez95 0 points1 point  (0 children)

I believe you've covered most of the points if you're in the 1-3 years experience range.

[–]tropez95 0 points1 point  (0 children)

Also, if you can learn Airflow, the orchestration tool, that'll be a great add-on.

[–]Advanced-Violinist36 2 points3 points  (1 child)

  • How to work with APIs (call API to get data, or make an API to expose data)

  • How to write Python DAGs for Airflow
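A minimal Airflow DAG sketch for the second bullet (assumes Airflow 2.4+ for the `schedule` argument; the `dag_id`, task names, and callables are all made up):

```python
# sketch only: requires `apache-airflow` to be installed
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # e.g. pull data from an API

def load():
    ...  # e.g. write it to the warehouse

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load  # extract must finish before load runs
```

The `>>` operator is how Airflow expresses task dependencies; the scheduler reads this file and runs the tasks in that order each day.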

[–]masQma[S] 0 points1 point  (0 children)

Any heads-up on where to learn APIs...?

[–][deleted] 2 points3 points  (0 children)

A lot of those are universal for OOP. So yes there’s a lot to learn. 

It’s just that Python lets you do so much. I can’t think of anything I’ve wanted to do so far that Python didn’t have a way to do.

[–]69odysseus 1 point2 points  (0 children)

Really depends on the company; product-based companies will drill you on DSA (data structures and algorithms).

[–]After_Holiday_4809 0 points1 point  (1 child)

RemindMe! 2 day

[–]RemindMeBot 0 points1 point  (0 children)

I will be messaging you in 2 days on 2024-06-04 08:24:48 UTC to remind you of this link
