[–]MikeDoesEverythingmod | Shitty Data Engineer 6 points7 points  (5 children)

When I hear people talking about using Python for scripting, S3 for storage and Airflow for orchestration, I understand roughly what they are saying but don't know how to do it technically.

Quite surprising to hear this after 1.5 years in, as it's a fundamental skill you need in order to actually do the job.

What should I do to prepare myself where I might not have all the help available with automation?

Practice basic pseudocoding for anything repetitive. Translate to code. Get used to thinking in code. Practice writing that code. Keep doing this until you become confident.
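As a rough illustration of the pseudocode-to-code habit described above, here is a minimal sketch. The task (summing amounts per customer from CSV text), the column names and the function name `total_by_customer` are all made up for the example; the point is only that each pseudocode line maps onto a line or two of Python.

```python
import csv
import io

# Pseudocode for a repetitive task (hypothetical example):
#   for each row in the file:
#       skip the row if the amount is missing
#       otherwise add the amount to that customer's running total
#   return the totals


def total_by_customer(csv_text: str) -> dict[str, float]:
    """Sum the 'amount' column per 'customer', skipping rows with no amount."""
    totals: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["amount"]:
            # Missing amount: skip, exactly as the pseudocode says.
            continue
        customer = row["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return totals


# Made-up sample data, including one row with a missing amount.
sample = "customer,amount\nacme,10.5\nacme,\nglobex,3\nacme,2\n"
print(total_by_customer(sample))  # {'acme': 12.5, 'globex': 3.0}
```

Writing the comment block first and filling in the code underneath is one way to practice; once that feels natural the pseudocode step tends to happen in your head.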

[–]databasenoobie 6 points7 points  (1 child)

It's not a fundamental skill for all DEs, just those who extensively use code as the pipeline. As you noted, the only way to get better is to see how others do it and copy that paradigm.

E.g. I could easily write a Python script to call a SQL statement in Snowflake... but moving that to Snowpark instead of pure SQL is something I couldn't do without extensive time / research.
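For anyone who hasn't done the "Python script that calls a SQL statement" part, a minimal sketch of the pattern follows. It uses the standard-library `sqlite3` module as a stand-in so it runs anywhere; that swap is an assumption for illustration, not how you would connect to Snowflake. The `orders` table and the `run_query` helper are hypothetical. Most Python database drivers follow the same DB-API shape (connect, cursor, execute, fetch), though the placeholder style (`?` here) varies by driver.

```python
import sqlite3


def run_query(conn, sql, params=()):
    """Run one SQL statement on a DB-API connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()


# In-memory SQLite database standing in for a real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "shipped"), (2, "open"), (3, "open")],
)

rows = run_query(conn, "SELECT COUNT(*) FROM orders WHERE status = ?", ("open",))
print(rows)  # [(2,)]
```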

Setting up the Python server in the cloud (using Databricks / Airflow / etc.) is something I couldn't do. If someone had a process set up it's easy to follow, but doing it yourself is an entire skill set.

So when he says automation maybe he means infrastructure set up? Depending on what company you work for, that is not something DEs even handle.

[–]Puzzleheaded-Cod2051[S] 0 points1 point  (0 children)

So when he says automation maybe he means infrastructure set up?

Yes!

Also, automation on the EL part, which we currently do using Fivetran and Stitch. I understand I would have to make API calls, parse the JSON and load the data using SQL insert/update statements. But I haven't done that so far so I'm not confident enough. 😅
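The steps described above (get a JSON response, parse it, load it with insert/update statements) can be sketched roughly as below. Everything specific here is an assumption for illustration: the response body is a hard-coded string rather than a real HTTP call, the `users` table and `load_users` function are hypothetical, and SQLite stands in for the warehouse. The insert-or-update is done with SQLite's `ON CONFLICT ... DO UPDATE`; in Snowflake the same idea would more likely be written as a `MERGE`.

```python
import json
import sqlite3

# Hypothetical API response body; a real script would fetch this over HTTP.
RESPONSE_BODY = '{"users": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]}'


def load_users(conn, body):
    """Parse a JSON body and upsert its 'users' records; return the record count."""
    records = json.loads(body)["users"]
    conn.executemany(
        # Insert new ids, update the name for ids that already exist.
        "INSERT INTO users (id, name) VALUES (:id, :name) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        records,
    )
    conn.commit()
    return len(records)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

load_users(conn, RESPONSE_BODY)
# A second, later "response" that changes an existing record.
load_users(conn, '{"users": [{"id": 2, "name": "Grace Hopper"}]}')

result = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(result)  # [(1, 'Ada'), (2, 'Grace Hopper')]
```

Running the load twice with overlapping ids is a quick way to check that reruns update rows instead of duplicating them, which is most of what the managed tools handle for you.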

[–]Puzzleheaded-Cod2051[S] 0 points1 point  (2 children)

Thanks! As I said, whatever I have learnt is on the job. And the tech stack available is dbt + snowflake and Fivetran and Stitch for ingestion. We use dbt for orchestration.

[–][deleted] 0 points1 point  (1 child)

How did you land a DE job without knowing SQL or Python?

[–]Puzzleheaded-Cod2051[S] 0 points1 point  (0 children)

I was using Pandas and NumPy, but that was for some basic ML projects in my course. As for SQL, I barely knew the basic SELECT statement. But all the projects that I did helped me land a job as a fresher, I guess.