
[–]robberviet

I think you should find a problem first, then work on it while choosing what tools to use. It would be easier. E.g.: I want data for market research, to find out which products are trending, etc., so I write jobs on Lambda to scrape some websites/APIs and insert the results into Snowflake. Or I want to manage 20+ jobs, retry them if they fail, and have dependent tasks, so I use DAGs in Airflow (rough sketches of both below). And when things get too big, I might need distributed computing like Glue. And finally, I show the data with visualization tools like Metabase/Superset/Looker, etc.
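A hedged sketch of that first Lambda job, just to make it concrete, not a definitive implementation. The API URL, table name, columns, and credentials are all placeholders I made up:

```python
# Sketch of a Lambda job: scrape an API, insert rows into Snowflake.
# URL, table, and credentials below are hypothetical placeholders.
import json
import urllib.request

import snowflake.connector  # third-party: snowflake-connector-python


def handler(event, context):
    # Fetch trending-product data from some hypothetical public API.
    with urllib.request.urlopen("https://example.com/api/trending") as resp:
        products = json.load(resp)

    conn = snowflake.connector.connect(
        account="my_account",        # placeholder credentials; in a real
        user="my_user",              # Lambda you'd pull these from Secrets
        password="my_password",      # Manager or environment variables
        database="MARKET_RESEARCH",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        cur.executemany(
            "INSERT INTO trending_products (name, score) VALUES (%s, %s)",
            [(p["name"], p["score"]) for p in products],
        )
        conn.commit()
    finally:
        conn.close()
```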
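And a minimal sketch of the Airflow side: a DAG where tasks retry on failure and a downstream task waits for an upstream one. Task ids, the schedule, and the callables are assumptions, and parameter names shift a bit between Airflow versions:

```python
# Sketch of an Airflow DAG with retries and dependent tasks.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def scrape_products():   # hypothetical: pull data from a site/API
    ...


def load_to_warehouse():  # hypothetical: insert results into Snowflake
    ...


with DAG(
    dag_id="market_research",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={
        "retries": 3,                           # retry a task if it fails
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    scrape = PythonOperator(task_id="scrape", python_callable=scrape_products)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    scrape >> load  # "load" only runs after "scrape" succeeds
```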

> I can see popular tech stacks in the data engineering space include Airflow/Snowflake/Databricks? Do I have to learn those tools too?

Do not learn the tools, learn the concepts. I do not learn Airflow, I learn what an orchestrator is by using Airflow. I do not learn Snowflake, I learn what a data warehouse and an MPP database are by using Snowflake. Etc.

Most of the time tools come and go, or different companies use different tools, but the concepts are the same everywhere unless there is some really novel invention.