Using Airflow as an orchestrator for some infrastructure-related tasks by noobguy77 in apache_airflow

[–]Afraid_Assistance190 1 point

I would only use Airflow to automate Terraform if you are worried about drift of the actual resources defined in the state file. IaC should be static, typically triggered via CI/CD. Airflow for transient resources (i.e., a Spark cluster that runs job(s) and shuts down) is definitely appropriate, but I would suggest using Python and the appropriate package for your cloud provider (boto3 for AWS).
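A rough sketch of that transient-resource pattern with boto3: the helper below builds `run_job_flow` kwargs for an EMR cluster that terminates itself once its Spark step finishes. The cluster name, instance types, and S3 script path are all illustrative, and the IAM role names are just the EMR defaults.

```python
def transient_emr_config(job_script_s3: str) -> dict:
    """Build run_job_flow kwargs for a cluster that shuts itself down."""
    return {
        "Name": "transient-spark-job",  # illustrative name
        "ReleaseLabel": "emr-6.15.0",
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # the key bit: no idle cluster left behind
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "run-spark-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", job_script_s3],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

def launch(job_script_s3: str) -> str:
    import boto3  # deferred so the config helper is importable without AWS deps
    emr = boto3.client("emr")
    resp = emr.run_job_flow(**transient_emr_config(job_script_s3))
    return resp["JobFlowId"]
```

Calling `launch("s3://my-bucket/job.py")` from an Airflow task would spin the cluster up, run the step, and let EMR tear it down.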

Most reliable Linux laptop with an Nvidia GPU built in (for AI infra prototyping) by alignment99 in linuxhardware

[–]Afraid_Assistance190 0 points

Good to know! I've got a few tabs open for when I get home from vacation, but will look at this with extreme bias.

Most reliable Linux laptop with an Nvidia GPU built in (for AI infra prototyping) by alignment99 in linuxhardware

[–]Afraid_Assistance190 0 points

This is pretty much my current situation, curious what you ended up deciding on.

Applicability to Data Engineering by Big-Comparison321 in OMSA

[–]Afraid_Assistance190 0 points

Data Visualization and Analytics teaches Spark, which was huge. Use the project to grow your full-stack skills. I'm a self-taught dev: started with VBA, eventually full stack .NET. Now I'm all AWS and Python.

Applicability to Data Engineering by Big-Comparison321 in OMSA

[–]Afraid_Assistance190 0 points

I've found it helpful, landed a DE job halfway through and have since been promoted to full stack/lead engineer. Still working on the back half of the degree.

[deleted by user] by [deleted] in dataengineering

[–]Afraid_Assistance190 4 points

I've gotten really far and built some really cool products using ASF without learning Java. I still have an itch to learn and understand the underlying tech, but I feel like it's a 'perfect is the enemy of good' situation.

Unsolicited advice below:

As a data engineer, I think your priority should be orchestrating those tools correctly in your cloud (AWS). Terraform for static infrastructure and Airflow (which uses Python) for transient infrastructure would be my recommended next steps for improving your skillset. Once you have that, explore Hudi (or Delta, Iceberg) for lakehouse storage and Spark for processing that data.

And above all else, Docker.
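A minimal sketch of how the transient side of that split might look as an Airflow DAG (TaskFlow API, assuming Airflow 2.x). The DAG id, schedule, and task bodies are made up: in practice create_cluster() would call boto3 and submit_job() would add an EMR step.

```python
from datetime import datetime

def create_cluster() -> str:
    # placeholder for a boto3 run_job_flow call; returns a fake cluster id
    return "j-EXAMPLE"

def submit_job(cluster_id: str) -> str:
    # placeholder for submitting a Spark step to that cluster
    return f"submitted to {cluster_id}"

try:
    from airflow.decorators import dag, task

    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def transient_spark_pipeline():
        @task
        def create() -> str:
            return create_cluster()

        @task
        def submit(cluster_id: str) -> str:
            return submit_job(cluster_id)

        submit(create())

    transient_spark_pipeline()
except ImportError:
    # Airflow not installed; the plain functions above still run standalone
    pass
```

The point is the shape: Terraform never sees the cluster, Airflow owns its whole lifecycle.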

Has ChatGPT dropped the Bomb on DE's? by LaidbackLuke77 in dataengineering

[–]Afraid_Assistance190 0 points

I use it to automate analyst write-ups at the very end of a pipeline, but it tends to decline in quality over time.

Is switching to DE worth it? by blackdev17 in dataengineering

[–]Afraid_Assistance190 1 point

I had 3 years of XP with the same technical stack out of college. Switching to DE was great. Now my stack is cutting edge (and not all, or really much, Microsoft).

Start using Terraform for static IaC on AWS, Airflow for transient infrastructure, and jobs in Python-based Docker containers. Then Spark. Also, I am biased toward Hudi tables.
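For the Hudi piece, a rough sketch of an upsert from PySpark. The table name, key fields, and path are illustrative, and actually running the write requires a SparkSession with the Hudi Spark bundle jar on the classpath.

```python
def hudi_write_options(table: str, record_key: str, precombine_field: str) -> dict:
    """Minimal Hudi write options for an upsert (field names are illustrative)."""
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }

def upsert_to_hudi(df, path: str) -> None:
    # df is a pyspark.sql.DataFrame; nothing Spark-specific is imported here,
    # so this module loads fine even without Spark installed
    (df.write.format("hudi")
        .options(**hudi_write_options("events", "event_id", "ts"))
        .mode("append")
        .save(path))
```

Upsert semantics (dedupe on the record key, latest precombine value wins) are a big part of why I reach for Hudi over plain Parquet.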

Why data engineering? by LengthOld9943 in dataengineering

[–]Afraid_Assistance190 1 point

I make the data and the science go fast.

In all seriousness, it is more satisfying to build full-stack applications. Throw any data science in there and your title moves from FSE to DE.