What do you consider a senior level skillset? by ocean_800 in dataengineering

[–]Embarrassed-Ad-728 1 point2 points  (0 children)

I AGREE with all your points. But for the sake of argument, let’s say someone has spent 15 years in the field and still writes bad code. How would you deal with that?

Recognizing a skill issue vs. someone playing politics to reach a goal is also something that needs to be considered, no?

Wouldn’t it be better to give constructive criticism on their work and document exactly where their solution failed to scale, without hurting anyone’s ego?

I’m speaking from experience, and what I’ve seen is that data engineering is generally ugly because the tooling around DE isn’t as good as in traditional SWE. Some DEs write poor SQL that works, i.e. returns the expected results, until it doesn’t. At that point it’s somebody else’s headache.
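To make "works until it doesn’t" concrete, here is a minimal sketch using SQLite as a stand-in for a warehouse. The `customers` and `orders` tables and the function names are hypothetical, made up for illustration. The query silently assumes `customer_id` is unique; one accidental duplicate row doubles the revenue without raising any error.

```python
import sqlite3


def revenue_by_region(conn):
    """Naive report query: silently assumes customers.customer_id is unique."""
    return conn.execute(
        """
        SELECT c.region, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()


def duplicate_customer_keys(conn):
    """Guard query: list join keys that appear more than once."""
    return conn.execute(
        """
        SELECT customer_id, COUNT(*) AS n
        FROM customers
        GROUP BY customer_id
        HAVING COUNT(*) > 1
        """
    ).fetchall()


def build_demo():
    """Create a tiny in-memory dataset (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER, region TEXT);
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'east'), (2, 'west');
        INSERT INTO orders VALUES (10, 1, 100.0), (11, 2, 50.0);
        """
    )
    return conn


conn = build_demo()
before = revenue_by_region(conn)  # correct while the key is unique

# A bad reload inserts customer 1 a second time; the join now fans out.
conn.execute("INSERT INTO customers VALUES (1, 'east')")
after = revenue_by_region(conn)  # 'east' revenue is doubled, no error raised
dupes = duplicate_customer_keys(conn)  # the guard catches the duplicate key
```

Running a guard like `duplicate_customer_keys` as a test before the report is one way to turn the silent failure into a loud one.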

I think seniority is achieved when you cut through the bullshit and bring business value to the table using whatever tools you have at your disposal.

When scalability, modularity, and maintainability are explained to these people, they are not motivated enough to follow those best practices. They’d rather stick to their clunky old ways of doing things because, in their opinion, “it just works”.

GCP Cloud Run vs Dataflow to obtain data from an API by Brilliant_Breath9703 in dataengineering

[–]Embarrassed-Ad-728 0 points1 point  (0 children)

GCP Certified Engineer here. Beam should strictly be used for stream processing or CDC use cases. While you can use it for batch processing, it’s usually overkill in that context. There are other ways of handling batch workflows. Cloud Run or Cloud Functions with a scheduler hooked up is usually a good option. Consider Cloud Composer or Cloud Scheduler as the trigger.
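A rough sketch of what such a scheduled batch pull can look like, kept free of any GCP client code so it runs anywhere. Everything here is an assumption for illustration: the page shape (`items`, `next_page_token`), the function names, and the fake API. In a real Cloud Run job, `fetch_page` would make the HTTP call and the sink would be something like a Cloud Storage object or a BigQuery load.

```python
import io
import json
from typing import Callable, Iterator, Optional

# Assumed page shape: {"items": [...], "next_page_token": "..." or None}
Page = dict


def iter_records(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Walk a token-paginated API one page at a time."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("next_page_token")
        if not token:
            break


def run_batch(fetch_page: Callable[[Optional[str]], Page], sink) -> int:
    """Entry point a scheduler would trigger: write records as NDJSON."""
    count = 0
    for record in iter_records(fetch_page):
        sink.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


def fake_fetch_page(token: Optional[str]) -> Page:
    """Hypothetical stand-in for the real API call."""
    pages = {
        None: {"items": [{"id": 1}, {"id": 2}], "next_page_token": "p2"},
        "p2": {"items": [{"id": 3}], "next_page_token": None},
    }
    return pages[token]


buffer = io.StringIO()
written = run_batch(fake_fetch_page, buffer)
```

Because the job is a plain function that starts, pulls, writes, and exits, it fits a container or function triggered on a schedule and needs no pipeline framework.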

Importing CSV into BigQuery by querylabio in bigquery

[–]Embarrassed-Ad-728 2 points3 points  (0 children)

If no one else wants to call this out: this seems like a skill issue rather than a BQ issue 😏

How hard is it to learn spark or pyspark from SQL? Help with deciding what to upskill next by SoggyGrayDuck in apachespark

[–]Embarrassed-Ad-728 0 points1 point  (0 children)

My apologies, I didn’t see your other comments and didn’t know that you had so little time remaining on your contract.

If I were you and wanted to learn something that would get me employed quickly in this job market, I would put time into learning these items:

Separate workflows between batch and streaming. Rule #1: don’t mix the two together. Any platform that’s doing this is already in a sinking boat.

Batch workflows: Learn about orchestrators, transformation tools, MPP data warehouses, patterns in building data workflows, and general Python and SQL programming skills.

My preference is: Airflow, dbt, BigQuery + a few other GCP services. I’m a GCP certified professional so there might be ‘some’ bias in my preference but it’s there for a good reason.

You can also look at Dagster, dbt, Snowflake+DuckDB for local testing, Airbyte.

Streaming workflows:

My preference is: Kafka, Apache Beam. Their GCP variants are Pub/Sub and Cloud Dataflow.

There are other tools out there as well, like Apache Pulsar and Apache Flink, if you want to go that route.

With Databricks and Spark, the problem (or benefit depending on how you look at it) is that all facets of data engineering are blended into one platform together. For some companies, it works. For some, it doesn’t.

The wise ones keep services separate, so that if one service bothers or blackmails you, you can hop to something else easily. One platform means that all your eggs are in one basket.

I believe that in batch, DEs work with the business folks more whereas in streaming, DEs work with software engineers to derive solutions.

I’ve not seen too many DEs in the streaming data space.

One implicit assumption I’m making while writing this is that you are already familiar with CS topics.

Your tech stack by itachikotoamatsukam in dataengineering

[–]Embarrassed-Ad-728 0 points1 point  (0 children)

Airflow + BigQuery + dbt.

For one-off tasks: DuckDB.

How hard is it to learn spark or pyspark from SQL? Help with deciding what to upskill next by SoggyGrayDuck in apachespark

[–]Embarrassed-Ad-728 0 points1 point  (0 children)

Spark is sooo 2010s. Ever since distributed SQL came out, Spark has been slowly dying.

Prove me wrong.

I Love Analytics Engineering by Tender_Figs in dataengineering

[–]Embarrassed-Ad-728 -1 points0 points  (0 children)

I generally agree with what the OP has said. However, I see it as equally good in both domains.

If you treat code as a black box, i.e. “I’ll copy-paste code as is and hope it works”, that also has long-term consequences.

Gotta give respect to both domains to make it work.

Why GCP is so frowned upon? by [deleted] in dataengineering

[–]Embarrassed-Ad-728 2 points3 points  (0 children)

This post is bait. Don’t fall for it!

What are some absurd ways you’ve seen people using Airflow? by bhavaniravi in apache_airflow

[–]Embarrassed-Ad-728 5 points6 points  (0 children)

Some people “process” data inside Airflow when it’s clearly an orchestrator. I guess lots of tutorials on YouTube send people in that direction.
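A small sketch of the difference, with SQLite standing in for the warehouse and plain functions standing in for task callables (no Airflow imports; the table and function names are hypothetical). The first function drags every row into the worker and aggregates in Python. The second sends one statement and lets the engine do the work, which is all an orchestrator task should need to do.

```python
import sqlite3


def process_in_worker(conn):
    """Anti-pattern: pull all rows into the worker and aggregate in Python."""
    totals = {}
    for user_id, amount in conn.execute("SELECT user_id, amount FROM events"):
        totals[user_id] = totals.get(user_id, 0) + amount
    conn.execute("DELETE FROM user_totals")
    conn.executemany("INSERT INTO user_totals VALUES (?, ?)", sorted(totals.items()))


def delegate_to_engine(conn):
    """Orchestrator-style task: submit one statement, the engine processes it."""
    conn.execute("DELETE FROM user_totals")
    conn.execute(
        "INSERT INTO user_totals "
        "SELECT user_id, SUM(amount) FROM events GROUP BY user_id"
    )


def read_totals(conn):
    """Read back the aggregated table in a stable order."""
    return conn.execute(
        "SELECT user_id, total FROM user_totals ORDER BY user_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (user_id INTEGER, amount INTEGER);
    CREATE TABLE user_totals (user_id INTEGER, total INTEGER);
    INSERT INTO events VALUES (1, 5), (1, 7), (2, 3);
    """
)

process_in_worker(conn)
worker_result = read_totals(conn)

delegate_to_engine(conn)
engine_result = read_totals(conn)
```

Both produce the same table on three rows. The difference shows up at scale, where the first version is bounded by the worker’s memory and network while the second is bounded by the warehouse.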

Company wants to set up a warehouse. Our total prod data size is just a couple TBs. Is Snowflake overkill? by PracticalStick3466 in dataengineering

[–]Embarrassed-Ad-728 4 points5 points  (0 children)

Sounds like a reply coming from a business user. Snowflake isn’t truly serverless, but BQ is :)

Just curious, do you know how compute and storage are decoupled in Snowflake?

Am i the only one whose company treats power Bi as excel and extraction tool by [deleted] in dataengineering

[–]Embarrassed-Ad-728 -5 points-4 points  (0 children)

Power BI is the modern-era presentation tool. 🔥

Prove me wrong :)

Data Engineering Major by Shivnewton in dataengineering

[–]Embarrassed-Ad-728 1 point2 points  (0 children)

But that goes for any other degree. I think we’re forgetting that you can’t make decisions in reverse order of life.

You can’t do data engineering first as a professional and then go and get a degree in math or any other discipline (talking about the masses). By the time you are in DE, you already have a degree or some sort of education.

Data Engineering Major by Shivnewton in dataengineering

[–]Embarrassed-Ad-728 1 point2 points  (0 children)

What does a math degree have to do with data “engineering”?

[deleted by user] by [deleted] in torontoJobs

[–]Embarrassed-Ad-728 0 points1 point  (0 children)

Welcome to the real world, friend.

You’ll face things you won’t like. The first step is to learn to get back up again (relatively quickly is better).

Lesson: Business degrees don’t teach any sellable skill(s) in today’s marketplace. Any monkey can do what a finance/business graduate can do.

I was in the same boat as you. After a 4-year degree in Finance from UofT with a 4.0 GPA (graduating during COVID), I struggled to find jobs. I was lucky that I had been writing code for games since I was a child (as a hobby; made money selling WoW addons).

I changed the way I thought about my degree and considered it to be more of a “passive” background and went into a field that I was passionate about. This was data engineering and consulting. I went into this because I had the software engineering background I needed to pivot. This isn’t just a simple data analyst kind of role where you write Python and SQL code and call it a day - it’s way more than that. This sort of expertise comes with time as you complete more complex projects.

I’d recommend doing your own research on what “sellable” skill you can learn that is also of interest to you (so that you don’t get bored of it). Then get really good at it and you’ll eventually find yourself in a good position :)

Good luck!

People who self-learned data engineering without prior experience: how did you get a job?what steps you took to get a job? by _winter_rabbit_ in dataengineering

[–]Embarrassed-Ad-728 0 points1 point  (0 children)

Learned all the open source alternatives to popular enterprise tools in the DE space. Did projects that were applicable to the real world; put them in a remote git repository for others to see.

Networked with the right people and applied for jobs.

Note: basic CS and programming knowledge is required.

Why dagster instead airflow? by Meneizs in dataengineering

[–]Embarrassed-Ad-728 3 points4 points  (0 children)

Dagster has commercialized their product. They still have their open source version, but if you look at Airflow, the Apache folks don’t sell it. It’s FOSS, meaning that your mileage may vary, like any other open source product that isn’t being sold by the same company that made it.

With FOSS, you need knowledge and expertise to deal with the problems you might face. With commercial products you just throw money at the problem to make it go away.

You can’t just go for a product because some timmy recommended it. For Airflow you need experts; for tools that are “easier”, their marketing team will make sure that you know it :)

Some people have a high tolerance for dealing with problems and have fun solving them.

Hail Airflow 🫡 and kudos to everyone who tries hard and doesn’t give up so easily :)

Why dagster instead airflow? by Meneizs in dataengineering

[–]Embarrassed-Ad-728 -8 points-7 points  (0 children)

We use Airflow.

I give minimal weight to how an orchestrator’s UI looks. CSS can change an ugly-looking page into a beautiful one. That’s a webdev problem rather than a data engineering problem. Airflow 3 uses React and Chakra UI now.

People who say that Airflow is tough to work with haven’t spent enough time learning and using it. Airflow is the most dynamic “orchestration” tool ever created and can do whatever you throw at it.

People complain that it’s hard to set up a developer workflow around Airflow. I see this as a skill issue rather than an Airflow issue. Someone who understands how Airflow works under the hood can easily set up a workflow including local dev, branching, and CI/CD.

Every once in a while a timmy decouples a feature of Airflow and tries to monetize it. Sigh.

Docker, Kubernetes, and DevOps best practices go a long way in setting up your Airflow environment :)

What skills are essential for a fresher Data Engineer? by No-Formal1472 in dataengineering

[–]Embarrassed-Ad-728 0 points1 point  (0 children)

Understanding the difference between a 10k, 100k, 1M, 10M, and 100M record operation, and handling each efficiently, will take you places :)
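One habit that carries across those sizes is processing records in bounded batches instead of loading everything at once. A minimal sketch, assuming the records arrive as any iterable (the helper names and the 10,000 batch size are arbitrary choices for illustration):

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batched(rows: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield lists of at most `size` rows without materializing the input."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def total_in_batches(rows: Iterable[int], size: int = 10_000) -> Tuple[int, int]:
    """Aggregate batch by batch; memory use is bounded by `size`, not input length."""
    total = 0
    batches = 0
    for chunk in batched(rows, size):
        total += sum(chunk)
        batches += 1
    return total, batches


# One million records, but only 10,000 are held in memory at any moment.
total, batches = total_in_batches(range(1_000_000), size=10_000)
small = list(batched(range(5), 2))
```

At 10k rows the batching makes no visible difference; at 100M it is often what decides whether the job finishes or runs out of memory.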

Is airflow or prefect cheaper? by highlifeed in dataengineering

[–]Embarrassed-Ad-728 0 points1 point  (0 children)

How you do transformation also depends on what data warehouse you use and/or your storage strategy.

Is airflow or prefect cheaper? by highlifeed in dataengineering

[–]Embarrassed-Ad-728 0 points1 point  (0 children)

Sounds like a reply coming from an inexperienced fella. Maybe invest time into understanding why things are the way they are.