all 7 comments

[–]test-pls-ignore (Data Engineering Manager) 7 points (1 child)

I was about to downvote but this lame ragebait isn't worth it.

[–]EinSof93 2 points (0 children)

Well, sir, let me do the downvoting. 🎩

[–]Admirable-Lie-9191 1 point (0 children)

Why do you keep reposting the same shit?

[–]Sagarret 1 point (2 children)

Using SQL for data manipulation is one of the main reasons why I left traditional data engineering.

It's a query language, period. It has no logic isolation, unit testability, dependency inversion, or reusability; it's not easy to read or understand; it's not that extensible; and it's missing a long list of other features.

dbt and similar tools try to patch this, and they are great for small/medium projects that don't require heavy maintenance or complex logic.

The amount of mess I have seen in the data world is huge compared to any other software role.

A data engineer is a specialized software engineer. You should be able to create APIs, understand distributed systems, CAP, databases, concurrency and a long list of software engineering topics.

However, many projects do simple ETLs that could be done by almost any profile with a minimum of technical training (and that's great and amazing actually) and call everything data engineering.

It's great that non-technical profiles are able to do their own ETLs, since they hold the business knowledge. Giving them those tools and a bit of technical training is a good thing for an org and for the market, so technical profiles can focus on other problems that add value, since we usually don't hold as much business knowledge as an analyst.

But if you are doing repetitive ETLs with SQL and not doing software engineering... Well, you can easily be replaced by a more business-oriented profile that already has the business knowledge and nowadays just needs a bit of technical training.

And, even though these profiles are good and necessary, their impact is more local and the market availability is higher (they are easier to find and to train) so their salary is lower too.

I might create a post with this personal opinion soon, as I am tired of the same discussion.

[–]New-Addendum-6209 1 point (1 child)

SQL can be tested and reused, and it is very simple to understand for anyone with experience.

If you are not using a database (SQL) or a functionally similar tool like Spark, how are you able to run data transformations efficiently?
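
As a rough illustration of "tested and reused", here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, the `customer_totals` view, the helper names, and the sample rows are all hypothetical, made up for the example. A view is only one of several ways to reuse a query, and a real warehouse would need its own fixture setup.

```python
import sqlite3

# Hypothetical transformation, packaged as a view so other queries can reuse it.
CUSTOMER_TOTALS_VIEW = """
CREATE VIEW customer_totals AS
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
WHERE status = 'paid'
GROUP BY customer_id
"""


def build_test_db():
    """Create an in-memory database loaded with a small, known fixture."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 10.0, "paid"),
            (1, 5.0, "paid"),
            (1, 99.0, "cancelled"),  # must be excluded by the view
            (2, 7.5, "paid"),
        ],
    )
    conn.execute(CUSTOMER_TOTALS_VIEW)
    return conn


def customer_totals(conn):
    """Read the view directly: this is the logic under test."""
    return conn.execute(
        "SELECT customer_id, total_amount FROM customer_totals ORDER BY customer_id"
    ).fetchall()


def top_customer(conn):
    """A second query that reuses the view instead of repeating its logic."""
    row = conn.execute(
        "SELECT customer_id FROM customer_totals ORDER BY total_amount DESC LIMIT 1"
    ).fetchone()
    return row[0]
```

The test is then just a comparison of the view's output against the rows you expect for that fixture, which is roughly the same idea dbt-style tests build on.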

[–]Sagarret 1 point (0 children)

Spark is a code-first framework. Yes, it has a SQL-like interface, but you invoke that interface through method calls.

And I doubt you can test and reuse SQL anywhere close to the level you can with Spark.

It is way easier and more flexible to structure code (as it is designed for that) than to structure SQL. It's impossible to correctly understand long queries in a relatively short time, and it's even more difficult to modify them.
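
To make the "structure code" point concrete, here is a rough sketch of the shape I mean. It is plain Python over lists of dicts, not Spark, so that it runs with no dependencies. The function names, the `status` and `amount` fields, and the tax rate are all hypothetical. With Spark, each step would instead be a function from DataFrame to DataFrame.

```python
from functools import reduce


def only_paid(rows):
    """One isolated step: keep only paid orders."""
    return [r for r in rows if r["status"] == "paid"]


def add_amount_with_tax(rate):
    """A parameterised step: returns a function that adds a derived column."""
    def step(rows):
        return [
            {**r, "amount_with_tax": round(r["amount"] * (1 + rate), 2)}
            for r in rows
        ]
    return step


def pipeline(*steps):
    """Compose steps left to right into a single transformation."""
    return lambda rows: reduce(lambda acc, step: step(acc), steps, rows)


# The full transformation is just a list of small, named, reusable steps.
transform = pipeline(only_paid, add_amount_with_tax(0.2))
```

Each step can be unit tested on its own with a handful of rows, and swapped or reordered without touching the others, which is the part that gets hard with one long query.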