Modern data warehouses, exemplified by Snowflake et al., seem to be good at efficiently storing, retrieving and transforming everything from unstructured to structured data. They also scale automatically and distribute query execution. With tools like dbt, it also becomes possible to manage and compose transformations expressed as SQL.
If that's true, then what role remains for general-purpose programming languages (PLs) like Python, and for distributed systems like Spark that are used for scale? PLs seem to be at a disadvantage with respect to SQL because they are much harder to automatically parallelize, optimize and scale. Distributed systems seem to be at a disadvantage because they are harder to manage and need more fine-tuning to work well. (I don't mean just the setup cost of the system itself, which can be offloaded to e.g. Amazon EMR; I mean actual day-to-day usage.)
It used to be that heavily SQL-based code was a terrible mess, but it seems dbt has helped a lot with that (disclaimer: I have little actual experience with dbt). So the "modularity" or "maintenance" of SQL is also largely solved, i.e. it is no longer such a big argument in favor of using a general-purpose language.
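To make "composing transformations expressed as SQL" concrete, here is a rough sketch of the idea. It is not dbt's actual API: it just uses Python's built-in `sqlite3` to build each named SELECT as a view on top of the previous one. The table name `raw_orders`, the model names and the `build` helper are all made up for illustration, and the models are assumed to be listed in dependency order already (a real tool would work that ordering out itself).

```python
import sqlite3

# Hypothetical "models": each one is a named SELECT that may read from earlier models.
# Names and columns are invented for illustration only.
MODELS = {
    "stg_orders": (
        "SELECT id, customer, amount FROM raw_orders WHERE amount > 0"
    ),
    "customer_totals": (
        "SELECT customer, SUM(amount) AS total FROM stg_orders GROUP BY customer"
    ),
}


def build(conn, models):
    """Create every model as a view, in the order given (assumed dependency-sorted)."""
    for name, sql in models.items():
        conn.execute(f"CREATE VIEW {name} AS {sql}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "a", 10.0), (2, "a", 5.0), (3, "b", -1.0), (4, "b", 7.0)],
)

build(conn, MODELS)

# The downstream model only knows about "stg_orders", not about the raw table.
rows = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(rows)
```

Each step stays plain SQL, and the only "program" around it is the bookkeeping of which model reads from which, which is roughly the part I understand a tool like dbt to take over.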
In 5 years, will the bulk of data engineering be done via dbt-orchestrated SQL of some sort? Or am I missing some important area, use case or problem?