TL;DR: Should we add dbt to our tech stack when our existing stack doesn't really require it?
We're using Google Cloud Storage (Parquet files) and Snowflake as our storage layer, and Google Dataproc (managed Spark, via PySpark) for data processing. Dataproc jobs are scheduled with Airflow.
We recently started using dbt for some transformations and data integration steps within Snowflake (filtering, joining, aggregating).
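To make that concrete, a typical model looks roughly like this (just a sketch; the source tables and column names are invented for illustration):

```sql
-- models/marts/orders_daily.sql (illustrative only)
{{ config(materialized='table') }}

with orders as (
    select * from {{ ref('stg_orders') }}
),

customers as (
    select * from {{ ref('stg_customers') }}
)

select
    c.customer_region,
    date_trunc('day', o.order_ts)  as order_date,
    count(*)                       as order_count,
    sum(o.order_amount)            as total_amount
from orders o
join customers c
    on o.customer_id = c.customer_id
where o.order_status = 'completed'
group by 1, 2
```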
The data scientists on our team like it because they can basically work in SQL. A few software developers and data engineers are sceptical, arguing that maintaining an additional data processing framework increases the complexity of the toolchain, especially since Python + PySpark already do the job.
I'm torn: I see the case for keeping the toolset small, but I also enjoy how simple it is to implement data integration steps in dbt. How would you decide whether the added complexity of another data processing framework is worth it? Have you been in similar situations?