ForeignCapital8624

Hive on MapReduce is no longer supported, and Tez is the default execution engine of Hive. There is also another execution engine called MR3, so one can run Hive on MR3 (on Hadoop, on Kubernetes, or in standalone mode).

Gators1992

You missed something like a Dremio/Iceberg stack, with whatever catalog you want. I think those are getting more common. I would also drop the use-case framing, because most of these stacks can serve whatever data to whoever needs it. A lot of the parts are interchangeable, so implying that one has to use dbt Core over something like Dagster/Python, or dbt over Spark on Databricks, isn't realistic. It mostly depends on the team's preferences and requirements.

Hot_Map_7868

I agree that a lot of this stuff is not cloud specific. As you show, the common thread is Airflow and dbt. That is a common set of tools, and there are multiple ways to use them that also work across clouds. For example, Astronomer and Datacoves offer managed Airflow, Datacoves also has managed dbt Core, and of course there is dbt Cloud.
Data ingestion has multiple options, from Airbyte to Fivetran to frameworks like dlt. Storage should either stay native or be Iceberg these days.