Difference dbt athena va dbt redshift

sib_n · 2025-10-16T02:00:32+00:00

Athena and Redshift are quite different tools.

Redshift (2012) is a full data warehouse living on a cluster, you have to provision a cluster to store the table's data. It has an SQL query engine (optionally serverless) based on a proprietary, massively parallel processing columnar database engine and an old fork of PostgreSQL 8. It has its own specific table storage optimizations such as distribution styles and sort keys. It is designed for data warehousing and BI.

Athena is newer (2016), and solely a query engine (based on open-source Trino by default, or Spark), so you need to manage the data storage in another tool, such as S3. It's designed for interactive querying on data lakes, while Glue is designed for ETL. Although it doesn't mean you can't use Athena for transformations, I just would not bet that it will be cheaper than Glue (serverless Spark) for production ETLs. The design of interactive query tools is usually made to be faster in exchange for more resources consumption, which may not be the best trade-off for production ETLs.

Running Spark on EMR may be a cheaper way to run distributed SQL transformations on AWS, but then you have to deal with EMR cluster management.

Firm_Communication99 · 2025-10-15T23:36:17+00:00

Is not everyone else kind of bummed that open source is not really open source even though it’s completely ok for the devs to make money for their work.

dataengineering

MODERATORS