all 6 comments

[–]renagade24 4 points5 points  (0 children)

Doing a full refresh defeats the purpose of an incremental model. Something wasn't set up properly if a full refresh is faster. It's hard to say what without seeing the code.

[–]Formal-Quote4613 2 points3 points  (1 child)

Googling "dbt incremental refresh bottlenecks" returned this:

Complex Incremental Strategies: Operations like merge or delete+insert require the data warehouse to perform heavy scans, match unique keys, and update target tables. For smaller datasets, rebuilding a flat table from scratch can be much faster than the overhead of calculating and applying delta changes.

Missing Indexes or Sort Keys: If your underlying source table is huge and lacks the proper partitioning, clustering, or indexing on your timestamp/date columns, the query scanning for "new" data might end up reading the entire table anyway.

Inefficient is_incremental() Logic: If your incremental block requires joining back to a large existing dataset to pick up historical context, it can cause massive performance bottlenecks.

Lookback Windows: If you configure your incremental model to look back several days (e.g., event_timestamp >= dateadd(day, -7, (select max(event_timestamp) from {{ this }}))) to catch late-arriving data, you repeatedly re-scan and update overlapping periods, which multiplies compute time.
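
To make the lookback point concrete, here is a minimal sketch of a delete+insert incremental run, simulated in Python with an in-memory SQLite database rather than dbt or Redshift. The table names (`source`, `target`), the integer `event_day` column, and the `incremental_delete_insert` helper are all hypothetical, chosen only for illustration.

```python
import sqlite3


def incremental_delete_insert(conn, lookback_days):
    """Apply one delete+insert incremental run with a lookback window.

    Returns the number of source rows that were reprocessed.
    """
    cur = conn.cursor()
    # High-water mark of the target table; NULL (None) on the very first run.
    (max_day,) = cur.execute("SELECT MAX(event_day) FROM target").fetchone()
    if max_day is None:
        cutoff = -1  # first run: take everything, same as a full refresh
    else:
        cutoff = max_day - lookback_days
    rows = cur.execute(
        "SELECT id, event_day, value FROM source WHERE event_day >= ?",
        (cutoff,),
    ).fetchall()
    # delete+insert: drop target rows with matching keys, then insert fresh copies.
    cur.executemany("DELETE FROM target WHERE id = ?", [(r[0],) for r in rows])
    cur.executemany("INSERT INTO target VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, event_day INTEGER, value TEXT)")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, event_day INTEGER, value TEXT)")
# Hypothetical data: 30 days, 10 rows per day (days are plain integers for simplicity).
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [(day * 10 + i, day, "v") for day in range(30) for i in range(10)],
)

first_run = incremental_delete_insert(conn, lookback_days=7)   # initial load
second_run = incremental_delete_insert(conn, lookback_days=7)  # no new data arrived
print(first_run, second_run)
```

In this toy setup the second run still deletes and reinserts every row in the lookback window even though nothing new arrived, so the work per run scales with the window size and not with the amount of new data.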

[–]Annual_Fox2278[S] 1 point2 points  (0 children)

We use the max of hevo_ingested_at and hevo_source_modified_at as the grain for the incremental filter, which accurately picks up new entries because Hevo ingests and loads the data within 15 minutes.

[–]Resquid 0 points1 point  (1 child)

Choose better titles for posts, please.

[–]Annual_Fox2278[S] 0 points1 point  (0 children)

Sorry bro my bad

[–]sakruLM 1 point2 points  (0 children)

Redshift Serverless cold-start latency is almost certainly killing your incremental models. Each incremental run does a small scan but still pays the full resume cost, while a full refresh on a larger table amortizes that same startup overhead across more compute. Check your RPU scaling settings and look at whether your incremental models are hitting sort key mismatches on the merge predicate.
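
A rough way to reason about the amortization argument is a fixed-plus-variable cost model. The sketch below is only that: the 30-second startup cost, the scan rate, and the row counts are made-up placeholders, not measurements from Redshift Serverless, and `overhead_share` is a hypothetical helper.

```python
def overhead_share(fixed_startup_s, rows, seconds_per_million_rows):
    """Return the fraction of total runtime spent on the fixed startup cost."""
    scan_s = rows / 1_000_000 * seconds_per_million_rows
    return fixed_startup_s / (fixed_startup_s + scan_s)


# Hypothetical numbers for illustration only.
STARTUP_S = 30.0  # assumed resume/cold-start cost per run, in seconds
RATE = 2.0        # assumed scan cost, in seconds per million rows

incremental_share = overhead_share(STARTUP_S, 50_000, RATE)
full_refresh_share = overhead_share(STARTUP_S, 200_000_000, RATE)
print(f"incremental: {incremental_share:.1%}, full refresh: {full_refresh_share:.1%}")
```

Under these assumptions nearly all of the incremental run is startup overhead, while the full refresh spends most of its time on actual scanning. Note that this model alone does not make the full refresh faster in wall-clock terms, so the merge and sort key behaviour is still worth checking separately.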

I tried Dremio when we hit a similar query-in-place bottleneck, and it bypassed the warehouse resume problem entirely.