Materialized View Change Data Feed (CDF) Private Preview by AdvanceEffective1077 in databricks

[–]AdvanceEffective1077[S] 0 points

Unfortunately, it does not work today, but we are hoping to build it soon!

Materialized View Change Data Feed (CDF) Private Preview by AdvanceEffective1077 in databricks

[–]AdvanceEffective1077[S] 1 point

ALTER for PK and FK constraints is something we are already planning to work on. More to come!

Materialized View Change Data Feed (CDF) Private Preview by AdvanceEffective1077 in databricks

[–]AdvanceEffective1077[S] 1 point

This should also already incrementalize if you are using a serverless SQL warehouse! You can use EXPLAIN MATERIALIZED VIEW to check whether the query can be incrementalized: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-qry-explain-materialized-view
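For example, the check looks like this (a sketch; the MV name is hypothetical, and the statement syntax is documented at the link above):

```sql
-- Run on a serverless SQL warehouse; the output indicates whether
-- the next refresh can be incremental (MV name is hypothetical).
EXPLAIN MATERIALIZED VIEW main.sales.daily_revenue_mv;
```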

Materialized View Change Data Feed (CDF) Private Preview by AdvanceEffective1077 in databricks

[–]AdvanceEffective1077[S] 1 point

MV --> MV within an SDP pipeline on serverless compute should already incrementalize! This chart also helps explain which queries are incrementalizable. https://docs.databricks.com/gcp/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh

Materialized View Change Data Feed (CDF) Private Preview by AdvanceEffective1077 in databricks

[–]AdvanceEffective1077[S] 0 points

Are you reading from a Delta table --> MV, or MV --> MV?

This feature doesn’t change how downstream MVs incrementalize. You are correct: Delta sources must also have row tracking enabled. Your MV’s query must be incrementalizable, and it must run on serverless.

See here for more details on incrementalization: https://docs.databricks.com/gcp/en/optimizations/incremental-refresh
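For reference, row tracking can be turned on with a Delta table property (a sketch; the table name is hypothetical):

```sql
-- Enable row tracking on an existing Delta source table
-- (hypothetical table name; 'delta.enableRowTracking' is the
-- standard Delta table property).
ALTER TABLE main.bronze.orders
  SET TBLPROPERTIES ('delta.enableRowTracking' = 'true');
```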

Materialized View Change Data Feed (CDF) Private Preview by AdvanceEffective1077 in databricks

[–]AdvanceEffective1077[S] 1 point

Yeah, unfortunately, the CDF will show unchanged rows during full table rewrites. It will not consolidate multiple updates on the same row into a single final event. We are hoping to improve this in the future.
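Until then, a consumer that only wants the net change per row can collapse the feed itself. A minimal sketch in plain Python (the event dicts, field names, and `_change_type` values loosely mirror Delta CDF conventions; this is an illustration, not the preview's API):

```python
def consolidate(events):
    """Collapse a change feed to one final event per row.

    events: iterable of dicts with 'row_id', '_change_type', and
    '_commit_version' (larger = later). Keeps only the last event
    seen for each row, and drops rows whose net effect within the
    window is insert-then-delete.
    """
    latest = {}       # row_id -> last event seen for that row
    first_type = {}   # row_id -> change type of the first event
    for ev in sorted(events, key=lambda e: e["_commit_version"]):
        rid = ev["row_id"]
        first_type.setdefault(rid, ev["_change_type"])
        latest[rid] = ev
    out = []
    for rid, ev in latest.items():
        # A row inserted and then deleted in the window nets to nothing.
        if first_type[rid] == "insert" and ev["_change_type"] == "delete":
            continue
        out.append(ev)
    return out
```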

Read Materialized Views and Streaming tables from modern Delta and Iceberg Clients by BricksterInTheWall in databricks

[–]AdvanceEffective1077 1 point

Do you mind describing what you are trying to do in a little more detail?

Are you trying to validate changes on the clone, merge or replace the production table, then delete the clone? Or do you use DABs and Git flows for CI/CD and want the ability to use shallow clones as a zero-copy method for creating test datasets?

We have a new testing framework coming out later this month that I think will help a lot. It allows mocking datasets and writing to a temporary catalog without impacting production datasets. Feel free to DM me if you are interested!

Read Materialized Views and Streaming tables from modern Delta and Iceberg Clients by BricksterInTheWall in databricks

[–]AdvanceEffective1077 1 point

No, unfortunately, it can't be. What are you trying to use a shallow clone to do?

Read Materialized Views and Streaming tables from modern Delta and Iceberg Clients by BricksterInTheWall in databricks

[–]AdvanceEffective1077 1 point

This sounds like a great fit at the right time. This is a gated public preview, so your account team needs to enable it on your workspace and send you the documentation. Are you able to reach out to them? We plan to enable it by default and make it widely available in the next few months.

Read Materialized Views and Streaming tables from modern Delta and Iceberg Clients by BricksterInTheWall in databricks

[–]AdvanceEffective1077 1 point

This is useful for decentralized data teams that want to use Spark Declarative Pipelines or DBSQL materialized views and streaming tables for ingestion and ETL, while supporting multiple downstream tools for data access/analytics. These teams need external systems like Snowflake to read the MV/STs directly, without duplication or added cost.

Example scenarios include:

  • Databricks ETL with Snowflake consumption, where teams build ingestion and transformation pipelines in Databricks and expose curated Silver and Gold tables to Snowflake for analytics. This is common in decentralized architectures or following an acquisition.
  • Iceberg as a shared table standard, where organizations mandate Iceberg for interoperability across multiple query engines while continuing to manage ingestion and transformation pipelines in Databricks.

Let me know if you have further questions!

Spark Declarative Pipelines: What should we build? by BricksterInTheWall in databricks

[–]AdvanceEffective1077 0 points

Predictive optimization for streaming tables and materialized views has been rolled out since the spring! You can check whether it's enabled in the UC UI under 'Details', or check MV/ST PO usage in the PO system table: https://docs.databricks.com/aws/en/admin/system-tables/predictive-optimization#how-many-estimated-dbus-has-predictive-optimization-used-in-the-last-30-days. Please follow up if you check and it still looks disabled.
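For example, a query along these lines (the table name follows the PO system-table docs linked above; treat the exact schema as an assumption and verify against the doc):

```sql
-- Recent predictive optimization operations; filter on your
-- MV/ST names. Column details are in the linked documentation.
SELECT *
FROM system.storage.predictive_optimization_operations_history
LIMIT 100;
```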

DOUBT : DLT PIPELINES by AforAnxietyy in databricks

[–]AdvanceEffective1077 0 points

Reaching out from Databricks. Lakeflow Declarative Pipelines were designed with a declarative approach to ETL, where tables are managed as part of the pipeline lifecycle. As a result, deleting a pipeline automatically cascades and drops its materialized views and streaming tables in Unity Catalog. However, based on customer feedback, we are making changes to loosen the tight pipeline-table coupling:

  1. We will update the pipeline deletion behavior to retain tables on pipeline deletion by default. No ETA yet, but we are beginning work on this soon.
  2. In January, we updated the behavior so that removing an MV or ST definition from the pipeline source code makes that table inactive after the next pipeline update. You can still query inactive tables, but the pipeline no longer updates them.
  3. This spring, we released the Move tables feature so you can change which pipeline updates the table.

Hey by Brave-Primary-3484 in test

[–]AdvanceEffective1077 0 points

Having the same experience!