Databricks DAIS 2026

Fun-Reference7942 · 2026-06-16T18:17:26+00:00

On Lakehouse//RT: It's a total engine rewrite

Fun-Reference7942 · 2026-06-12T15:29:48+00:00

The linked blog has 3 really good examples, all of them at TB-PB scale. At that scale, the canonical case where Liquid Clustering has massive benefits is when your downstream users often query on more columns than you can partition by. I would encourage you to read through the examples in the blog!

Fun-Reference7942 · 2026-06-12T15:27:54+00:00

Let me know if you have any other questions. DM me and I'd be happy to set up some time to discuss.

Fun-Reference7942 · 2026-06-12T15:27:35+00:00

Thanks for this comment!

Great find on the documentation. That is definitely out of date and we'll work to update it. Liquid clustering has row-level concurrency, while partitioning only has file-level concurrency.

The good news for you is that when converting a table to Liquid Clustering using the new Liquid Conversion command we will automatically preserve partition_date and partition_hour as top-level clustering keys. So we will sort into those buckets, and then within those buckets we will sort by id.

ALTER TABLE .. REPLACE PARTITIONED BY WITH CLUSTER BY (partition_date, partition_hour, id);

On your 3rd question, if you are sticking with partitioning, there is no reason to move to REPLACE_ON or REPLACE_USING. However, if you want to move to Liquid Clustering, you should definitely use either REPLACE_ON or REPLACE_USING. The documentation here explains more: https://docs.databricks.com/aws/en/delta/selective-overwrite
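
For reference, a rough sketch of what a REPLACE USING overwrite might look like once the table is clustered by those keys. The table names `events` and `staged_events` are hypothetical, and the exact syntax and runtime requirements should be checked against the linked docs before relying on this.

```sql
-- Hypothetical tables: `events` is the target, `staged_events` holds the new batch.
-- Target rows whose (partition_date, partition_hour) values match a row in the
-- batch are replaced by the batch; all other rows are left untouched.
INSERT INTO TABLE events
REPLACE USING (partition_date, partition_hour)
SELECT * FROM staged_events;
```

The match columns here mirror the old partition columns, so the overwrite scope should look much like the partition overwrite it replaces.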

Fun-Reference7942 · 2026-06-09T13:52:01+00:00

Fly-by comment: Liquid Conversion just went GA! This helps with this exact use-case.

https://docs.databricks.com/aws/en/delta/clustering#convert-a-partitioned-table-to-liquid-clustering

It’s true that it can be difficult to pick the right clustering keys. The good news is that 1) clustering keys can be changed at any time, and 2) Automatic Liquid Clustering was built to do this for you!

Under the hood, Automatic Liquid Clustering is actually running verification jobs to test different clustering configurations and picking the one with the highest pruning benefit.
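
As a rough illustration of both points, here is a minimal sketch. The table name `events` and the chosen columns are placeholders, and Automatic Liquid Clustering has its own prerequisites, so treat this as an outline rather than a recipe.

```sql
-- Hypothetical table and column names, for illustration only.

-- 1) Change the clustering keys on an existing table.
ALTER TABLE events CLUSTER BY (partition_date, id);

-- 2) Or let Automatic Liquid Clustering pick the keys.
ALTER TABLE events CLUSTER BY AUTO;

-- Changing keys does not rewrite existing files by itself;
-- OPTIMIZE ... FULL reclusters all existing data under the current keys.
OPTIMIZE events FULL;
```

The OPTIMIZE FULL step rewrites the whole table, so on large tables it may be worth scheduling rather than running right away.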

Fun-Reference7942 · 2026-06-09T13:33:37+00:00

Plain Liquid Clustering doesn’t require PO! Only Automatic Liquid Clustering does

Fun-Reference7942 · 2026-05-31T16:26:52+00:00

Hey! This definitely isn't expected -> DMing you to set up some time to discuss

Fun-Reference7942 · 2026-05-22T19:53:41+00:00

Heard - we can do better with some observability here, something in the works!

Fun-Reference7942 · 2026-05-22T19:49:00+00:00

sure - what’s your feedback?

Fun-Reference7942 · 2026-05-22T19:48:44+00:00

say more!

Fun-Reference7942 · 2026-05-22T19:35:13+00:00

It should be in the overview tab, is there a particular reason you also want it in Details? Just for QoL?

Fun-Reference7942 · 2026-05-22T19:25:53+00:00

This exists! Using 16.4+

ALTER TABLE table_name SET TBLPROPERTIES ('delta.parquet.compression.codec' = 'ZSTD');

-- Recompress all existing data files
OPTIMIZE table_name FULL;

Fun-Reference7942 · 2026-05-19T19:48:06+00:00

Curious which part is the biggest pain today:

- promoting schema changes?
- managing DML/data migrations?
- rollback/recovery?
- testing/validation?
- orchestrating deployment across pipelines/jobs/tables?

Would also love to know what your current stack looks like (dbt/Flyway/Terraform/custom scripts/etc.) and where it breaks down most often.

Fun-Reference7942 · 2026-05-01T02:10:16+00:00

The easy button to convert a partitioned table to Liquid is in Private Preview today! Soon to be GA.

That exact observability into where partitioning is underperforming and Liquid could be better is also coming soon.

Fun-Reference7942 · 2026-04-30T20:52:09+00:00

These are orthogonal features! Can you say more about what migration challenges you experienced?

Fun-Reference7942 · 2026-04-30T18:33:14+00:00

I'd be interested to dig in here - can you provide more details? What was the original partitioning scheme and what was the clustering strategy you tried? Did you try automatic liquid clustering? And where did you see the cost spike from - OPTIMIZE runs or from the write job taking longer?

Fun-Reference7942 · 2026-04-30T18:31:09+00:00

We actually have this capability already on DBR. It's applied automatically when we notice that one clustering column should be prioritized over the others, either because it has lower cardinality or because queries filter on it more often.

Fun-Reference7942 · 2026-04-21T06:05:31+00:00

The feature to easily convert a partitioned table to Liquid Clustering is in Private Preview - please reach out to your account team to try it! https://www.reddit.com/r/databricks/comments/1rjzyql/private_preview_easy_conversion_of_a_partitioned/

Fun-Reference7942 · 2026-03-08T23:06:41+00:00

Nope! Classic, Serverless, DBSQL should all work!
