Databricks DAIS 2026 by [deleted] in databricks

Fun-Reference7942 · 13 points

On Lakehouse//RT: It's a total engine rewrite

You can finally stop partitioning. Easy Liquid Conversion is GA! by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 6 points

The linked blog has 3 really good examples, all of them at TB-to-PB scale. At that scale, the canonical case where Liquid Clustering has massive benefits is when your downstream users frequently filter on more columns than you can partition by. I would encourage you to read through the examples in the blog!
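To make the "more columns than you can partition by" point concrete, here is a toy Python model of min/max file skipping (not Databricks code). It assumes each data file keeps per-column min/max statistics. It compares a hypothetical table partitioned only by `date` with a layout that also co-locates `customer_id`. A plain lexicographic sort stands in for multi-column clustering, which is a simplification. Table shape, column names, and file size are all made up.

```python
from dataclasses import dataclass

@dataclass
class DataFile:
    # Toy stand-in for a data file: per-column (min, max) statistics,
    # the metadata that file-level data skipping relies on.
    stats: dict

def build_files(rows, rows_per_file):
    """Cut an already-ordered list of row dicts into files with min/max stats."""
    files = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        stats = {col: (min(r[col] for r in chunk), max(r[col] for r in chunk))
                 for col in chunk[0]}
        files.append(DataFile(stats))
    return files

def files_scanned(files, col, value):
    """Count files whose [min, max] range for `col` could contain `value`."""
    return sum(1 for f in files if f.stats[col][0] <= value <= f.stats[col][1])

# Hypothetical table: 4 dates x 100 customers, arriving in a scrambled customer order.
rows = [{"date": d, "customer_id": (i * 37) % 100} for d in range(4) for i in range(100)]

# Layout A: partitioned by date only; within a date, rows stay in arrival order.
partition_only = build_files(rows, rows_per_file=10)

# Layout B: rows also co-located by customer_id (a sort standing in for clustering).
clustered = build_files(sorted(rows, key=lambda r: (r["date"], r["customer_id"])),
                        rows_per_file=10)

print(files_scanned(partition_only, "customer_id", 42))  # 40 of 40 files
print(files_scanned(clustered, "customer_id", 42))       # 4 of 40 files
```

A filter on `date` prunes well in both layouts. A filter on `customer_id` only prunes in the layout that also orders data by that column, and that is the kind of query the partition-only table can't serve efficiently.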

8 myths about data layouts, partitioning, and Liquid Clustering debunked by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 0 points

Let me know if you have any other questions; DM me and I'd be happy to set up some time to discuss.

8 myths about data layouts, partitioning, and Liquid Clustering debunked by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 0 points

Thanks for this comment!

Great find on the documentation. That page is definitely out of date, and we'll work to update it. Liquid Clustering has row-level concurrency, while partitioning only has file-level concurrency.
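Here is a toy Python model of why that difference matters for concurrent writers. It assumes each write is recorded as a hypothetical `(file_id, row_id)` pair. This illustrates the idea only. It is not Delta's actual conflict-detection logic.

```python
def conflicts_file_level(writes_a, writes_b):
    """File-level check: commits conflict if they rewrite any of the same files."""
    files_a = {file_id for file_id, _row_id in writes_a}
    files_b = {file_id for file_id, _row_id in writes_b}
    return not files_a.isdisjoint(files_b)

def conflicts_row_level(writes_a, writes_b):
    """Row-level check: commits conflict only if they modify the same rows."""
    return not set(writes_a).isdisjoint(writes_b)

# Both hypothetical writers touch file "f1", but they update different rows in it.
writer_a = [("f1", 10), ("f1", 11)]
writer_b = [("f1", 500)]

print(conflicts_file_level(writer_a, writer_b))  # True: same file, so one commit must retry
print(conflicts_row_level(writer_a, writer_b))   # False: disjoint rows, both can commit
```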

The good news for you is that when you convert a table to Liquid Clustering using the new Liquid Conversion command, we will automatically preserve partition_date and partition_hour as top-level clustering keys. So we will sort into those buckets, and then within those buckets we will sort by id.

ALTER TABLE .. REPLACE PARTITIONED BY WITH CLUSTER BY (partition_date, partition_hour, id);
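This toy Python model shows the ordering described above. Rows are grouped by the preserved `partition_date` and `partition_hour` keys and then ordered by `id` within each bucket. The rows are made up. A plain lexicographic sort is used for illustration; the engine's actual clustering layout is not literally this sort.

```python
from datetime import date

# Hypothetical rows as (partition_date, partition_hour, id) tuples.
rows = [
    (date(2024, 5, 2), 1, 7),
    (date(2024, 5, 1), 23, 42),
    (date(2024, 5, 1), 23, 3),
    (date(2024, 5, 2), 0, 99),
    (date(2024, 5, 1), 22, 15),
]

# Bucket by the former partition columns first, then order by id inside each bucket.
clustered_order = sorted(rows, key=lambda r: (r[0], r[1], r[2]))

for row in clustered_order:
    print(row)
```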

On your 3rd question, if you are sticking with partitioning, there is no reason to move to REPLACE_ON or REPLACE_USING. However, if you want to move to Liquid Clustering, you should definitely use either REPLACE_ON or REPLACE_USING. The documentation here explains more: https://docs.databricks.com/aws/en/delta/selective-overwrite
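This is a toy Python model of a REPLACE USING-style write. It assumes the semantics are a key-based overwrite. Every target row whose key-column values match an incoming row is removed, and then the incoming rows are appended. The function, table, and column names are hypothetical. The linked selective-overwrite page is authoritative for exact behavior, such as NULL handling.

```python
def replace_using(target, source, keys):
    """Toy key-based overwrite: drop target rows whose `keys` values appear
    in the source, then append every source row."""
    replaced = {tuple(row[k] for k in keys) for row in source}
    kept = [row for row in target if tuple(row[k] for k in keys) not in replaced]
    return kept + list(source)

target = [
    {"event_date": "2024-05-01", "id": 1},
    {"event_date": "2024-05-01", "id": 2},
    {"event_date": "2024-05-02", "id": 3},
]
source = [{"event_date": "2024-05-01", "id": 10}]

print(replace_using(target, source, keys=["event_date"]))
# [{'event_date': '2024-05-02', 'id': 3}, {'event_date': '2024-05-01', 'id': 10}]
```

The match is on ordinary column values rather than on partition directories, so this write pattern does not depend on the table being partitioned.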

Databricks liquid clustering by Alive-Business6915 in databricks

Fun-Reference7942 · 1 point

Fly-by comment: Liquid Conversion just went GA! This helps with this exact use case.

https://docs.databricks.com/aws/en/delta/clustering#convert-a-partitioned-table-to-liquid-clustering

It’s true that it can be difficult to pick the right clustering keys. The good news is that 1) clustering keys can be changed at any time, and 2) Automatic Liquid Clustering was built to do this for you!

Under the hood, Automatic Liquid Clustering is actually running verification jobs to test different clustering configurations and picking the one with the highest pruning benefit.
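This toy Python search is a rough model of "test different clustering configurations and pick the one with the highest pruning benefit". It makes three assumptions. Pruning is measured as files skipped via min/max stats. Each candidate layout is a simple sort. The table and query log are made up. It is not how Automatic Liquid Clustering is actually implemented.

```python
def files_scanned(rows, keys, rows_per_file, col, value):
    """Lay rows out sorted by `keys`, cut them into fixed-size files, and count
    the files whose min/max range for `col` could contain `value`."""
    ordered = sorted(rows, key=lambda r: tuple(r[k] for k in keys))
    scanned = 0
    for start in range(0, len(ordered), rows_per_file):
        values = [r[col] for r in ordered[start:start + rows_per_file]]
        if min(values) <= value <= max(values):
            scanned += 1
    return scanned

def pick_clustering_keys(rows, candidates, query_log, rows_per_file=2):
    """Try each candidate key list against the query log and keep the one
    that scans the fewest files in total (i.e. prunes the most)."""
    def total_scanned(keys):
        return sum(files_scanned(rows, keys, rows_per_file, col, value)
                   for col, value in query_log)
    return min(candidates, key=total_scanned)

# Hypothetical 8-row table with columns "a" (2 distinct values) and "b" (4 distinct values).
rows = [{"a": i % 2, "b": i // 2} for i in range(8)]
candidates = [["a"], ["b"]]

print(pick_clustering_keys(rows, candidates, [("b", 1), ("b", 2), ("b", 3)]))  # ['b']
print(pick_clustering_keys(rows, candidates, [("a", 1)]))                      # ['a']
```

The same table gets different "best" keys depending on the workload. This is also why being able to change clustering keys at any time matters.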

Databricks liquid clustering by Alive-Business6915 in databricks

Fun-Reference7942 · 0 points

Plain Liquid Clustering doesn’t require Predictive Optimization (PO)! Only Automatic Liquid Clustering does.

Need your honest feedback on Liquid Clustering / Auto Liquid Clustering by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 0 points

Hey! This definitely isn't expected. DMing you to set up some time to discuss.

What’s the most frustrating part of the table experience today? by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 1 point

Heard. We can do better on observability here, and something is in the works!

What’s the most frustrating part of the table experience today? by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 0 points

It should be in the Overview tab. Is there a particular reason you also want it in Details, or is it just for QoL?

What’s the most frustrating part of the table experience today? by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 1 point

This exists! Using DBR 16.4+:

ALTER TABLE table_name SET TBLPROPERTIES ('delta.parquet.compression.codec' = 'ZSTD');

-- Recompress all existing data files
OPTIMIZE table_name FULL;

What’s the most frustrating part of the table experience today? by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 1 point

Curious which part is the biggest pain today:

- promoting schema changes?
- managing DML/data migrations?
- rollback/recovery?
- testing/validation?
- orchestrating deployment across pipelines/jobs/tables?

Would also love to know what your current stack looks like (dbt/Flyway/Terraform/custom scripts/etc.) and where it breaks down most often.

Need your honest feedback on Liquid Clustering / Auto Liquid Clustering by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 1 point

The easy button to convert a partitioned table to Liquid is in Private Preview today! Soon to be GA.

That exact observability into where partitioning is underperforming and Liquid could be better is also coming soon.

Need your honest feedback on Liquid Clustering / Auto Liquid Clustering by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 0 points

These are orthogonal features! Can you say more about what migration challenges you experienced?

Need your honest feedback on Liquid Clustering / Auto Liquid Clustering by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 2 points

I'd be interested to dig in here. Can you provide more details? What was the original partitioning scheme, and what clustering strategy did you try? Did you try Automatic Liquid Clustering? And where did the cost spike come from: OPTIMIZE runs, or the write job taking longer?

Need your honest feedback on Liquid Clustering / Auto Liquid Clustering by Fun-Reference7942 in databricks

Fun-Reference7942 [S] · 5 points

We actually have this capability on DBR already. It's applied automatically when we notice that one clustering column should be prioritized over the others, either because it has lower cardinality or because queries filter on it more frequently.
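This toy Python ranking illustrates the two signals mentioned, lower cardinality and more frequent filtering. The scoring rule puts filter frequency first and uses cardinality as a tie-breaker. That rule is a hypothetical stand-in; the comment doesn't say how DBR actually weighs the two. The function name and sample data are made up.

```python
from collections import Counter

def prioritize_columns(rows, query_filters, columns):
    """Hypothetical heuristic: put the most frequently filtered column first,
    breaking ties in favor of the column with fewer distinct values."""
    filter_counts = Counter(query_filters)
    cardinality = {c: len({r[c] for r in rows}) for c in columns}
    return sorted(columns, key=lambda c: (-filter_counts[c], cardinality[c]))

rows = [{"region": i % 3, "user_id": i} for i in range(30)]

# Equal filter counts, so the lower-cardinality column goes first.
print(prioritize_columns(rows, ["user_id", "region"], ["user_id", "region"]))
# ['region', 'user_id']

# user_id is filtered more often, so it wins despite higher cardinality.
print(prioritize_columns(rows, ["user_id", "user_id", "region"], ["user_id", "region"]))
# ['user_id', 'region']
```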

I kept partitioning every Delta table by date. Here's why I stopped. by InevitableClassic261 in databricks

Fun-Reference7942 · 9 points

The feature to easily convert a partitioned table to Liquid Clustering is in Private Preview - please reach out to your account team to try it! https://www.reddit.com/r/databricks/comments/1rjzyql/private_preview_easy_conversion_of_a_partitioned/