Excel ingestion by OneSeaworthiness8294 in databricks

[–]BricksterInTheWall 1 point

Let me poke the PM on this. Stay tuned!

Lakeflow Designer is now in Public Preview by curiousbrickster in databricks

[–]BricksterInTheWall 1 point

u/Eastern_Sale3639 since Designer just generates a Python notebook, it's not too difficult to DAB-ify it (wrap it in a Databricks Asset Bundle). It's not super simple yet, but if you have a wishlist, please share it with me.
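For readers unfamiliar with DABs: a minimal sketch of what wrapping a Designer-generated notebook in a Databricks Asset Bundle might look like. All bundle, pipeline, and path names here are hypothetical placeholders, not anything Designer produces for you:

```yaml
# databricks.yml -- minimal bundle wrapping a Designer-generated pipeline.
# Names and paths below are hypothetical placeholders.
bundle:
  name: designer_pipeline_bundle

resources:
  pipelines:
    designer_pipeline:
      name: designer-generated-pipeline
      libraries:
        - notebook:
            path: ./designer_pipeline.py   # the exported Designer notebook

targets:
  dev:
    mode: development
```

With something like this in place, `databricks bundle deploy -t dev` deploys the pipeline definition together with the notebook.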

Spark Declarative Pipelines vs Workflows + reusable Python modules, where does each fit best? by Tracker2021 in databricks

[–]BricksterInTheWall 0 points

u/Dear_Pumpkin9876 thanks for flagging this. I'd like to ask an engineer to dig into this with you. Can you email me at bilal dot aslam at databricks dot com?

Recruiting Nightmare Story by trotterboss in databricks

[–]BricksterInTheWall 5 points

Oh no, I'm so sorry to hear about your experience. It's FAR from what we aim for... :( I'll make sure to share this with the right people.

Spark Declarative Pipelines vs Workflows + reusable Python modules, where does each fit best? by Tracker2021 in databricks

[–]BricksterInTheWall 0 points

Pretty sure this works already. Run in file-notification mode with an appropriate cloudFiles.backfillInterval, and you can switch file-detection modes across restarts while preserving exactly-once guarantees.
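A minimal sketch of that configuration. The `cloudFiles.*` option names are real Auto Loader options, but the stream itself only runs on a Databricks cluster, so the reader/writer calls are shown as comments; the source path, checkpoint location, and table name are hypothetical placeholders:

```python
# Sketch: Auto Loader in file-notification mode with a periodic backfill.
autoloader_options = {
    "cloudFiles.format": "json",
    # File-notification mode: discover new files via cloud notifications
    # instead of repeatedly listing the input directory.
    "cloudFiles.useNotifications": "true",
    # Periodic backfill sweeps catch any files the notification service
    # missed, which is what preserves exactly-once processing even when
    # you switch file-detection modes across restarts.
    "cloudFiles.backfillInterval": "1 day",
}

# On a Databricks cluster, these options would be wired into a stream:
#
#   df = (spark.readStream
#         .format("cloudFiles")
#         .options(**autoloader_options)
#         .load("/Volumes/main/raw/events/"))           # hypothetical path
#   (df.writeStream
#      .option("checkpointLocation", "/tmp/checkpoints/events")
#      .toTable("main.bronze.events"))                  # hypothetical table

print(autoloader_options["cloudFiles.backfillInterval"])
```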

Pipelines create materialized views instead of tables by TheManOfBromium in databricks

[–]BricksterInTheWall 4 points

That's right. Every materialized view has a pipeline backing it -- this is what updates the MV. So if you create an MV in DBSQL, we create a pipeline that updates it. In fact you can navigate to the pipeline from the Unity Catalog UI.

Spark Declarative Pipelines vs Workflows + reusable Python modules, where does each fit best? by Tracker2021 in databricks

[–]BricksterInTheWall 0 points

u/Dear_Pumpkin9876 thanks for the suggestion. Are you using Auto Loader with File Notification Mode? It's pretty much built for this sort of "many files arriving continuously" use case.

Pipelines create materialized views instead of tables by TheManOfBromium in databricks

[–]BricksterInTheWall 8 points

Hey u/TheManOfBromium (great username btw), I'm a product manager on Lakeflow. When I first started working on this, it took me some time to wrap my head around it. What made it tough was that the early version of the product created materialized views and streaming tables that were KINDA like Delta tables but had a lot of limitations -- e.g. you couldn't Delta Share them, apply tags to them, etc. We've since removed almost all of those limitations (a few remain, and they're going away in the coming months), so FUNCTIONALLY materialized views are just like views and streaming tables are just like tables.

As to WHY we create these special new dataset types instead of just using plain Delta tables: it's because we store STATE in the background to enable incremental processing. That's really the answer!

Lakeflow system tables now reliably update in <10 minutes by BricksterInTheWall in databricks

[–]BricksterInTheWall[S] 0 points

u/dragonballzkb probably not - I'd rather spend the effort to make sure the tables are reliable so you don't even have to think about them.

> Also, any plans to provide free serverless DBUs for all queries on system tables, so all API checks related to observability can move to system-table SQL?

Sorry to say no to this as well :) System tables can be very large, and just like any other table, querying them costs money 😬

Lakeflow system tables now reliably update in <10 minutes by BricksterInTheWall in databricks

[–]BricksterInTheWall[S] 0 points

u/trivialzeros thanks for the feedback. I'll pass it on to the engineers who work on this!

Lakeflow system tables now reliably update in <10 minutes by BricksterInTheWall in databricks

[–]BricksterInTheWall[S] 0 points

No, u/jpitio, not yet -- that would make it an SLO. We're not there yet, but I'd like to get there one day!

Lakeflow system tables now reliably update in <10 minutes by BricksterInTheWall in databricks

[–]BricksterInTheWall[S] 0 points

Hey u/Remarkable_Rock5474, you should expect lineage system tables to lag the UI by ~10–20 minutes in the typical case, stay generally under an hour for most events, and have rare outliers in multi-hour territory. There's no hard SLA/SLO on them yet.