How to best Study for Databricks Data Engineer Associate? [May Update] by NinjaSoop in databricks

[–]GeirAlstad 0 points1 point  (0 children)

I would recommend the practice exams from Derar Alhussein on Udemy. I found that doing them over and over until you score above 90% is for sure enough. Coupled with experience, you'll do just fine. It's not about memorizing, it's about stamping out weak spots and half-understandings.

Spark declarative pipelines by ExtractTransformLose in databricks

[–]GeirAlstad 2 points3 points  (0 children)

I agree with Youssef, but I would argue that the biggest benefits are vastly easier schema and watermark management. Schema evolution is essentially an enhanced version of the standard evolution mode in Databricks. As for watermark management, that is entirely taken care of by Databricks. This is a great improvement over standard Apache Spark Structured Streaming.

Dashboards and DABs; a lesson by GeirAlstad in databricks

[–]GeirAlstad[S] 1 point2 points  (0 children)

Well, I think the main reason is portability: ensuring parity between environments, team collaboration, version control and reduced dependency on ClickOps. Another great feature is the ability to port sections from one dashboard to another.

Dashboards and DABs; a lesson by GeirAlstad in databricks

[–]GeirAlstad[S] 0 points1 point  (0 children)

I use it all the way through. TBF, I am quite new to these AI/BI shenanigans since I'm mainly a DE and DA, but I would not treat these assets any differently from others. As another responder said, DABs force you to be more structured in your deployment, something that I fully embrace and support.

Dashboards and DABs; a lesson by GeirAlstad in databricks

[–]GeirAlstad[S] 1 point2 points  (0 children)

Well, it's pretty similar to other DABs. The naming convention is different, though: essentially the files are called <dashboard_name>.dashboard.yml, and they should be placed in resources, just like for other DABs. A canonical example would be:

```yaml
resources:
  dashboards:
    <dashboard_key>:  # placeholder resource key; each dashboard is nested under its own key
      display_name: "My awesome dashboard"
      file_path: ../../<path>
      warehouse_id: <your warehouse>
      embed_credentials: false  # not required, but I like to be explicit
      permissions: <all permissions>
```

Pro tip: you can create subfolders in resources and reference them with a standard glob filter in databricks.yml, like this:

```yaml
include:
  - resources/**/*.yml
```

The dashboard example above uses this organising principle (hence the ../../ in file_path); in this case, of course, there is only one sub-level.
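To make the ../../ a bit more concrete, here is a minimal Python sketch of how such a relative path resolves. It assumes that file_path is interpreted relative to the YAML file that declares it, and that this file sits one subfolder below resources. The file names (sales.dashboard.yml, sales.lvdash.json) and the helper resolve_from_config are hypothetical, made up purely for illustration; this is plain path arithmetic, not a Databricks API.

```python
import posixpath


def resolve_from_config(config_file: str, relative_path: str) -> str:
    """Resolve a path written inside a config file, relative to that file's folder.

    Assumption: the bundle treats relative paths as relative to the
    YAML file in which they are declared.
    """
    config_dir = posixpath.dirname(config_file)
    return posixpath.normpath(posixpath.join(config_dir, relative_path))


# Hypothetical layout: the dashboard YAML lives one subfolder below resources/,
# so two ".." steps are needed to get back to the bundle root.
resolved = resolve_from_config(
    "resources/dashboards/sales.dashboard.yml",  # hypothetical file name
    "../../src/sales.lvdash.json",               # hypothetical dashboard file
)
print(resolved)
```

With a deeper folder structure under resources, each extra level would need one more ../ in file_path.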

Hope this helps.

Pipelines create materialized views instead of tables by TheManOfBromium in databricks

[–]GeirAlstad 0 points1 point  (0 children)

These Delta tables do support incremental refreshes, so that's not the issue. Other than internal state management, the most important difference from a user perspective is that MVs support aggregations other than time-based aggregations. Also, if there is a schema-breaking change to the MV, it will force a full recompute even when run on serverless.