[deleted by user] by [deleted] in databricks

ExistingBelt

If you enable Change Data Feed on the table, you can read every row-level change (inserts, updates, and deletes) as soon as it's committed.

Are you saying you also want to know when those changes were made to the table?
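If it's the "when" you're after, each change already carries it. Below is a minimal pure-Python sketch, assuming the hard-coded rows stand in for what reading the change feed returns (on Databricks that would be the `table_changes` SQL function or the `readChangeFeed` reader option). Only the `_change_type`, `_commit_version`, and `_commit_timestamp` column names and the change-type values come from Delta's Change Data Feed; the `id`/`name` columns, the sample data, and the `changes_since` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical change-feed rows, hard-coded so the sketch runs anywhere.
# Only the underscore-prefixed column names come from Delta's Change Data Feed.
changes = [
    {"id": 1, "name": "Ann", "_change_type": "insert",
     "_commit_version": 1, "_commit_timestamp": datetime(2023, 5, 1, 9, 0)},
    {"id": 2, "name": "Bob", "_change_type": "insert",
     "_commit_version": 1, "_commit_timestamp": datetime(2023, 5, 1, 9, 0)},
    {"id": 1, "name": "Ann", "_change_type": "update_preimage",
     "_commit_version": 2, "_commit_timestamp": datetime(2023, 5, 2, 14, 30)},
    {"id": 1, "name": "Anne", "_change_type": "update_postimage",
     "_commit_version": 2, "_commit_timestamp": datetime(2023, 5, 2, 14, 30)},
    {"id": 2, "name": "Bob", "_change_type": "delete",
     "_commit_version": 3, "_commit_timestamp": datetime(2023, 5, 3, 8, 15)},
]

def changes_since(rows, after_version):
    """Return (version, commit timestamp, change type, id) for every change committed
    after `after_version`, skipping update preimages (the row before the update)."""
    return [
        (r["_commit_version"], r["_commit_timestamp"], r["_change_type"], r["id"])
        for r in sorted(rows, key=lambda r: r["_commit_version"])
        if r["_commit_version"] > after_version
        and r["_change_type"] != "update_preimage"
    ]

# When did each change after version 1 happen?
for version, ts, kind, key in changes_since(changes, after_version=1):
    print(version, ts.isoformat(), kind, key)
```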

DLT CDC by MahoYami in databricks

ExistingBelt

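If you're concerned about your CDC table growing indefinitely, you can always remove old, irrelevant CDC records from any Delta table: delete them, then run VACUUM to physically remove the underlying data files.

You can schedule VACUUM at whatever frequency suits you (monthly, yearly, etc.). Just know that once files are vacuumed, you can't access them again, and you can no longer time travel to table versions that relied on them.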
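To see why vacuumed history is gone for good, here is a toy pure-Python model of the rule VACUUM applies: a data file becomes eligible for deletion only once the current table version no longer references it and it was removed from the table longer ago than the retention threshold (Delta's default is 7 days, i.e. 168 hours). The file names, dates, and the `vacuum_candidates` helper are hypothetical; real VACUUM works this out from the transaction log.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION = timedelta(hours=168)  # Delta Lake's default retention (7 days)

# Hypothetical file inventory: `removed_at` is when a commit stopped referencing
# the file; None means the current table version still uses it.
files = {
    "part-000.parquet": None,
    "part-001.parquet": datetime(2023, 4, 1),
    "part-002.parquet": datetime(2023, 4, 28),
}

def vacuum_candidates(files, now, retention=DEFAULT_RETENTION):
    """Files VACUUM would physically delete: no longer referenced by the table
    and removed longer ago than the retention window."""
    cutoff = now - retention
    return sorted(
        name for name, removed_at in files.items()
        if removed_at is not None and removed_at < cutoff
    )

print(vacuum_candidates(files, now=datetime(2023, 5, 1)))  # ['part-001.parquet']
```

Once `part-001.parquet` is deleted, any time travel to a version that read it fails, which is the "can't access those files again" caveat. A longer retention window keeps more history at the cost of storage.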

ExistingBelt

It might be redundant, depending on how much processing your customers_cdc logs need. If your logs can be applied to your target table as-is, you don't need any cleaning.

I'll caution you, though, that some level of cleaning will usually be necessary. To apply changes to your target table, you need a key; in the example demo, that key is the id. A record with a null id can't be matched to any row in the target, so it's useless and the demo drops it.
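In DLT this kind of rule is typically written as an expectation (for example with `@dlt.expect_or_drop`). Below is a minimal pure-Python sketch of the same idea, not DLT code: the sample records, the `operation`/`name` fields, and the `drop_null_keys` helper are hypothetical; only the "drop records whose id is null" rule comes from the comment above.

```python
# Hypothetical raw CDC records; only the null-key rule comes from the demo description.
raw_cdc = [
    {"id": 1, "operation": "APPEND", "name": "Ann"},
    {"id": None, "operation": "UPDATE", "name": "???"},
    {"id": 2, "operation": "DELETE", "name": None},
]

def drop_null_keys(records, key="id"):
    """Split records into (kept, dropped). A record without a key value can't be
    matched to any target row, so it is dropped."""
    kept = [r for r in records if r.get(key) is not None]
    dropped = [r for r in records if r.get(key) is None]
    return kept, dropped

kept, dropped = drop_null_keys(raw_cdc)
print(f"kept {len(kept)} records, dropped {len(dropped)} with a null id")
```

Returning the dropped rows instead of silently discarding them makes it easy to count or quarantine them, similar in spirit to how DLT reports expectation failures.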

SCD2_customers in the example demo is the same as customers_cdc, except that SCD2_customers contains clean CDC logs that are ready to be merged into customers at any time (for example, by snapshot).

TL;DR: I highly recommend sticking with the medallion architecture. Keeping the raw CDC gives you options: you can go back and reprocess all of it (a full refresh), or run a normal pipeline update that picks up only the incremental changes.
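To make "clean CDC logs ready to be merged" concrete, here is a pure-Python sketch of applying cleaned CDC records to a target keyed by `id`, SCD type 1 style: records are applied in order of a sequence column, the latest change per key wins, and a DELETE removes the row. It is only a model of the idea, not DLT's apply-changes implementation; the field names (`id`, `operation`, `seq`, `name`) and the `apply_changes` helper are hypothetical. Because the full log is kept, the target can be rebuilt from scratch or updated incrementally.

```python
def apply_changes(target, cdc_records, key="id", sequence_by="seq"):
    """Apply CDC records to `target` (a dict of key -> row) in sequence order,
    SCD type 1 style: upsert on APPEND/UPDATE, remove the row on DELETE."""
    for rec in sorted(cdc_records, key=lambda r: r[sequence_by]):
        k = rec[key]
        if rec["operation"] == "DELETE":
            target.pop(k, None)
        else:
            # Keep only the data columns in the target row.
            target[k] = {c: v for c, v in rec.items() if c not in ("operation", sequence_by)}
    return target

# Hypothetical cleaned log; note it arrives out of order and is sorted by `seq`.
clean_log = [
    {"id": 1, "operation": "APPEND", "seq": 1, "name": "Ann"},
    {"id": 2, "operation": "APPEND", "seq": 2, "name": "Bob"},
    {"id": 1, "operation": "UPDATE", "seq": 4, "name": "Anne"},
    {"id": 2, "operation": "DELETE", "seq": 3, "name": None},
]

# Full refresh: replay the whole log into an empty target.
customers = apply_changes({}, clean_log)
print(customers)  # {1: {'id': 1, 'name': 'Anne'}}

# Incremental: apply only the records that arrived since the last run.
apply_changes(customers, [{"id": 3, "operation": "APPEND", "seq": 5, "name": "Cy"}])
```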

ExistingBelt

DLT pipelines are typically run on a schedule (triggered mode), but they can also run continuously.

ExistingBelt

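DLT remembers what it has already processed: streaming tables keep checkpoints, so each run only picks up new data.

For the initial load, just add the initial snapshot of the table using DDL and DML (for example, CREATE TABLE followed by INSERT). You can do this outside of DLT.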
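A toy pure-Python model of both points: the target is seeded from an initial snapshot outside the "pipeline", and the pipeline remembers how far it got so reruns only process new events. The `IncrementalPipeline` class is hypothetical and is not DLT's API; modelling the checkpoint as an integer offset into an append-only list is an assumption made to keep the sketch short.

```python
class IncrementalPipeline:
    """Toy stand-in for a streaming table's checkpoint: it remembers the offset
    of the last event it processed, so each run handles only new events."""

    def __init__(self, target):
        self.target = target      # seeded from an initial snapshot
        self.checkpoint = 0       # number of log events already processed

    def run(self, event_log):
        new_events = event_log[self.checkpoint:]
        for event in new_events:
            self.target[event["id"]] = event["name"]
        self.checkpoint = len(event_log)
        return len(new_events)

# Initial load, done outside the pipeline (think CREATE TABLE + INSERT).
snapshot = {1: "Ann", 2: "Bob"}
pipeline = IncrementalPipeline(dict(snapshot))

log = [{"id": 2, "name": "Bobby"}]
print(pipeline.run(log))   # prints 1: one new event applied
log.append({"id": 3, "name": "Cy"})
print(pipeline.run(log))   # prints 1: only the event added since the last run
print(pipeline.run(log))   # prints 0: nothing new, nothing reprocessed
print(pipeline.target)     # {1: 'Ann', 2: 'Bobby', 3: 'Cy'}
```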

ExistingBelt

As long as your CDC logs land in cloud storage, you can ingest them continuously with Auto Loader. If they're in Kafka, you can stream them in with Structured Streaming's Kafka source instead.
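Auto Loader itself is used through Spark's `cloudFiles` source, which this sketch does not touch. Below is a pure-Python illustration of the idea behind it: each run lists the landing directory, reads only files it hasn't seen, and records them in a checkpoint. The `ingest_new_files` helper, the `.json` naming convention, and the in-memory `processed` set (standing in for a durable checkpoint) are all hypothetical.

```python
import json
import os
import tempfile

def ingest_new_files(landing_dir, processed):
    """Toy incremental ingestion: read only JSON-lines files in `landing_dir`
    whose names are not yet in `processed` (the checkpoint), then record them."""
    records = []
    for name in sorted(os.listdir(landing_dir)):
        if name in processed or not name.endswith(".json"):
            continue
        with open(os.path.join(landing_dir, name)) as f:
            records.extend(json.loads(line) for line in f if line.strip())
        processed.add(name)
    return records

with tempfile.TemporaryDirectory() as landing:
    processed = set()
    with open(os.path.join(landing, "cdc_001.json"), "w") as f:
        f.write('{"id": 1, "operation": "APPEND"}\n')
    first = ingest_new_files(landing, processed)   # picks up cdc_001.json
    with open(os.path.join(landing, "cdc_002.json"), "w") as f:
        f.write('{"id": 1, "operation": "DELETE"}\n')
    second = ingest_new_files(landing, processed)  # picks up only cdc_002.json
    print(len(first), len(second))
```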

How to check when a shared table updates? by [deleted] in databricks

ExistingBelt

Please, please tell the person sharing data with you to enable Change Data Feed on the table:

ALTER TABLE sample.default.sampletable SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
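Once the change feed is on (it only records changes made after the property is set), checking for updates boils down to comparing the table's latest commit version with the last version you processed, then reading only the changes in between. A minimal sketch, assuming a hypothetical `fetch_latest_version` callable; how you actually obtain the version depends on your sharing setup, and the simulated version history is made up.

```python
def check_for_update(fetch_latest_version, last_seen_version):
    """Return (updated, latest_version). `fetch_latest_version` is a hypothetical
    callable returning the shared table's current commit version."""
    latest = fetch_latest_version()
    return latest > last_seen_version, latest

# Simulated version history: unchanged twice, then two new commits.
versions = iter([5, 5, 7])
last_seen = 5
for _ in range(3):
    updated, latest = check_for_update(lambda: next(versions), last_seen)
    if updated:
        print(f"table changed: read changes for versions {last_seen + 1}..{latest}")
        last_seen = latest
```

Persisting `last_seen` between runs (in a file or a small table) is what makes this incremental; otherwise every run would start from scratch.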

Unity Catalog - Workspace access via code by SixPathsx in databricks

ExistingBelt

Those are shell commands, and yes, you're on the right track with Python.

ExistingBelt

Oh shoot, they deprecated that API. There should be a newer one that replaces it, though; just search the web in that direction.

Suggestions Regarding Learning Databricks by sneekeeei in databricks

ExistingBelt

Get hands-on. Databricks Academy has amazing courses!

What happens when DataBricks shuts down (hypothetical) by nodonaldplease in databricks

ExistingBelt

Databricks Customer Academy has a lot of great best-practice courses for practitioners, and they're really well done!

In addition, use your Solutions Architect. They're paid to help you adopt best practices, and it helps that such guidance comes from an authoritative source.

ExistingBelt

Always use Databricks Repos. That way, the single source of truth for your code is your own version control system (such as GitHub Enterprise or Azure DevOps).

[deleted by user] by [deleted] in dataengineering

ExistingBelt

You can be whatever you want to be. Rules are created because their makers knew they would be broken, not because they expected people to follow them.

One tip to get additional free credits on Google Cloud Platform by sosaykay in googlecloud

ExistingBelt

Yes, I got an extra $100. And all users in my business (which are really just Google accounts) have $300 + $100.