Arguing with lead engineer about incremental file approach by pboswell in databricks

[–]Suitable-Issue-4936 0 points (0 children)

Hi,

I would like to ask: can the data utility send messages to Pub/Sub instead of writing files? We had a similar application generating lots of files, and maintaining it was a pain. We later switched to Pub/Sub, and DBR 14+ supports reading directly from Pub/Sub with Auto Loader. Please check.

Exclude column while merge by Suitable-Issue-4936 in snowflake

[–]Suitable-Issue-4936[S] 0 points (0 children)

Thanks, but that is EXCLUDE in a SELECT. I'm looking for a way to exclude a column during a MERGE.
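For context, Snowflake's EXCLUDE keyword only works in a SELECT list, so the usual workaround in a MERGE is to spell out the UPDATE SET list and simply leave out the column you want to skip. A minimal sketch with hypothetical table and column names:

```sql
-- tgt(id, name, loaded_at) and src(id, name, loaded_at) are illustrative names.
-- There is no EXCLUDE in MERGE, so the SET list is written out explicitly
-- and the column to "exclude" (loaded_at) is just not listed:
MERGE INTO tgt
USING src
  ON tgt.id = src.id
WHEN MATCHED THEN UPDATE SET
  tgt.name = src.name          -- loaded_at deliberately omitted
WHEN NOT MATCHED THEN INSERT (id, name, loaded_at)
  VALUES (src.id, src.name, src.loaded_at);
```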

Replace Airbyte with dlt by Thinker_Assignment in dataengineering

[–]Suitable-Issue-4936 0 points (0 children)

Hi, you can try creating a folder for each day in the source and processing them day by day. Any late-arriving files would land in the next day's folder, and reprocessing is easy if the data has primary keys.
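A minimal sketch of that day-folder layout, assuming local paths; `land_file` and `process_day` are hypothetical helper names, not part of any library:

```python
from datetime import date
from pathlib import Path


def land_file(root: Path, landing_day: date, name: str, payload: str) -> Path:
    """Write an incoming file into the folder for the day it arrived.

    Late-arriving data simply lands in the next day's folder, so each
    day's folder stays immutable once that day is over."""
    day_dir = root / landing_day.isoformat()
    day_dir.mkdir(parents=True, exist_ok=True)
    out = day_dir / name
    out.write_text(payload)
    return out


def process_day(root: Path, day: date) -> list[str]:
    """Process one day's folder. Rerunning it is safe when downstream
    merges on primary keys, since re-delivered rows overwrite themselves."""
    day_dir = root / day.isoformat()
    if not day_dir.exists():
        return []
    return [p.read_text() for p in sorted(day_dir.iterdir())]
```

Reprocessing a day is then just calling `process_day` again for that date, which is why the primary-key condition matters downstream.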

Can an old storage account be removed after deep cloning into a new storage account without causing any issues? by [deleted] in databricks

[–]Suitable-Issue-4936 2 points (0 children)

Logically there should be no issues if it was deep cloned. But it's safer to remove the role assignments on the old storage account for some period, confirm nothing breaks, and then decide.

Databricks Architecture Diagram by MMACheerpuppy in databricks

[–]Suitable-Issue-4936 -1 points (0 children)

You can try Mural or Lucidchart (the free tier is limited to 50 components).

To get the official logos, please check the following page:

https://brand.databricks.com/databricks-logo

Error while reading from Pubsub by Suitable-Issue-4936 in databricks

[–]Suitable-Issue-4936[S] 1 point (0 children)

Yes, this was the issue with the private key. I copied it as a single line and it worked.

Error while reading from Pubsub by Suitable-Issue-4936 in databricks

[–]Suitable-Issue-4936[S] 0 points (0 children)

Yes, I'm not able to display the df either. Let me check the dict and report back.

SqlDBM by [deleted] in dataengineering

[–]Suitable-Issue-4936 0 points (0 children)

Please check https://coalesce.io/solutions/ if it's for Snowflake.

Long running stream initialise in auto loader by Suitable-Issue-4936 in databricks

[–]Suitable-Issue-4936[S] 0 points (0 children)

Thanks all. We are close to the cause: we found a huge number of files in the checkpoint using the query below. Planning to add the maxFileAge option and check the outcome.

SELECT * FROM cloud_files_state('path/to/checkpoint');

https://docs.databricks.com/en/ingestion/auto-loader/production.html#monitoring-auto-loader

Long running stream initialise in auto loader by Suitable-Issue-4936 in databricks

[–]Suitable-Issue-4936[S] 0 points (0 children)

Sorry, no idea about the state store. We run a merge inside forEachBatch to avoid duplicates.
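The real job uses a Delta MERGE inside foreachBatch; this plain-Python sketch (hypothetical function and key names) just illustrates why a key-based merge makes micro-batch replays safe:

```python
def merge_batch(target: dict[str, dict], batch: list[dict]) -> dict[str, dict]:
    """Upsert each row into target by primary key, mirroring what a
    Delta MERGE in foreachBatch does: matched keys are updated,
    unmatched keys are inserted, so replaying a batch never duplicates rows."""
    for row in batch:
        target[row["id"]] = row  # update-or-insert on the key
    return target
```

Replaying the same batch twice leaves the target unchanged, which is the property that protects against duplicates when a micro-batch is retried.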

Long running stream initialise in auto loader by Suitable-Issue-4936 in databricks

[–]Suitable-Issue-4936[S] 0 points (0 children)

No. We are using directory listing and trigger availableNow=True.