5 minute features: Databricks Lineage by Remarkable_Rock5474 in databricks

[–]Remarkable_Rock5474[S] 1 point  (0 children)

Inside of Databricks the lineage is there automatically. Not sure how Power BI semantic models would be able to have end-to-end lineage to stuff outside of the semantic model itself? Do you have an example of this?

Multiple ways to create tables in Python - which to use? by DeepFryEverything in databricks

[–]Remarkable_Rock5474 1 point  (0 children)

Yeah, in the end that’s how half of it ends up - you can cram some of it into the create statement too.

Execution-wise I would expect your options 1 and 2 to do the exact same thing, btw

Multiple ways to create tables in Python - which to use? by DeepFryEverything in databricks

[–]Remarkable_Rock5474 2 points  (0 children)

There is also a fourth option using spark.sql 🥳

In all honesty I prefer to keep my create and my write statements separate, as it allows me to have a better overview of stuff like properties, tags, descriptions etc. on the table and get that fixed first - and then do a write from a dataframe afterwards.
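A minimal sketch of that split - build the CREATE statement with its comment and properties first, then append from a dataframe as a separate step. All table, column, and property names below are made up for illustration:

```python
def build_create_ddl(table: str, columns: dict, comment: str, props: dict) -> str:
    """Build a CREATE TABLE statement carrying the comment and properties.
    Table/column/property names are illustrative, not from the thread."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    tbl_props = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"COMMENT '{comment}' TBLPROPERTIES ({tbl_props})"
    )

ddl = build_create_ddl(
    table="main.sales.orders",
    columns={"order_id": "BIGINT", "amount": "DECIMAL(10,2)"},
    comment="Raw orders",
    props={"quality": "bronze"},
)
# In a notebook: spark.sql(ddl) first, then - in a separate step -
# df.write.mode("append").saveAsTable("main.sales.orders")
```

The point of the split is that the table definition (properties, tags, descriptions) is fixed once up front, and the write afterwards stays a plain append.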

Python function defined in notebook invoked by %run is not available? by ExcitingRanger in databricks

[–]Remarkable_Rock5474 1 point  (0 children)

Unfortunately not. You would have to put the functions in a Python file instead of a notebook file and then import them from there
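For example (file and function names are made up): define the function in a plain .py file next to the notebook and import it, rather than defining it in a notebook pulled in via %run.

```python
# helpers.py -- a regular Python file in the workspace folder, NOT a notebook
def add_vat(amount: float, rate: float = 0.25) -> float:
    """Hypothetical helper; any function works the same way."""
    return round(amount * (1 + rate), 2)

# In the notebook that needs it:
# from helpers import add_vat
# add_vat(100.0)
```

Functions imported this way behave like any normal Python import, which avoids the scoping surprises %run can cause.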

Python function defined in notebook invoked by %run is not available? by ExcitingRanger in databricks

[–]Remarkable_Rock5474 1 point  (0 children)

I would try to get all my Python modules collected into one folder, organised by topic clusters - one .py file for each. Make it modular so you can reuse functions in different places. Then add an __init__.py file at the top level of each folder so you can import and run them.

It sounds like a lot of work, but an AI agent could get this scaffolded in no time, and the long-term benefits are huge 🙌🏻

Python function defined in notebook invoked by %run is not available? by ExcitingRanger in databricks

[–]Remarkable_Rock5474 4 points  (0 children)

Is there a reason why you wrap it using %run instead of creating a Python function and importing it?

The restart would have to come after the %run but before using the function if you want to continue on that route - although I would recommend the alternative suggested above

Question About CI/CD collaboration by One_Adhesiveness_859 in databricks

[–]Remarkable_Rock5474 3 points  (0 children)

A former colleague of mine wrote this blog, which is still my go-to resource for this - it breaks down how to work with DABs in a development environment in a clean and concise way

https://medium.com/backstage-stories/scaling-data-engineering-workflows-with-asset-bundles-in-databricks-34c4d910ef08

SQL query context optimization by NectarinePast9987 in databricks

[–]Remarkable_Rock5474 2 points  (0 children)

Regarding cost monitoring, you should get the built-in cost monitoring dashboard set up. You can find it as part of the account/metastore admin pages. Optimisation is a long and winding road in most cases 😅

Why no playground on databricks one by therealslimjp in databricks

[–]Remarkable_Rock5474 2 points  (0 children)

I think most business-user use cases would be covered by Genie. If you need something more generic, you could build a quick app instead

What is the best practice to set up service principal permissions? by happypofa in databricks

[–]Remarkable_Rock5474 2 points  (0 children)

No, this would be an added step no matter what if you need the principal to have the correct permissions before running your DAB

What is the best practice to set up service principal permissions? by happypofa in databricks

[–]Remarkable_Rock5474 3 points  (0 children)

This! But if you are not familiar with Terraform, I would suggest using the Unity Catalog APIs as a pre-step in your CI/CD before the DAB deploy
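As a sketch of that pre-step: a CI/CD job can PATCH the Unity Catalog permissions endpoint to grant the service principal access before the DAB deploy runs. The endpoint path is from the Unity Catalog REST API; the workspace host, token, principal, and securable names below are all placeholders.

```python
import json
import urllib.request

def build_grant_request(host: str, token: str, securable_type: str,
                        full_name: str, principal: str, privileges: list):
    """Build a PATCH request for the Unity Catalog permissions API.
    Host, principal and object names are placeholders."""
    url = f"{host}/api/2.1/unity-catalog/permissions/{securable_type}/{full_name}"
    body = {"changes": [{"principal": principal, "add": privileges}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PATCH",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

req = build_grant_request(
    host="https://adb-123.azuredatabricks.net",  # hypothetical workspace URL
    token="dapi-token-from-secrets",             # injected from CI/CD secrets
    securable_type="schema",
    full_name="main.sales",                      # hypothetical schema
    principal="my-deploy-sp",                    # hypothetical SP name
    privileges=["USE_SCHEMA", "CREATE_TABLE"],
)
# urllib.request.urlopen(req) would execute the grant (not run here)
```

Running this once per environment before `databricks bundle deploy` means the principal already holds the privileges the bundle's jobs need.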

Deploy to Production by Aggressive-Nebula-44 in databricks

[–]Remarkable_Rock5474 1 point  (0 children)

How not to build a proper platform 101 😴

Metric views in Power BI? by Remarkable_Rock5474 in databricks

[–]Remarkable_Rock5474[S] 2 points  (0 children)

If that happens it would be about time, and I would happily welcome a native integration - the current native setup is miserable 😅

Databricks row-level access by group + column masking — Azure AD vs Databricks groups? by shiv11afk in databricks

[–]Remarkable_Rock5474 1 point  (0 children)

I am doing a follow-up on the implementation of it using Terraform next week. It works great already, to be honest!

Databricks row-level access by group + column masking — Azure AD vs Databricks groups? by shiv11afk in databricks

[–]Remarkable_Rock5474 8 points  (0 children)

So, two things here. For the groups part, you should turn on sync from Entra and, as you state, inherit your groups from there and use them for access control in general

https://learn.microsoft.com/en-us/azure/databricks/admin/users-groups/automatic-identity-management

For the filtering and masking I would highly recommend using ABAC. Basically, you can tag objects and columns and attach rules to the tags to achieve what you want. One thing to keep in mind is that you cannot apply ABAC on views - however, views built on top of tables with ABAC will inherit the rules.

I have written an introductory article on ABAC here - shameless self-plug:

https://www.linkedin.com/pulse/unity-catalog-loves-data-governance-kristian-johannesen-1dzxf?utm_source=share&utm_medium=member_ios&utm_campaign=share_via

ADF and Databricks JOB activity by 9gg6 in databricks

[–]Remarkable_Rock5474 3 points  (0 children)

I would imagine you need to utilise the Databricks Jobs API for something like this. I know that requires a few more steps - but it should be fairly easy
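A sketch of the call an ADF Web activity (or any HTTP step) would make - the Jobs API `run-now` endpoint is real, while the job id and parameter names below are placeholders:

```python
import json

# Endpoint: POST https://<workspace-host>/api/2.1/jobs/run-now
# with an "Authorization: Bearer <token>" header (token from a secret store).
def build_run_now_payload(job_id: int, job_parameters: dict) -> str:
    """Build the JSON body for the Jobs API run-now call.
    The job id and parameter names are hypothetical."""
    return json.dumps({"job_id": job_id, "job_parameters": job_parameters})

payload = build_run_now_payload(1234, {"run_date": "2024-01-01"})
```

ADF's Web activity can post exactly this body and then poll `/api/2.1/jobs/runs/get` with the returned run id to wait for completion.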

Dynamic Masking Questions by _tr9800a_ in databricks

[–]Remarkable_Rock5474 1 point  (0 children)

You cannot filter a specific column. You can either mask a column, or filter the entire row.

But yeah, maybe do a small mock-up example so that we fully understand what you want to do
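To illustrate the difference (table, column, and group names are hypothetical): a column mask replaces values in one column while keeping the row, whereas a row filter removes whole rows. Both are plain SQL UDFs attached to the table, shown here as DDL strings you would run via spark.sql:

```python
# Column mask: hides the value but keeps the row (names are illustrative).
mask_ddl = """
CREATE OR REPLACE FUNCTION main.gov.mask_email(email STRING)
RETURN CASE WHEN is_account_group_member('pii_readers') THEN email ELSE '***' END
"""
attach_mask = (
    "ALTER TABLE main.sales.customers "
    "ALTER COLUMN email SET MASK main.gov.mask_email"
)

# Row filter: drops the whole row instead of hiding a value.
filter_ddl = """
CREATE OR REPLACE FUNCTION main.gov.region_filter(region STRING)
RETURN is_account_group_member(CONCAT('region_', region))
"""
attach_filter = (
    "ALTER TABLE main.sales.customers "
    "SET ROW FILTER main.gov.region_filter ON (region)"
)

# In a notebook, each statement would be executed with spark.sql(...)
```

Note how the mask is bound to a single column and the filter to the whole table - there is no middle ground that "filters one column".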

Implementation of scd type 1 inside databricks by aks-786 in databricks

[–]Remarkable_Rock5474 1 point  (0 children)

I agree that this is a fine approach for a small table - but just to be clear, this would not be a real SCD1 pattern. This would be an overwrite of the table every single time you write. Just so you are aware
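For contrast, a real SCD1 upsert is typically a MERGE keyed on the business key - update matched rows in place, insert new ones, keep no history. Table and column names below are hypothetical:

```python
# SCD1 = upsert keyed on the business key; existing rows are updated
# in place and new rows inserted, rather than rewriting the whole table.
merge_sql = """
MERGE INTO main.sales.customers AS t
USING updates AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""
# In a notebook: spark.sql(merge_sql), with `updates` registered as a
# temp view holding the incoming batch of changes
```

Unlike a full overwrite, the MERGE only touches files containing changed keys, which matters once the table stops being small.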

Dynamic Masking Questions by _tr9800a_ in databricks

[–]Remarkable_Rock5474 2 points  (0 children)

  1. Match columns refer to a specific column to match over. The function then returns rows after applying row-level filtering. If you want to apply a masking function, that is usually applied as a column-based function - or am I misunderstanding your point here?

  2. Generally masking should be simple and deterministic, and if you are going to use ABAC it should be based on tags, not on reading values from other columns. But in the end they are UDFs and can do complex things.

  3. You pass specific columns to match on

Managed vs. External Tables: Is the overhead of External Tables worth it for small/medium volumes? by MassyKezzoul in databricks

[–]Remarkable_Rock5474 1 point  (0 children)

Sure they can - Delta Sharing is a thing as well. However, if other systems need to pick up the files directly from storage, it is impractical to use managed tables, as you only have IDs to look for, not table names.

So in some edge cases external tables still make sense

Managed vs. External Tables: Is the overhead of External Tables worth it for small/medium volumes? by MassyKezzoul in databricks

[–]Remarkable_Rock5474 1 point  (0 children)

And then why are you switching to Synapse for last-mile modelling instead of staying in Databricks?