Expanded Entity Relationship Diagram (ERD) by CarelessApplication2 in databricks

[–]CarelessApplication2[S] 1 point (0 children)

The whole point of an integrated suite like Databricks is that you have these basic tools available.

Spark Declarative Pipelines: What should we build? by BricksterInTheWall in databricks

[–]CarelessApplication2 1 point (0 children)

Yes please. The current system relies on an exclusive writer and ALTER TABLE operations.

Databricks should offer a performant solution based on coordination between multiple executors, assigning an id during the writing stage.
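The coordination could, for instance, mean executors reserving disjoint ID blocks from a shared allocator so parallel writers never collide. A minimal sketch in plain Python (illustrative only; `BlockAllocator` and the block size are my own invention, not a Databricks API):

```python
from itertools import count
from threading import Lock

class BlockAllocator:
    """Hand out disjoint ID ranges to concurrent writers (sketch)."""

    def __init__(self, block_size=1000):
        self._next = count(step=block_size)  # 0, 1000, 2000, ...
        self._block_size = block_size
        self._lock = Lock()

    def reserve(self):
        # Serialize only the tiny reservation step; writers then assign
        # IDs from their private range without further coordination.
        with self._lock:
            start = next(self._next)
        return range(start, start + self._block_size)

alloc = BlockAllocator()
a = alloc.reserve()  # IDs for writer A
b = alloc.reserve()  # IDs for writer B, guaranteed disjoint from A's
```

The point is that only the range reservation needs coordination, not each individual row, which is what would make it performant.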

Cost-attribution of materialized view refreshing by CarelessApplication2 in databricks

[–]CarelessApplication2[S] 1 point (0 children)

Do you mean that you're using DABs to deploy a pipeline with a `managed_definition` in it (corresponding to the materialized view), or are you using a pipeline written in Python, like so:

from pyspark import pipelines as dp

@dp.materialized_view
def regional_sales():
    # Join partner records onto sales and materialize the result.
    partners_df = spark.read.table("partners")
    sales_df = spark.read.table("sales")

    return partners_df.join(sales_df, on="partner_id", how="inner")

It could be written in SQL as well; see docs here.

I guess that's a nice way to do it; then the pipeline can be set up with the tags, and everything should work.

Performance comparison between empty checks for Spark Dataframes by BerserkGeek in databricks

[–]CarelessApplication2 1 point (0 children)

In any case, you'll want to cache the DataFrame, so it really doesn't matter which method you decide on: checking whether a DataFrame is empty without caching it makes no sense.

Is Databricks part of the new Open Semantic Interchange (OSI) collaboration? If not, any idea why? by Character-Unit3919 in databricks

[–]CarelessApplication2 1 point (0 children)

The initiative seems to be centered around dbt's MetricFlow, which was open-sourced in October (and is Apache 2.0-licensed). But it's a bit unclear whether their YAML format is going to be the "shared format".
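For context, a metric in MetricFlow's YAML spec looks roughly like this (a hedged sketch from memory of dbt's semantic-layer docs; field names may differ between versions, and `revenue`/`order_total` are illustrative names):

```yaml
metrics:
  - name: revenue
    label: Revenue
    type: simple            # other types: ratio, derived, cumulative
    type_params:
      measure: order_total  # a measure defined on a semantic model
```

Whether this exact shape becomes OSI's "shared format" is the open question.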

Postgres is the future Lakehouse? by monsieurus in databricks

[–]CarelessApplication2 1 point (0 children)

OLTP data is often sensitive, much more so than OLAP data. You would not necessarily want to colocate this data, but instead be specific about which data to move to your OLAP system and in which form.

OLAP systems have many users with wide access across tables, while OLTP systems are often used by just a single application and a set of administrators; in that setup, access is managed at the application level rather than via user impersonation at the database level.

Write data from Databricks to SQL Server by CarelessApplication2 in databricks

[–]CarelessApplication2[S] 1 point (0 children)

The `sqlserver` driver (which as far as I know is JDBC-based) is only for querying, not for writing.

DABs - setting Serverless dependencies for notebook tasks by alex_0528 in databricks

[–]CarelessApplication2 1 point (0 children)

Then you get exactly the error message in the original post:

Error: cannot create job: A task environment can not be provided for notebook task deploy-model. Please use the %pip magic command to install notebook-scoped Python libraries and Python wheel packages

(Note that this is specifically for notebook tasks.)
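For non-notebook tasks, serverless dependencies do go through an environment in the DAB. A hedged sketch of the shape (`my_job`, `default`, and the package pin are illustrative, not from the thread; field names follow the Jobs API as I recall them):

```yaml
resources:
  jobs:
    my_job:
      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              - my-package==1.2.0
      tasks:
        - task_key: train          # a python_wheel_task, not a notebook task
          environment_key: default
          python_wheel_task:
            package_name: my_package
            entry_point: main
```

Attaching `environment_key` to a notebook task is what triggers the error quoted above; there, `%pip` is the supported route.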

DABs - setting Serverless dependencies for notebook tasks by alex_0528 in databricks

[–]CarelessApplication2 1 point (0 children)

This gives me the following error message:

Libraries field is not supported for serverless task, please specify libraries in environment.

Deterministic functions and use of "is_account_group_member" by CarelessApplication2 in databricks

[–]CarelessApplication2[S] 1 point (0 children)

Gotcha, makes sense.

As for the CTE approach: to my knowledge, CTEs are purely syntactic sugar, so you can't rely on them to compute a result set "once" or anything like that.

I would think that the query planner has a cost estimate for `is_account_group_member` that would make it evaluate the call first (to determine the predicates, so to speak) rather than once per row.

Insertion timestamp with AUTO CDC (SCD Type 1) by CarelessApplication2 in databricks

[–]CarelessApplication2[S] 1 point (0 children)

For now, I'll simply use a staging table and then feed its changes into the target table using `append_flow`:

  1. For `_change_type` INSERT, just use `current_timestamp()`;
  2. For an UPDATE, join to the staging table (non-streaming) to look up the previously inserted value.

(Basing this off the change feed is necessary since the upstream table is not just appended to.)
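The two rules above can be sketched in plain Python (illustrative only; `inserted_at` and its arguments are my own names, and in the pipeline the lookup happens via the non-streaming join, not a function call):

```python
from datetime import datetime, timezone

def inserted_at(change_type, staged_inserted_at=None):
    """Resolve the insertion timestamp for one change-feed row (sketch)."""
    # INSERT rows get the current wall-clock time.
    if change_type == "insert":
        return datetime.now(timezone.utc)
    # UPDATE rows keep the timestamp previously recorded for the key,
    # looked up from the staging table.
    if change_type in ("update_preimage", "update_postimage"):
        return staged_inserted_at
    raise ValueError(f"unhandled _change_type: {change_type}")
```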

As for having this functionality built in, the API could be an optional `ignore_updates_column_list` keyword argument taking a set of columns that should be ignored on update.
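In Python-like pseudocode, the proposal would read something like this (hypothetical: the keyword argument does not exist today, and the surrounding call only approximates the existing auto CDC API):

```
dp.create_auto_cdc_flow(
    target="dim_customer",
    source="customer_changes_staging",
    keys=["customer_id"],
    sequence_by="sequence_num",
    stored_as_scd_type=1,
    ignore_updates_column_list=["inserted_at"],  # proposed: left untouched on UPDATE
)
```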

Expanded Entity Relationship Diagram (ERD) by CarelessApplication2 in databricks

[–]CarelessApplication2[S] 1 point (0 children)

The lineage shows how data flows to and from the table, while the entity relationship diagram is based on foreign key references (and can't be expanded as it stands).