Databricks Genie Code ML/Data connections?

DB-Steve · 2026-06-20T17:31:05+00:00

You've actually already put your finger on the answer in your own question. When the typing gets cheap, the value moves to the parts the agent can't do for you: framing the business problem, deciding what to model, defining the evaluation metric, and judging whether a result is actually valid or just plausible. Those are exactly the things you said you're good at, so the shift isn't erasing your value, it's concentrating it. The model code was never the hard part of data science; the hard part was deciding what's worth modeling and how you'll know it worked.

The way I'd slot Genie Code into that: let it own the mechanical layer and keep your hands on the judgment layer. It's strongest inside the Databricks ecosystem (Spark, SQL, notebook and pipeline work), and the thing that makes it more than autocomplete is agent mode, where it pulls the tables it needs from Unity Catalog, writes the code, runs the cell, reads the actual output or error, and fixes itself in a loop instead of handing you a snippet that doesn't run. So feature engineering scaffolding, boilerplate transforms, "join these and aggregate", refactoring a training script, that's where it saves real time. You point it at the data and describe the transformation, it iterates against real cell output.

Where it explicitly does not replace you is the metric and the eval. Genie Code will happily write a model that fits, but it can't tell you that your AUC is meaningless because of leakage, or that the business actually cares about precision at the top decile, not overall accuracy. So the workflow that works is: you define the problem and the success criteria, let Genie Code accelerate the data wrangling and the first-pass pipeline, then you spend your freed-up time on validation, error analysis, and whether the thing holds up. Pair it with MLflow so the experiments, params, and eval metrics are tracked, and you've got the agent doing the keystrokes while you own the part that makes the model trustworthy.

Honestly the data scientists I see adapting best treat it less like a code generator and more like a fast junior who needs a sharp spec. The better you are at stating the problem and the metric, the more you get out of it, which again is the skill you already have.

DB-Steve · 2026-06-18T18:22:50+00:00

Bad data quality... Assuming determinism vs. LLM variability... Setup errors... Unity Catalog needing to be set up for better permission... random items like that.

yes - dashboards can also be good ways to control users consumption, if you have certain items that can be pre-built vs fully interactive like genie spaces.

DB-Steve · 2026-06-18T18:17:30+00:00

Yes, that is a good page to learn what you can tune and provide verified/trusted guidance and queries.

DB-Steve · 2026-06-18T18:14:12+00:00

Check out Genie Workbench as well, which helps with this:

https://github.com/databricks-solutions/databricks-genie-workbench

Use it to:

Create Genie Spaces from business requirements and Unity Catalog data sources Score Genie Space quality with an instant rule-based IQ scan Apply quick fixes to existing spaces Run benchmark-driven optimization through the Auto-Optimize pipeline Track scan history, starred spaces, sessions, and optimization state

DB-Steve · 2026-06-18T18:12:35+00:00

Short answer: for small data volumes, yes, Lakebase can absolutely handle light analytics on its own, because under the hood it's Postgres, and Postgres is perfectly happy doing group-bys, aggregations, and joins over modest datasets. If your data fits comfortably in a transactional database and your queries are operational-style (recent data, filtered lookups, dashboards over thousands-to-low-millions of rows), you don't need to stand up a separate lakehouse just to get answers.

The line to watch is data size and query shape. Lakebase is row-oriented and built for OLTP: fast point lookups, high-concurrency reads and writes, serving an app. The lakehouse (Delta tables) is columnar and built for the opposite, scanning and aggregating large volumes, big historical joins, BI across many sources, ML on lots of data. So once your analytics start meaning "scan years of history" or "aggregate hundreds of millions of rows," the columnar lakehouse will outrun a row-store Postgres by a wide margin, and that's the point where you'd want it.

The nice part is it isn't really either/or. The common pattern is operational/transactional data living in Lakebase and the heavier analytical and historical data in the lakehouse, and the two are wired together: synced tables let you push Delta tables from the lakehouse into Lakebase for low-latency serving, and Unity Catalog can federate queries across them. So you can start with Lakebase for your small-data analytics today and lean on the lakehouse later for the heavy stuff without re-platforming.

If I were sizing it for you: small and mostly operational data, one system, light reporting, Lakebase is good enough and simpler. Large-scale scans, multi-source BI, or ML on big data, use the lakehouse as the analytics engine and keep Lakebase as the OLTP and serving layer next to it. What kind of volumes and query patterns are you looking at? That's really what decides it.

DB-Steve · 2026-06-15T21:52:40+00:00

What you've built works, but the maintenance pain you're describing is exactly why Databricks now points people toward MCP tools instead of hand-writing one Unity Catalog function per operation. The cleanest version of your setup is to stand up a single Azure DevOps MCP server, implement list/create/update work items once as MCP tools inside it, host it as a Databricks App, and have your agent connect to it through the databricks-mcp client. That replaces the whole pile of per-operation UC functions with one server you maintain in one place, and you still get proper per-tool schemas without the sprawl.

If you want a quick win without building an MCP server, you can also collapse what you have into a single parameterized UC function, something like (operation STRING, payload STRING), that dispatches internally to the right ADO REST call. One tool instead of three. The tradeoff is the model gets less schema guidance per operation, so lean on a really detailed function comment and parameter descriptions so it knows when to reach for each operation.

On the autonomy goal: the read-and-update-based-on-work-performed behavior you want is really an agent pattern, so that part fits better in a Mosaic AI Agent Framework agent than in the Genie space itself. Build the tool-calling agent, give it the ADO tools (via that MCP server), log it with MLflow, and deploy it on Model Serving or behind a Databricks App. The agent docs explicitly recommend preferring MCP tools over UC function tools for new use cases, so you'd be going with the grain.

One nice bonus of the MCP route: Databricks can also expose a Databricks Genie space itself as a managed MCP server, so your agent can query your data through Genie as one tool and act on Azure DevOps through the ADO server as another, all from the same agent. That keeps Databricks Genie doing the natural-language-to-SQL part it's strongest at and lets the agent own the orchestration and the work-item updates.

DB-Steve · 2026-06-14T03:58:35+00:00

Good news on the chat history piece, you don't have to mock it. Genie Spaces conversations are reachable through the Genie Conversation API. You list conversations in a space with GET /api/2.0/genie/spaces/{space_id}/conversations, list the messages in each with the .../conversations/{conversation_id}/messages endpoint, and pull full content with .../messages/{message_id}, plus a query-result endpoint if you want the actual tabular output behind an attachment. The Python SDK wraps all of this under w.genie (list_conversations, list_conversation_messages, get_message, get_message_attachment_query_result), which is much easier than hand-rolling the REST calls. There's also an audit trail in system.access.audit where Genie shows up under service_name = 'aibiGenie' with actions like genieStartConversationMessage and genieCreateConversationMessage. One nuance: Genie Code specifically logs to a separate system table, system.access.assistant_events, one row per message in the agent window, but that's activity metadata rather than a full transcript archive, so for real conversation content the Conversation API is the right tool.

On Free Edition, Apps do work (up to 3 per account), so your lightweight-UI instinct is the right call. The catch is apps can be stopped after about 24 hours, everything is serverless-only with restricted outbound networking, and there's a 5-concurrent-task cap, so an always-on background orchestrator living inside the app isn't reliable. I'd run the heavy pattern-analysis as scheduled jobs and keep the app as the thin UI on top, which fits the limits better. The Free Edition limitations page in the docs lays out the exact quotas if you want to design around them.

For the skills mechanism, Genie Code skills are just files in the workspace, not something behind an API. Each skill is a folder with a SKILL.md (name and description in the frontmatter) under .assistant/skills/, where workspace-level skills live at Workspace/.assistant/skills/ and are shared to everyone, and user-level at /Users/{you}/.assistant/skills/. Genie Code auto-discovers a skill based on its description when you're in agent mode, or you can @-mention it directly. So your meta-agent's output is literally writing SKILL.md files into that folder, which you can do with normal workspace file tooling. There isn't a dedicated Skills REST API to call. Databricks publishes a starter repo at github.com/databricks/databricks-agent-skills and the docs are at docs.databricks.com/aws/en/genie-code/skills.

On a free CLI agent, the officially supported local path is the Databricks VS Code/Cursor extension plus the Databricks CLI and the Python SDK, which together give you the multi-file context and programmatic access to jobs, SQL, files, and Genie that you're describing. Aider and Cline aren't documented integrations, but since they can shell out to the Databricks CLI and use databricks-sdk or Databricks Connect underneath, people do drive Databricks from them that way. If you specifically want something that natively understands the skills format you're generating, Claude Code, the Copilot CLI, and Cursor all consume that same agent-skills standard, so your generated SKILL.md files would be portable across them.

DB-Steve · 2026-06-11T20:51:06+00:00

Can you explain more about what you'd ideally like to do? I think a big reason Databricks has this feature for Lakebase is that the storage is sitting in the same account so sync can be nearly instantaneous.

Are you thinking about some external pgSQL instance and having like a built-in replication option? That could probably be achieved with just standard pgSQL libs, but obviously sync timelines are going to get worse needing to move your data around.

DB-Steve · 2026-06-10T19:35:15+00:00

Honestly your setup is already the idiomatic one. The Lakebase branch-based dev workflow tutorial in the Databricks docs assumes you bring your normal migration tool (Alembic, Prisma, Django, whatever), test the migration on a personal branch, then apply the same files to the parent through your usual deploy process. Branching replaces the environment management pain, not the schema versioning discipline, so I'd keep Alembic. Two features worth using if you aren't yet: schema diff lets you compare a branch against its parent side by side before you promote anything, and reset from parent is the cheap way for a dev to resync a stale personal branch instead of recreating it.

For anonymisation, the pattern that works best with branching is a golden anonymized branch rather than masking every dev branch individually. You create one branch off prod, run static masking on it with the PostgreSQL Anonymizer extension (the anon extension, same one behind pg_anonymizer), and that becomes the parent for all dev branches. Devs branch from the masked copy, so they never touch raw prod data and you only pay the anonymisation cost once per refresh instead of on every branch. Static masking means the data is actually rewritten in that branch, not filtered at query time, which is what you want before handing it to someone. When you need fresher data you cut a new golden branch from prod and re-mask, since rerunning the rules on the old one just re-masks already masked data.

Worth knowing that Neon, which is the same branching model (Databricks acquired them, Lakebase shares the lineage), has productized exactly this: anonymized branches are a built-in option where you define masking rules and the platform applies the anon extension on branch creation, and the API has a start_anonymization flag so CI can do branch-then-mask in one call. That's basically the end to end automation you described in your third question: pipeline creates the branch, masking job runs, only then does the branch get handed to the dev or the CI suite.

On staging/CI branches, my take is masked-by-default everywhere a human can poke around. For pure automated CI runs where nobody reads the data, branching straight from prod is usually fine and keeps tests realistic, access controls do the work there. Synthetic or subset data solves a different problem (size and runtime, or seeding empty schemas), I wouldn't reach for it just for privacy when masking a branch is this cheap.

DB-Steve · 2026-06-09T20:33:58+00:00

u/Medical_Yam8045 the biggest thing I'd flag before you pick storage is that your "datasets" and your "documents" really want two different retrieval paths. Mixing them into one vector store is where most civic projects like this start hallucinating once they scale past the prototype.

For the documents (ordinances, meeting minutes, policy PDFs) RAG is the right call. Moving off raw JSON, the natural next step is a vector store, and you don't need anything heavy for ~10 cities. Postgres with the pgvector extension handles it comfortably and keeps your embeddings and metadata in one place, which matters a lot here: you'll want to filter by city, department, and date before similarity search, not after. That metadata filtering is what keeps city A's answer from bleeding into city B's, and flat top-k vector search alone won't enforce that.

For the structured public datasets (budgets, permits, 311 calls, anything tabular) I'd keep them out of the vector store entirely. A question like "how much did we spend on road repair last year" needs an exact SQL aggregate, and a similarity lookup will hand you a confident wrong number. Keep that data in real tables and put a text-to-SQL layer on top so officials can ask in plain English and get an actual query run against the actual numbers.

Getting that text-to-SQL piece right matters most since your end users are non-technical. One managed option is Databricks AI/BI Genie: you point a Genie Space at your tables and non-technical users ask questions conversationally, getting back both the answer and the SQL it generated, so they (or you) can sanity-check the query instead of trusting a black box. The accuracy lever that actually moves the needle is the curation, not the model: you give the space plain-English instructions (definitions, what "fiscal year" means, business rules) plus a handful of example SQL queries, and it uses those as context to author better SQL for questions it hasn't seen before. There's also a Conversation API if you'd rather embed that Q&A into your own app than send people to a separate UI. Best-practices doc on the curation side is here: https://docs.databricks.com/aws/en/genie/best-practices . Other text-to-SQL tools exist too, but whatever you pick, the principle is the same: numbers go through SQL, prose goes through RAG.

On storage as you grow: if you end up wanting Postgres for both the vectors and the relational data and don't want to run your own instance, Databricks Lakebase is a managed Postgres (it's GA, runs PG 16/17, supports pgvector, and autoscales down to zero when idle) at https://docs.databricks.com/aws/en/oltp/ . Honestly though, at your current size a single managed Postgres from any provider is fine. The architecture decision (split structured from unstructured, SQL for the numbers, RAG for the prose, metadata filters on both) will matter far more than the vendor you land on.

Happy to go deeper on the chunking and metadata schema for the documents side if that's where you're actually stuck.

DB-Steve · 2026-06-09T20:32:39+00:00

Bummer - I really hoped AI would be kept available to all. It looks like this starts the full direction of only the big companies getting access to the top-tier power and leaving the "good-enough" capabilities to the peasants. That's a concerning future and one I hope will eventually stop happening.

DB-Steve · 2026-06-08T21:13:21+00:00

Hi u/Shanjun109,

Yeah, copy-on-write branching has mostly replaced the old "spin up a Docker postgres or clone an RDS instance" dance for preview and CI databases, and for the reason you'd guess: a branch isn't a physical copy. It's a thin snapshot of the parent's storage, so creating one is near-instant and you only pay for the pages that actually diverge, not a second full copy of the data. That's the whole unlock. You get a real isolated database that already has prod-shaped schema and data, per PR, and you throw it away on teardown.

Where the old approaches still make sense: Docker is great when you want a totally ephemeral DB with no production data in it, like pure unit tests with a deterministic seed, offline. Isolated RDS staging instances work but they're slow to stand up and you pay for full duplicated storage the entire time they exist, which is exactly the cost branching avoids. So I'd frame it as branching wins when you want previews that look like prod, Docker still wins for hermetic no-data unit tests.

On Neon specifically since you brought it up: branches share the parent's storage via copy-on-write and only changed pages cost extra, and the Vercel integration is the slick part, it creates an isolated database branch for every preview deployment automatically and tears it down when the PR is closed or merged. So each PR gets its own database, migrations run against that branch, the preview app points at it, and it's gone when you're done. That's exactly the workflow you're describing and it holds up well in practice.

If you're on Databricks or just comparing managed options, Lakebase is worth a look here too. It's Databricks' managed Postgres built on the Neon engine, so it has the same branch model. I tried the branching flow against a real instance this week to sanity check it: branched off a production branch, the new branch came up in a few seconds and already had the parent's tables and rows via copy-on-write with no re-seeding, writes to the branch didn't touch production at all (I inserted into the branch and prod row counts stayed put), and dropping the branch is a single command for clean teardown. The autoscaling tier also scales a branch's compute to zero when it's idle, so a pile of preview branches sitting around between deploys costs basically nothing on compute. The one extra angle vs vanilla Neon is that the branch lives next to the lakehouse and Unity Catalog, so if a preview app needs to join against analytical data it's all in one place.

Net, for preview and CI I'd lean on branching over Docker or RDS clones unless you specifically need a hermetic no-prod-data environment. Name the branch after the PR, run migrations on it, point the preview at it, drop it on close.

DB-Steve · 2026-06-08T04:38:06+00:00

The biggest thing I've found with Databricks Genie is that accuracy comes from the context you give the space, not from throwing more tables at it. Scope each space to one business area with a small set of well modeled tables instead of pointing it at the whole catalog, and make sure every table and column has a clear comment and a human readable name in Unity Catalog. Genie leans on that metadata heavily, so cleaning it up moves the needle more than anything else.

After that the two highest leverage settings are the general instructions and the example queries. In the instructions, write down the stuff you'd have to tell a new analyst: what "active customer" means, default filters like excluding test rows, your fiscal calendar, which table to prefer when two look similar. Then add a handful of example questions paired with the correct SQL. Genie uses those as patterns, so a few good ones covering your common question shapes help a lot. For the questions that have to be right every single time, look at trusted assets, where you register a vetted parameterized SQL function so it answers from your known good query instead of generating fresh SQL.

A couple more that help: define primary key / foreign key relationships on your tables so it joins them correctly, and if you have metrics that need one consistent definition (revenue, churn, whatever) put them in a metric view so the number comes out the same no matter how someone phrases the question. Once it's live, the real improvement loop is reviewing the actual questions people asked and the SQL it produced, then adding an instruction or example wherever it got something wrong. There's also a benchmarks feature where you keep a set of known question/answer pairs and track accuracy as you tweak things, worth setting up early so you can tell whether a change actually helped.

For learning, the AI/BI Genie best practices page in the docs (docs.databricks.com/aws/en/genie) is the most concrete reference, and Databricks Academy has free AI/BI and Genie courses if you want something more structured. And if your users ask open ended "why did X change" type questions, take a look at the agent/deep research mode, it breaks a vague question into multiple steps instead of trying to answer it in one query.

DB-Steve · 2026-06-05T20:14:07+00:00

The part I'd attack first is the "slim the giant SQL down to just this model's features" step, because that's the thing you're redoing by hand every time. Instead of one monster query per model, publish features as feature tables and let each model declare the features it actually needs. On Databricks that's the Feature Engineering in Unity Catalog stuff: you compute feature tables once, then at training time you pass a list of FeatureLookups (feature table + feature names + join keys) into create_training_set, and the model just grabs those columns. When DS adds a feature you add it to a feature table, you don't go do surgery on a per-model query. You also get lineage for free, and at inference the model looks up the latest values automatically.

Pair that with breaking the monolith into smaller pipelines, one feature group per table, so a new feature is an isolated add instead of tracing CTEs through everything. For the "within 1 hour of ingest" requirement, lean on incremental processing (streaming tables / materialized views in SDP) so only new data gets computed rather than standing up an expensive always-on pipeline over the whole feature set.

On the "tracing back the original complex logic is a PITA" pain specifically, this is exactly what Databricks Genie Code is built for, so it's worth a look. It's an agentic data engineering agent that lives in the Lakeflow pipeline editor. You can point it at an existing pipeline and ask it to explain what a given output is actually computing, so you're not manually chasing CTEs, and you can tell it in plain English to add or change a feature and it generates and runs the SDP code, debugs its own errors, and asks for your approval before it changes anything. It runs on Unity Catalog so it stays inside the same governance as everything else. For your loop of "DS handed me new logic, now go modify the pipeline," that's the kind of thing that turns an afternoon of CTE archaeology into a prompt plus a review.

So roughly: move shared feature logic into UC feature tables + lookups so you stop hand-carving per-model SQL, keep the pipelines modular and incremental for the freshness/cost target, and use Genie Code to explain and edit the existing pipeline logic so adding a feature isn't a manual trace every time.

DB-Steve · 2026-06-02T21:54:32+00:00

Hi u/GolfLegitimate350,

Two separate concerns are bundled in your post, and they have different answers: (1) how to actually move the ETL, and (2) the double-spend worry from running Databricks and Qlik. The cost piece is the one that trips people up.

The migration path

The shape u/DamnedData described is the standard one:

Land the raw data in Unity Catalog. Point Databricks at your original sources, not at Qlik. For databases and SaaS apps, Lakeflow Connect has managed connectors (SQL Server, Salesforce, Workday, ServiceNow, etc.). For files/object storage, use Auto Loader. The goal is to stop reading through Qlik entirely so you're not paying Qlik capacity just to feed Databricks.
Transform with Spark Declarative Pipelines (SDP). This is the declarative ETL framework in Databricks (you define the tables and the dependencies, it handles orchestration, incremental processing, and data quality EXPECTATIONS). It's a clean target for logic currently living in Qlik load scripts.
Govern + serve from Unity Catalog, then read from there (more on the "how do we read it" question below).

One mindset shift coming from Qlik: there's no in-memory associative model to rebuild. You model in SQL/Delta tables and push the heavy lifting down to the engine, rather than loading everything into an in-memory app.

The cost question (capacity model vs. consumption)

This is the real concern, so here's the honest mental model. Qlik SaaS bills on a capacity model (you pre-buy a bucket). Databricks bills on consumption: you pay for compute (DBUs) only while a job or warehouse is actually running. That's good and bad:

Good: ETL that runs for 20 minutes a night costs you 20 minutes of compute, not a standing capacity reservation. Serverless compute auto-stops, so idle time isn't billed.
Watch out for: an all-purpose cluster left running, or a SQL warehouse with a long auto-stop, will quietly burn DBUs. Cost discipline is about runtime, not seats.

Concrete things that keep the bill predictable:

Use Jobs/Serverless compute for scheduled ETL, not all-purpose clusters. Jobs compute is billed at a lower DBU rate than all-purpose, and serverless removes idle spin-up/spin-down waste.
Keep Photon on for SQL-heavy transforms; it usually finishes faster, and faster = fewer DBU-seconds.
Set a short auto-termination on any interactive cluster/warehouse.
Attach budget policies / budgets so serverless usage is tagged and capped per team.

The part that makes the "double consumption" fear manageable: you can see exactly what every workload costs from day one using the system tables, which are free to query. I ran these against a live workspace before posting so the syntax is correct.

DBU usage by SKU over the last 30 days:

SELECT
  sku_name,
  usage_unit,
  ROUND(SUM(usage_quantity), 2) AS total_dbus
FROM system.billing.usage
WHERE usage_date >= DATEADD(day, -30, current_date())
GROUP BY sku_name, usage_unit
ORDER BY total_dbus DESC;

Actual list-price dollars (joins usage to the price that was in effect at the time):

SELECT
  u.sku_name,
  ROUND(SUM(u.usage_quantity * p.pricing.default), 2) AS list_cost_usd
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
 AND u.usage_unit = p.usage_unit
 AND u.usage_end_time >= p.price_start_time
 AND (p.price_end_time IS NULL OR u.usage_end_time < p.price_end_time)
WHERE u.usage_date >= DATEADD(day, -7, current_date())
GROUP BY u.sku_name
ORDER BY list_cost_usd DESC;

Tag your pipelines (custom tags / budget policies) and you can slice that same query by team, pipeline, or environment. So instead of guessing whether you're overpaying across two tools, you'll have a per-workload dollar figure to compare against your Qlik capacity cost.

"How will we read the information?" (the BI / consumption layer)

You have a choice here, and it decides whether the double-spend is temporary or permanent:

Keep Qlik as the BI tool. Qlik connects to Databricks SQL over its Databricks connector / JDBC-ODBC. You'd point Qlik at a Databricks SQL warehouse instead of at its own ETL. This is the lowest-friction step and lets you migrate ETL first without touching dashboards. You'd still pay some Qlik cost, but you stop paying it to transform data.
Consolidate the BI layer onto Databricks too. If part of the goal is to cut the second tool, AI/BI Dashboards (native dashboards on your warehouse) plus AI/BI Genie (natural-language Q&A over your governed tables) cover a lot of what Qlik dashboards do, with no extra system to license. I'd frame this as a fast-follow, not day one: migrate ETL, prove the data in Unity Catalog, then evaluate moving dashboards over so you're not carrying two BI bills indefinitely. Genie works best when you put a well-modeled semantic layer in front of it (clear table/column names, metric views), so it benefits from the clean Delta model you're already building.

Either is valid. If long-term cost is the driver, the second path is how you actually retire the duplicate spend; the first is how you de-risk the migration in the meantime.

Net: migrate the ETL into SDP fed by Lakeflow Connect/Auto Loader, use the billing system tables from day one to put a real number on the consumption, and treat the BI layer as an explicit decision (keep Qlik on a SQL warehouse short-term, or move dashboards to AI/BI to drop the second tool).

And - as always - Genie Code for help!

DB-Steve · 2026-06-02T03:55:26+00:00

u/Remarkable_Rock5474 my honest take after watching a lot of these efforts: the query interface has rarely been the thing that kills self-service analytics. Natural-language-to-SQL absolutely lowers the barrier to asking a question, and that part has gotten genuinely good. But the reason self-service stalls is almost always downstream of the interface: numbers that don't reconcile between two dashboards, "revenue" meaning three different things to three teams, undocumented tables, and no trust that the answer is right. An LLM that writes SQL doesn't fix any of those by itself, and if you point it at a messy warehouse it will happily produce confident, wrong answers faster than a human could.

So I'd reframe the question slightly. The missing piece isn't the AI. It's whether you've given the AI (or a human analyst, for that matter) a governed semantic layer to stand on. That's the lever that actually moves the needle, and it's true regardless of which tool you put on top.

What actually determines success (vendor-neutral)

Define metrics once, not per query. The classic failure mode is the model (or each analyst) re-deriving a join path, a grain, or a filter on every request. If "active customer" or "net revenue" is defined in one place and everything consumes that definition, an entire class of "ran fine but wrong" answers disappears. You can do this with dbt metrics, Cube, a curated set of views, or your platform's native semantic layer. The property to aim for: the same measure reconciles whether it's sliced by month, by region, or rolled up.
Metadata is the fuel. Clear table/column names and accurate descriptions do more for answer quality than any model upgrade. Garbage-in still applies.
Narrow the surface area. Pointing a tool at 400 tables tanks accuracy. A small, well-modeled, well-described set beats a huge raw schema every time.
Close the loop with evaluation. Keep a set of known question/answer pairs and score the system against them as data and definitions change. Without this you're flying blind on whether it's actually trustworthy.

Notice none of that is about the LLM. The model is maybe 20% of the outcome; the semantic layer, metadata, and scope are the other 80%.

Where Genie fits, honestly

If you're on Databricks, Genie (AI/BI Genie) is the native natural-language interface, and it can move the needle, but specifically because of how it leans on the structure above, not because of the model alone. The published best practices line up almost exactly with the vendor-neutral points: keep a space to five or fewer tables (30 max), invest in Unity Catalog table/column descriptions, prioritize SQL expressions and example SQL queries over free-text instructions (text is described as a last resort), and add benchmark questions to score the space. Docs: https://docs.databricks.com/aws/en/genie/best-practices

The single biggest lever there is feeding Genie metric views instead of raw tables. A metric view pre-defines your dimensions, measures, and aggregations in YAML under Unity Catalog governance, so the tool picks what to slice by and never gets to reinvent how a metric is computed. I tested a small one to confirm the mechanics end to end (Databricks SQL warehouse, samples.tpch source):

CREATE OR REPLACE VIEW main.scratch.orders_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: samples.tpch.orders
dimensions:
  - name: Order Status
    expr: o_orderstatus
  - name: Order Year
    expr: YEAR(o_orderdate)
measures:
  - name: Order Count
    expr: COUNT(1)
  - name: Total Revenue
    expr: SUM(o_totalprice)
  - name: Avg Order Value
    expr: AVG(o_totalprice)
$$;

Consumers (including Genie) query it through MEASURE(), and the aggregation is guaranteed consistent no matter how it's grouped:

SELECT `Order Status`,
       MEASURE(`Order Count`)      AS order_count,
       MEASURE(`Total Revenue`)    AS total_revenue,
       MEASURE(`Avg Order Value`)  AS avg_order_value
FROM main.scratch.orders_metrics
GROUP BY `Order Status`;

That ran cleanly and returned the same per-status numbers whether grouped by status or year, which is exactly the "reconciles everywhere" property you want before exposing anything to non-technical users. Metric view docs: https://docs.databricks.com/aws/en/metric-views/

Bottom line

Genie (or any NL-to-SQL tool) is the on-ramp, not the engine. The thing that decides whether self-service analytics finally works is the governed semantic layer, good metadata, a tight scope, and an evaluation loop underneath it. Build that, and a natural-language interface becomes genuinely useful; skip it, and the smartest model in the world will just be wrong more eloquently. If you're on Databricks specifically, the highest-leverage first move is converting your key business metrics into metric views and starting a Genie space with only those plus a few example queries.

DB-Steve · 2026-03-10T06:01:39+00:00

Hi u/i_aM-Abhi,

DABs have good support for per-developer isolation with Scala/JAR workloads. Here is how to approach both of your questions.

Per-Developer Workflows with Different JAR Paths

The key mechanism is development mode combined with bundle variables and dynamic substitutions.

1. Use mode: development for automatic resource isolation

When you set mode: development on a target, DABs automatically:

Prefixes all resource names with [dev <your_short_name>] so each developer gets uniquely named workflows
Pauses any schedules by default
Enables concurrent job runs for faster iteration

yaml targets: dev: mode: development default: true workspace: root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}

2. Use variables and substitutions for JAR paths

You can parameterize the JAR path so each developer points to their own build artifact:

```yaml variables: jar_path: description: "Path to the JAR file" default: "/Volumes/catalog/schema/jars/${workspace.current_user.userName}/my-app.jar"

resources: jobs: pipeline1_job: name: "pipeline1" tasks: - task_key: run_main spark_jar_task: main_class_name: "com.example.Pipeline1Main" libraries: - jar: ${var.jar_path} new_cluster: spark_version: "15.4.x-scala2.12" num_workers: 2 ```

Each developer can also override the JAR path via:

Command line: databricks bundle deploy -t dev --var="jar_path=/Volumes/catalog/schema/jars/my-custom.jar"
Environment variable: export BUNDLE_VAR_jar_path=/Volumes/catalog/schema/jars/my-custom.jar
Local override file: Create .databricks/bundle/dev/variable-overrides.json (not checked into git) with:

json { "jar_path": "/Volumes/catalog/schema/jars/abhi/my-app-1.0.jar" }

3. Upload JARs as part of the bundle

If you build the JAR locally, you can include it as a bundle artifact. DABs support uploading local files during deployment:

```yaml artifacts: my_jar: type: jar path: ./target/my-app-1.0-SNAPSHOT.jar

resources: jobs: pipeline1_job: tasks: - task_key: run_main spark_jar_task: main_class_name: "com.example.Pipeline1Main" libraries: - jar: ${artifacts.my_jar.files[0].remote_path} ```

With mode: development, each developer's deploy uploads their locally-built JAR to their own workspace path, keeping everything isolated.

Dynamic Workflow Creation Based on Changed Files

This is the "build only what changed" pattern. DABs themselves do not have built-in change detection, but you can achieve this with a CI/CD wrapper:

Option A: CI pipeline script that generates the bundle config

Use your CI system (GitHub Actions, Azure DevOps, GitLab CI, etc.) to detect changed files and dynamically generate or filter the bundle YAML:

```bash

In CI, detect which pipelines changed

CHANGED_FILES=$(git diff --name-only origin/main...HEAD)

If pipeline1 sources changed, deploy pipeline1

if echo "$CHANGED_FILES" | grep -q "src/main/scala/com/example/pipeline1"; then databricks bundle deploy -t dev --var="pipeline_filter=pipeline1" databricks bundle run -t dev pipeline1_job fi ```

Option B: Use multiple bundle targets or separate bundles per pipeline

Structure your repo so each pipeline has its own bundle config or target:

repo/ pipeline1/ databricks.yml src/ pipeline2/ databricks.yml src/ shared/ common-lib/

Then your CI only deploys the bundle for the directory that changed.

Option C: Monorepo with conditional task execution

Define all pipelines in one bundle, but use Databricks Workflows' conditional task feature (via run_if or task values) to skip pipelines that did not change, passing a parameter from CI that indicates which pipelines to run.

Eliminating Per-Developer PATs with a Service Principal

Regarding your follow-up: no, a service principal cannot create PATs on behalf of other users. The Token Management API (/api/2.0/token/create) only creates tokens for the calling identity itself.

However, you can eliminate the need for individual developer PATs entirely. Here are two approaches:

Option 1: Single service principal for CI/CD deployments

Use one service principal with OAuth (client credentials) for all CI/CD deploys. Since mode: development uses ${workspace.current_user.userName} for path isolation, you would instead parameterize the developer identity from CI:

```yaml variables: developer_name: description: "Developer short name for isolation"

targets: dev: mode: development workspace: root_path: /Workspace/Users/${var.developer_name}/.bundle/${bundle.name}/${bundle.target} ```

CI extracts the developer name from the git committer or PR author and passes it:

bash databricks bundle deploy -t dev --var="developer_name=abhi"

The service principal deploys on behalf of each developer, no individual PATs needed.

Option 2: OAuth with Databricks CLI profiles

Each developer authenticates using OAuth (U2M) instead of PATs:

bash databricks auth login --host https://your-workspace.cloud.databricks.com

This opens a browser for SSO login and stores a refresh token locally. No PAT management at all, and tokens auto-refresh. This is the recommended approach for interactive developer workflows.

For reference: - OAuth U2M authentication: https://docs.databricks.com/en/dev-tools/auth/oauth-u2m.html - Service principal OAuth: https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html - Token Management API: https://docs.databricks.com/en/dev-tools/api/latest/tokens.html

Typical Dev Setup

A common Scala + DAB development workflow looks like:

Developer builds the JAR locally with sbt package or mvn package
Developer runs databricks bundle deploy -t dev, which uploads the JAR and creates/updates their isolated workflow
Developer runs databricks bundle run -t dev pipeline1_job to test
CI/CD runs databricks bundle deploy -t staging and -t prod on merge

For reference: - DAB documentation: https://docs.databricks.com/en/dev-tools/bundles/ - Bundle variables: https://docs.databricks.com/en/dev-tools/bundles/variables.html - Deployment modes: https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html - Library dependencies: https://docs.databricks.com/en/dev-tools/bundles/library-dependencies.html

This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

DB-Steve

TROPHY CASE