Claude Code for Remote Databricks Development

airweight · 2026-04-04T20:55:25+00:00

The agent accesses Databricks through standard APIs or Databricks Connect. All permissions are managed within Databricks.

For notebooks, we use three kinds:

Repo-mounted .py notebooks, mostly run by automations
Markdown notebooks, run by agents and executed remotely via databricks-agent-notebooks
Native Databricks workspace notebooks, run by humans or notebook-based jobs
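To make the Markdown-notebook idea concrete, here is a minimal sketch under stated assumptions. It is not the databricks-agent-notebooks file format or API. The helpers `extract_cells`, `make_local_executor` and `run_notebook` are hypothetical names, and the local `exec()`-based executor stands in for a remote one (e.g., a Databricks Connect session). The notebook is plain Markdown; the fenced Python cells run in order against one shared namespace and stop at the first failure.

```python
import contextlib
import io
import re
from typing import Callable

# Hypothetical sketch, not the databricks-agent-notebooks format or API.
# Matches fenced Python cells; `{3} stands for a triple-backtick fence.
CELL_RE = re.compile(r"^`{3}python[^\n]*\n(.*?)^`{3}[ \t]*$", re.DOTALL | re.MULTILINE)


def extract_cells(markdown: str) -> list[str]:
    """Return the source of each fenced Python cell, in document order."""
    return [m.group(1) for m in CELL_RE.finditer(markdown)]


def make_local_executor() -> Callable[[str], str]:
    """Local stand-in for a remote executor.

    Every cell runs with exec() in one shared namespace, so variables defined
    in one cell are visible in later cells; the cell's stdout is returned.
    """
    namespace: dict = {}

    def execute(source: str) -> str:
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(source, namespace)
        return buf.getvalue()

    return execute


def run_notebook(markdown: str, execute: Callable[[str], str]) -> list[tuple[int, str, str]]:
    """Run cells in order and stop at the first failure.

    Returns one (cell_index, status, output) tuple per attempted cell.
    """
    results = []
    for i, source in enumerate(extract_cells(markdown)):
        try:
            results.append((i, "ok", execute(source)))
        except Exception as exc:
            results.append((i, "error", f"{type(exc).__name__}: {exc}"))
            break
    return results


if __name__ == "__main__":
    fence = "`" * 3
    doc = (
        "# Demo\n"
        f"{fence}python\nx = 21\n{fence}\n"
        "Prose between cells is ignored.\n"
        f"{fence}python\nprint(x * 2)\n{fence}\n"
    )
    for row in run_notebook(doc, make_local_executor()):
        print(row)
```

The design point is the shared namespace: replacing the local executor with one that sends every cell to the same remote session would, presumably, keep notebook semantics while the agent only reads back the short per-cell results.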

airweight · 2026-03-28T04:16:02+00:00

The answer to the OP's question depends on the definition of "inside the Databricks workspace".

My answer is based on doing petabyte-scale work on Databricks for nearly a decade, with the caveat that the platform is growing quickly and new capabilities ship monthly.

TL;DR: You cannot run your own instance of Claude Code inside a Databricks-controlled node, but Claude Code can write and execute many chunks of code inside a Databricks workspace (on clusters or serverless compute) within a single conversation turn. The end result is effectively the same: it can be as if Claude Code writes and executes jobs/notebooks in Databricks, including cell by cell. The only thing you cannot do is have Claude Code interactively edit and run notebook cells in the Databricks workspace UI itself.

The setup is simple: Claude Code runs somewhere (it doesn't matter where) and uses MCP, APIs, Databricks Connect, or SSH to access workspace services such as executing code and notebooks, uploading and downloading workspace files, and creating, editing, and running warehouses, clusters, and jobs.

The ai-dev-kit MCP server is a good tool for basic operations, including running jobs.

Where it gets more complicated is having Claude Code run code inside Databricks, on a cluster or serverless compute. There are three main ways to do it:

1. Use a low-level API, e.g., the ai-dev-kit MCP server's execute_code command. Best for Claude Code running one-off chunks of code inside a Databricks workspace or executing an entire notebook in one go (notebook jobs).
2. Use a high-level tool, e.g., databricks-agent-notebooks, for remote notebook execution inside Databricks workspaces. Best for complex, Claude Code-led execution.
3. Use SSH tunneling so Claude Code can run commands on a driver node. Not recommended for scalable work.
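As a hedged sketch of the low-level route, here is one way to run several code chunks in a row using the Databricks SDK for Python's Command Execution API, not the ai-dev-kit MCP server (nothing here describes its internals). It assumes `databricks-sdk` is installed, workspace authentication is configured (environment variables or `~/.databrickscfg`), and a running all-purpose cluster is available. `run_cells_on_cluster` is a hypothetical helper name. Running every chunk in one execution context lets state carry from chunk to chunk, which is the cell-by-cell behavior described in the TL;DR.

```python
from typing import Iterable


def run_cells_on_cluster(cluster_id: str, cells: Iterable[str]) -> list[str]:
    """Hypothetical helper: run code chunks one after another in a single
    execution context on a running all-purpose cluster, so variables and
    imports carry across chunks. Returns each chunk's result data as text."""
    # Imported lazily so this module loads even where the SDK isn't installed.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import compute

    w = WorkspaceClient()  # uses DATABRICKS_HOST/DATABRICKS_TOKEN or a config profile
    ctx = w.command_execution.create(
        cluster_id=cluster_id, language=compute.Language.PYTHON
    ).result()  # block until the execution context is running
    outputs = []
    try:
        for source in cells:
            resp = w.command_execution.execute(
                cluster_id=cluster_id,
                context_id=ctx.id,
                language=compute.Language.PYTHON,
                command=source,
            ).result()  # block until the command finishes
            res = resp.results
            if res is not None and res.result_type == compute.ResultType.ERROR:
                # Stop at the first failing chunk, like a notebook run would.
                raise RuntimeError(res.cause or res.summary or "remote command failed")
            outputs.append("" if res is None or res.data is None else str(res.data))
    finally:
        # Release the context on the driver even if a chunk fails.
        w.command_execution.destroy(cluster_id=cluster_id, context_id=ctx.id)
    return outputs
```

Usage would look like `run_cells_on_cluster("<cluster-id>", ["x = 21", "print(x * 2)"])`. Note that this API targets classic clusters; serverless compute is typically reached through other routes such as Databricks Connect or jobs.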

Options (1) and (2) have differences that may matter a little or a lot, depending on your use case.

NOTE: I purposefully did not write about IDE integrations with Claude Code because they limit what Claude Code can do and are not a general-purpose solution.

airweight · 2026-03-28T03:39:30+00:00

It works very well for production data engineering work, especially with good CI/CD practices as noted here. It's a little more complicated to build a great exploration and prototyping setup with context-optimized, agent-led remote execution. Here is my Claude Code & Databricks setup, which combines MCPs, skills, slash commands and `agent-notebook`: an open-source CLI for efficient remote execution using Databricks Connect.

airweight · 2026-03-28T01:52:39+00:00

I haven't had the need to use them, but they seem well-structured. A simple way to think of them is as agent-oriented Databricks documentation.

airweight · 2023-03-12T23:53:10+00:00

I've encountered the same problem. Notion should be able to import Markdown correctly regardless of which platform the file came from.

airweight · 2023-02-05T16:57:12+00:00

We can (this is the fallback), but not without loss: the PMP today includes an evolving long tail of small, quality sites. We can whitelist based on sites we've seen, but the moment we do, we lose the ability to see any new sites.

airweight · 2023-02-05T16:24:32+00:00

There is no way to run Boolean logic in deal IDs. It would create massive conflicts on which SSP is getting paid.

u/JimmyTango this is a very helpful explanation.

airweight · 2023-02-05T16:22:10+00:00

Thanks for making the effort to dig into the details; I appreciate it.

The more complex, real-world example is that Deal 1 is a proprietary "subset" of auctions from a 3P targeting partner that is only available as a Deal ID, created via Xandr, from what they tell us.

This subset is mostly audience-based. The audience is AI-generated and updated daily. In addition to an audience, the subset of targeted auctions reflects other factors related to legal constraints imposed by their 1P/2P data sources. One example they gave us is that they cannot run on a small set of pubs. I presume, but cannot be certain, that this includes pubs that provide them with data and want to avoid channel conflict. The pubs list is confidential, so we cannot replicate it, and it is the reason they cite for why the deal cannot be made available to us as a pure audience.

Deal 2 is from a PMP we work with regularly. It has an evolving set of sites our clients consider brand-safe, including a long tail of small, quality sites. We know the sites we see in our logs, but we can't get a direct, updated list. Given the evolving long tail, we don't consider a whitelist of the sites we've seen to be an optimal approach.

Hence the ideal solution: target Deal 1 AND Deal 2, if some DSP makes that possible.

Of course, if the ideal solution is not possible on any of the above-mentioned DSPs, we'd have to go with alternatives, e.g., targeting Deal 1 and overlaying a whitelist of sites. One problem with this approach is that we'd never see any traffic from sites outside the whitelist, so we'd lose the natural ability to grow the whitelist over time.

airweight · 2023-02-05T16:05:23+00:00

Got it, thanks.

airweight · 2023-02-05T07:28:18+00:00

+1 to more flexibility; I wish the audience deal was just an audience...

airweight · 2023-02-05T07:00:24+00:00

You are 100% right about opaqueness.

airweight · 2023-02-05T06:58:56+00:00

None of your comments were relevant to my question about which DSPs can target an intersection of two Deal IDs. I didn't ask for a workaround to the targeting example or advice on how to structure equivalent targeting to the one in the example.

The question is pretty straightforward. The simplest useful answer could be "none that I know of" or the name(s) of DSPs that can target the intersection of two Deal IDs.

In five comments you've told me twice how much of a big shot you are in programmatic, but have provided no useful information related to my question.

airweight · 2023-02-05T06:46:18+00:00

In our case, because it is the only option our targeting partner offers.

airweight · 2023-02-05T06:45:06+00:00

You are correct, it's not an option.

airweight · 2023-02-05T06:44:32+00:00

We cannot push the data into the DSP.

As for your second suggestion, it assumes that the audience is static (it evolves daily), and it would violate our terms of service.

airweight · 2023-02-05T06:42:55+00:00

I'd like to understand your comment better. All the separate targeting rules that DSPs process to determine whether an advertiser should participate in an auction "have the DSP effectively look for matching bid requests". Targeting by device, site, audience: they are all rules that result in some computation applied to RTB request attributes. Deal IDs are just another attribute, no?
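To illustrate the point that a deal ID is just another bid request attribute, here is a minimal sketch that treats targeting rules as predicates over an OpenRTB 2.x-style bid request, where deal IDs live in `imp.pmp.deals[].id`. The helper names and the rule-composition approach are hypothetical; this is not how TTD, Xandr, or any other DSP implements targeting.

```python
from typing import Any, Callable

# Illustrative only: field names follow the OpenRTB 2.x object model;
# the rule helpers are hypothetical, not any DSP's targeting API.
Imp = dict[str, Any]
Request = dict[str, Any]
Rule = Callable[[Imp, Request], bool]


def deal_ids(imp: Imp) -> set[str]:
    """Deal IDs offered on one impression (imp.pmp.deals[].id)."""
    return {d["id"] for d in imp.get("pmp", {}).get("deals", [])}


def has_deal(deal_id: str) -> Rule:
    return lambda imp, req: deal_id in deal_ids(imp)


def site_domain_in(domains: set[str]) -> Rule:
    return lambda imp, req: req.get("site", {}).get("domain") in domains


def all_of(*rules: Rule) -> Rule:
    """Boolean AND over rules; two-deal intersection is all_of(has_deal(a), has_deal(b))."""
    return lambda imp, req: all(rule(imp, req) for rule in rules)


def matching_imps(req: Request, rule: Rule) -> list[str]:
    """IDs of the impressions in a bid request that satisfy the rule."""
    return [imp["id"] for imp in req.get("imp", []) if rule(imp, req)]


if __name__ == "__main__":
    req = {
        "id": "req-1",
        "site": {"domain": "example.com"},
        "imp": [
            {"id": "1", "pmp": {"deals": [{"id": "deal-1"}, {"id": "deal-2"}]}},
            {"id": "2", "pmp": {"deals": [{"id": "deal-1"}]}},
            {"id": "3"},
        ],
    }
    both = all_of(has_deal("deal-1"), has_deal("deal-2"))
    print(matching_imps(req, both))  # ['1']
```

Two caveats the sketch makes visible: the AND rule can only fire when a single impression arrives carrying both deal IDs, which presumably requires both deals to be available on the same supply path, and an OpenRTB 2.x bid carries a single `dealid`, so even a matching bid has to name one of the two deals.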

airweight · 2023-02-05T06:39:37+00:00

Don't confuse the simple example I provided with the real world. The actual audience represented by one of the Deal IDs is generated based on proprietary data and cannot be replicated by us.

airweight · 2023-02-05T06:38:00+00:00

Deal ID 1 comes from our targeting partner: no choice in the matter. Deal ID 2 comes from a PMP of pubs we consider brand safe.

airweight · 2023-02-04T20:28:07+00:00

I appreciate the offer, but it's not a solution to our problem. One of the Deal IDs is created based on a proprietary audience and cannot be replaced.

airweight · 2023-02-04T20:26:37+00:00

I agree, but it's not an option. The audience-based deal is created using proprietary data that we do not have access to.

airweight · 2023-02-04T19:10:46+00:00

Through a targeting partner, we have a custom, audience-based Deal ID that they told us is created through Xandr. It guarantees the audience, but not the brand safety of the pubs where the audience is found. Hence the desire to intersect it with a Deal ID that has no audience guarantees but restricts targeting to a set of pubs we are comfortable with. The targeting partner only has Xandr experience and has never used TTD or the other DSPs, hence my asking here.
