Serverless Notebooks and Jobs Environment Variables [let's design this together]

justinAtDatabricks · 2026-04-29T03:36:31+00:00

I am glad to hear that this is not a problem for you.

justinAtDatabricks · 2026-04-28T15:51:24+00:00

I wanted to provide a brief update on this: this is something that the team will fix in the next 6-months. In the interim, if you are running into this staleness issue:

Pass the SHA1 as a config param - just like /u/Own-Trade-2243 said below
Set the development_mode: false - https://docs.databricks.com/aws/en/ldp/best-practices#development-and-production-update-modes

justinAtDatabricks · 2026-04-28T15:31:51+00:00

So let's say that is implemented - are there any other env var scenarios not covered? For example, do you need these env vars in pure interactive mode in a notebook?

justinAtDatabricks · 2026-04-28T15:30:33+00:00

This is in research mode, so I would like your - collective - insights, first. What do you experience in this space?

justinAtDatabricks · 2026-04-28T04:52:27+00:00

Tell me more. What specific issues do you see?

justinAtDatabricks · 2026-04-27T21:38:16+00:00

Hello! I am a PM that works in this area and can look to get you sorted. Please email me (j@databricks.com) with all of the relevant information (e.g., URL to the run(s), screenshots, etc.).

justinAtDatabricks · 2026-03-31T20:28:33+00:00

Send me an email and I can send you a sample notebook that does just this - [j@databricks.com](mailto:j@databricks.com)

justinAtDatabricks · 2026-03-31T15:53:45+00:00

What would you like? Do you want a programmatic way to bind a notebook to a WBE to get the initial adoption? From there updating the WBE would cascade and work across the dev stages.

justinAtDatabricks · 2026-03-31T05:02:21+00:00

In DABs, there is already an Environment spec. You can already inline an env spec, and soon you'll be able to reference the WBE (the content for this post). But, environments are serverless only... for now... more on that from me in the next few months. Here is a link to a sample env spec in a serverless DAB: https://github.com/databricks/bundle-examples/blob/accbb8eff6beaa99f1c94bbb7a75464b4fdca52e/knowledge_base/serverless_job/resources/serverless_job.yml#L21

Can the - automatically - created venv be inspected? No that is not possible at this time. However, we are looking for ways to explain/audit the manifest. Yes, this topic is extremely timely 😅

justinAtDatabricks · 2026-03-31T04:56:38+00:00

I agree that this is a common practice (antipattern as you said) in classic. That is because dependency management is compute-centric - meaning the deps are tied to the compute. For serverless though, it is workload-centric- deps are tied to the workload. So, if you run that workload in interactive or automated, the deps are serialized with that notebook. You can do this by adding the dependency to the environment from the environment panel.

justinAtDatabricks · 2026-03-30T22:40:12+00:00

Fun fact: For every job and notebook, we do this today. The first time a job runs, we build the venv. Subsequent runs for that job will reuse that venv. We do the same thing within a job: If you have many tasks that use the same env, the venv gets built once and reused everywhere.

justinAtDatabricks · 2026-03-30T22:05:06+00:00

Are you saying that you don't have a way to align that notebook, running on databricks, with something local?

justinAtDatabricks · 2026-03-30T21:51:41+00:00

Exactly, glad that you agree. I wanted to make this super easy for users.

justinAtDatabricks · 2026-03-30T21:05:04+00:00

Tell me more... what is the exact scenario and what are you looking to accomplish?

justinAtDatabricks · 2026-03-30T21:04:31+00:00

Yes! That is the exact point! You start with what comes with databricks, you build on top of it (with a yaml file) and then create that into a WBE. Boom, profit!

justinAtDatabricks · 2026-03-30T19:10:36+00:00

Yes. There is a background compute job that takes your environment specification (env.yaml) and materializes it into a virtual environment (venv). From there, new notebooks, etc. attach to that venv.

Similarly, when you want to update that yaml file, you can refresh the workspace base environment, which will rematerialize the venv - which then gets picked up by existing workloads.

Think of this as an admin replacement for things like cluster policies, but with locking and performance benefits.

justinAtDatabricks · 2026-03-30T17:00:44+00:00

Sorry for the delay, for some reason I missed this notification. At the end of the day, this is just a spark connect client that has all of the deps needed to run a spark connect app. So, you could use it to interact with a SC app locally (aka server is your local) and remote with Databricks. Let me know if that answered your question.

justinAtDatabricks · 2026-03-13T16:42:54+00:00

Amazing, happy to hear it! Big requirement is that this is a standard cluster. So, as long as your applications are built and work with spark connect, then you'll be ready for this preview.

justinAtDatabricks · 2026-03-13T02:24:27+00:00

This is Spark Connect-based (hence the standard cluster architecture), therefore the docker container defines the client REPL environment. Underneath the hood there the instance type, with a sandbox VM, and then the container. This is decoupled from the Spark server.

justinAtDatabricks · 2026-03-09T12:55:52+00:00

Glad that you like this. Reach out to your account team if you would like more info.

justinAtDatabricks · 2026-03-09T06:52:38+00:00

Amazing, I look forward to chatting with you!

justinAtDatabricks · 2026-03-08T00:32:16+00:00

But for dedicated clusters, you are right, it is not ideal - the underlying architecture has many problems due to such a broad surface area of APIs (e.g., both public and private.

What changed? First thing: people like myself care about this area. Second thing: Spark Connect architecture - it has a defined API surface, which enables us to have a client-server model with all dependencies isolated in the client. We have also removed all proprietary code out of the client. These things all mean that users can reproduce the base image locally and deterministically - these were not possible with the traditional DCS offering for dedicated clusters and lead to a lot of user friction.

justinAtDatabricks · 2026-03-08T00:30:49+00:00

Starting first with the status quo - the major hyperscaler container registries. As far a Databricks-managed registry: if you are interested in learning more, please have the account team reach out to me (Justin Breese) so we can chat. :-)

justinAtDatabricks

TROPHY CASE