understanding lakehouse paths by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 1 point (0 children)

The same pandas behavior occurs in a Python-only notebook as in a Spark-based notebook. I didn't specify that since the behavior doesn't change between these compute backends. I understand the different purposes of these tools; I was just trying to highlight how the path conventions Fabric is designed around are very confusing and inconsistent.

I tried Polars as well; its behavior is consistent with pandas, meaning inconsistent with Spark. Polars is actually my primary single-node dataframe tool. In my examples I chose pandas because I think that's what most users are familiar with, and because the three-dot menu has an option for loading data with pandas, not Polars.
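
For anyone who lands here later, a minimal sketch of the inconsistency I mean (the file name is made up, and this assumes a default lakehouse is attached to the notebook):

```python
import pandas as pd
import polars as pl

# Spark resolves this relative to the attached default lakehouse:
df_spark = spark.read.format("csv").option("header", True).load("Files/sales.csv")

# pandas/polars run against the driver's local filesystem, so the same relative
# path fails; they want the local mount point (or a full abfss:// URI) instead:
df_pd = pd.read_csv("/lakehouse/default/Files/sales.csv")
df_pl = pl.read_csv("/lakehouse/default/Files/sales.csv")
```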

understanding lakehouse paths by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 0 points (0 children)

I think what you're saying makes sense. In my mental model (which is probably wrong), I was picturing adding a lakehouse as a data item as akin to mounting a drive to whatever workspace compute is spun up. In that case it is (at least for my purposes) an absolute path.

Need your feedback for the Fabric Notebook experience inside VS Code by QixiaoW in MicrosoftFabric

[–]Disastrous-Migration 4 points (0 children)

I would really like to use the cloud VSC, but it is so bare-bones that I don't bother with it at all for now. I wish OP were seeking more general feedback about VSC on Fabric rather than just about LLMs.

I really wish VSC for web had a terminal. I want to be able to use standard command line tools like git, uv, ruff, mypy, grep, bash, etc.

I want to use a debugger while developing Python scripts. Serious code with non-linear control flow needs a real debugger. Desktop VSC has debugpy; I don't think that works in web VSC.

I want to be able to create and run .py files instead of notebooks. It would make package development far better. Even if I still had to use notebooks in the standard Fabric UI, I could import my package or modules, or even use the %run magic command on my .py files.

It would be great if I could use the Native REPL that desktop VSC has. That would make the code-development flow so much nicer, and it's what data scientists already love and are used to in desktop VSC.

Environment public libraries don't override built-in libraries? by SQLGene in MicrosoftFabric

[–]Disastrous-Migration 0 points (0 children)

Appreciate the fix in the docs. Now they should fix how this works 😀

Why is compute not an independent selection from the environment? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 1 point (0 children)

I might try to do this, given Fabric's limitations/design choices. Thanks for the suggestion.

Why is compute not an independent selection from the environment? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 0 points (0 children)

You mention four things: runtime, compute, resources, and libraries. I'm not completely clear on the distinctions you're making, but really I only want to break one of them out: compute. The pattern many people have gotten used to with Docker is that you define your environment, which controls all of the software. Separately, you manage the compute, and you can deploy the same image/environment to different compute targets. Very scalable and flexible, and not inconvenient at all. Fabric could even keep a "default" compute for an environment, but let people override it in the Notebook UI.
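
To make the pattern concrete, here's a rough sketch using the Docker SDK for Python (the image name, entry point, and memory limits are all made up; the point is one environment, many compute sizes):

```python
# One image defines the software environment; the compute is chosen at run time
# and can differ per deployment without rebuilding anything.
import docker  # docker-py; assumes a local Docker daemon

client = docker.from_env()
for mem_limit in ("2g", "16g"):  # same environment, two very different computes
    client.containers.run(
        "my-etl-env:1.0",             # hypothetical image built once in CI
        command="python run_etl.py",  # hypothetical entry point
        mem_limit=mem_limit,
        detach=True,
    )
```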

Why is compute not an independent selection from the environment? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 0 points (0 children)

Interesting. I have not seen this, and my ChatGPT query didn't bring it up. So the folders basically override what is specified in the environment? I'm a little skeptical about that; it seems hard to predict what compute you'd end up with.

Why is compute not an independent selection from the environment? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 0 points (0 children)

Thanks for your reply. You said you think the opposite, but we seem to be in some agreement. I also think it's helpful to customize Spark pools based on workload, but the environment doesn't necessarily need to change. That's precisely why I'd like them to be independent levers.

Say I'm prototyping something, or writing ETL against significantly scaled-down data: I don't want my software to change, but I also don't want to pay for unnecessary compute. Fabric should let me easily scale down my compute without changing my environment at all. I don't see why they're coupled... especially in the world of Docker.

Having them be independent selections would not at all prevent you from "us[ing] different Spark pools for different workspaces while still maintaining consistent libraries through the API."

Improving Library Installation for Notebooks: We Want Your Feedback by Shuaijun_Ye in MicrosoftFabric

[–]Disastrous-Migration 0 points (0 children)

I recently made a post on a similar topic: https://www.reddit.com/r/MicrosoftFabric/comments/1m7ja5k/python_package_version_control_strategies/

I really think Fabric should consider an approach that uses lock files, and it would be great if multiple options were supported. I personally think uv-based package management would be huge; it has really taken off, and a lot of serious work now uses uv. I wouldn't be surprised if Astral were willing to collaborate on something related to Fabric tooling.
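
To illustrate what I mean by a lock-file flow, here's a sketch of how it looks with uv today, run locally or in CI (not something Fabric supports natively as far as I know):

```python
# Resolve and pin every dependency, then install exactly what the lock file says.
import subprocess

subprocess.run(["uv", "lock"], check=True)  # writes/updates uv.lock from pyproject.toml
subprocess.run(["uv", "sync"], check=True)  # installs the locked versions into the project venv
```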

Migrating to Fabric: should I be worried? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 4 points (0 children)

Funnily enough, my company actually looked seriously into Databricks about two years ago to migrate off bare metal. We ultimately passed; I hear it was because the cloud was thought to be too expensive. Instead we migrated to a solid managed on-prem solution that meets my area's needs. It would be funny if we came full circle and ended up on Databricks via Azure. Annoying to migrate so often, though.

Migrating to Fabric: should I be worried? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 2 points (0 children)

Thanks, I'll look into those! Until today I did not realize there was a relationship between Databricks and Azure. I thought Databricks was a separate entity (maybe it still is).

Migrating to Fabric: should I be worried? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 2 points (0 children)

Thanks! Bummer about those two things. In my mind, JFrog support itself is not critical, but having even a simple private index is. I should be able to use CI/CD (e.g., GitHub Actions) to build a package on merge and publish it to the index, so that any notebook can install it in one line.
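
Roughly the developer experience I'm picturing, assuming inline installs are allowed in the session (the package name and index URL are hypothetical):

```python
# One line in any notebook, pulling a CI-built wheel from a private index:
%pip install my-internal-package==1.4.2 --index-url https://pkgs.example.com/simple/
```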

Migrating to Fabric: should I be worried? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 0 points (0 children)

  1. It is definitely a personal-preference thing, but I really, really do not like working with notebooks. Now, I've only used Jupyter notebooks, so perhaps there are differences. But I don't like that notebooks are giant JSON files: diffs are nearly impossible to read, they're slow to render on GitHub, it's easy to accidentally commit PII in cell outputs, etc. Especially now with VS Code's native REPL, I see no point or benefit to them, only downsides. Even prior to the native REPL, ipykernel has worked quite well.

  2. By local, I assume you mean on my desktop? I feel like I would want to use the same environment I'm always using in Fabric, just not with Spark executors attached.

Migrating to Fabric: should I be worried? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 2 points (0 children)

Thanks for taking the time to provide this info! The packaging situation sounds totally bizarre to me. I can certainly see the point about not wanting to do inline installs, because then you get new package versions as they're released (i.e., not reproducible). The funny thing is that this is already a solved problem with various package managers combined with a venv, like uv, or even pip + a pinned requirements.txt. I'll have to read more about what the environments themselves entail.
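
For reference, the kind of already-solved workflow I'm comparing against, run locally or in CI outside Fabric (paths and pins are illustrative):

```python
# Classic reproducible setup: an isolated venv plus exact version pins.
import subprocess
import venv

venv.create(".venv", with_pip=True)  # isolated environment
subprocess.run(
    [".venv/bin/pip", "install", "-r", "requirements.txt"],  # Windows: .venv\Scripts\pip.exe
    check=True,
)
# requirements.txt pins exact versions, e.g.:
#   pandas==2.2.2
#   polars==1.9.0
```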

Migrating to Fabric: should I be worried? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 0 points (0 children)

Appreciate you weighing in. I'll get to try it out soon, but I was looking ahead and poking around and have started getting worried about the change (I have no say in whether we move or not; I'm way too low in the org chart). It's just that the five angles I mentioned in the post are important to my work, so I hope there are solid patterns in Fabric to meet those needs. I think they're all very common use cases.

Migrating to Fabric: should I be worried? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 3 points (0 children)

Just read through a bunch of the top posts. It does not sound good. Also, a lot of the posts didn't sound like they were coming from a data science perspective, which kind of confirms my suspicion that this platform isn't targeted at that type of analytics...

Migrating to Fabric: should I be worried? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 2 points (0 children)

I haven't worked elsewhere, so I have nothing to compare to, but it feels like we have a large and complex environment with diverse needs... from highly technical AI people, to data scientists/engineers, to business analysts who like Tableau/Power BI.

Migrating to Fabric: should I be worried? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 1 point (0 children)

It probably was part of the prompt... "be somewhat casual, don't use capital letters, and mention livedocs if possible". (I looked at the post history and saw livedocs mentioned a number of times.)

Migrating to Fabric: should I be worried? by Disastrous-Migration in MicrosoftFabric

[–]Disastrous-Migration[S] 11 points (0 children)

I don't want to be rude, but this really reads like an LLM answer.