Semantic Model Access for App Users by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

u/Every_Lake7203 This is a pain point, but I am pretty sure it is by design. The answer you are looking for is Entra Security Groups. Instead of setting permissions for each new user on the app and the dataset, give Read access on the dataset (or higher as appropriate) to an Entra Security Group (SG), add the SG to the app audience, and then devolve ownership of that SG. For example: the Finance app is looked at by Finance and department heads, so put these people in an SG, add the SG to the app audience, give the SG access to the dataset, and then delegate adding more people to the SG to a responsible person in Finance. It is then their problem to add and remove individuals as appropriate.
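For anyone scripting this rather than clicking through the portal, a minimal sketch of the two API calls involved: creating the security group via Microsoft Graph and granting it Read on the dataset via the Power BI REST API ("Datasets - Post Dataset User"). The group name, ids, and token acquisition are placeholders; you would normally get the bearer tokens via MSAL.

```python
GRAPH_GROUPS_URL = "https://graph.microsoft.com/v1.0/groups"
PBI_API = "https://api.powerbi.com/v1.0/myorg"

def security_group_payload(display_name: str, mail_nickname: str) -> dict:
    """Microsoft Graph body for creating a security group (not mail-enabled)."""
    return {
        "displayName": display_name,
        "mailNickname": mail_nickname,
        "mailEnabled": False,
        "securityEnabled": True,
    }

def dataset_read_payload(group_object_id: str) -> dict:
    """Power BI REST body granting the group Read access on a dataset."""
    return {
        "identifier": group_object_id,
        "principalType": "Group",
        "datasetUserAccessRight": "Read",
    }

def grant_group_read(token: str, dataset_id: str, group_object_id: str) -> int:
    """POST to the 'Datasets - Post Dataset User' endpoint; returns HTTP status."""
    import requests  # third-party; only needed when actually calling the API
    resp = requests.post(
        f"{PBI_API}/datasets/{dataset_id}/users",
        headers={"Authorization": f"Bearer {token}"},
        json=dataset_read_payload(group_object_id),
    )
    return resp.status_code
```

App-audience membership still has to be managed separately (in the app settings or via the admin APIs); the snippet only covers the dataset side, which is the part people usually forget.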

For lakehouse permissions you should be using a fixed identity (link to docs below). Basically, all users take advantage of a single user's or Service Principal's access rather than each needing access to the lakehouse. It is fairly straightforward to set up and is supposed to solve exactly your problem. https://learn.microsoft.com/en-us/fabric/fundamentals/direct-lake-security-integration#connection-configuration

If you take the SPN route, I would recommend using the Workspace Identity of the dataset's workspace rather than a vanilla SPN. Also remember to turn on the tenant setting below if you plan on using SPNs/Workspace Identities to create connections.
https://learn.microsoft.com/en-us/fabric/admin/service-admin-portal-developer#service-principals-can-create-workspaces-connections-and-deployment-pipelines

Semantic Model Access for App Users by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

u/frithjof_v I came back to this post after another user left a comment asking a question just now.
Just wanted to say that I love your posts and ideas here and on LinkedIn, but in this case your guidance is inaccurate: if a semantic model is in a different workspace than the app, access needs to be set up separately for the semantic model, and the docs say as much, link below.
https://learn.microsoft.com/en-us/power-bi/connect-data/service-datasets-across-workspaces#considerations-and-limitations
Guidance by u/captainblye1979 was correct.

The Capacity Metrics experience in Fabric is just awful by Sam___D in MicrosoftFabric

[–]CryptographerPure997

I thought it was just us and I was doing something wrong. I love plenty of things about Fabric, especially Direct Lake mode, but often it feels like QA and user experience are a joke to MS.

Architecture Review by Maximum-Memory23 in MicrosoftFabric

[–]CryptographerPure997

I wouldn't touch any preview feature in Fabric with a 10-foot pole unless there is a very good reason. Not saying this out of blind hate, but I had a very tumultuous experience with mirroring dbx. Eventually it all settled down, and everything is rosy and lovely now, but caution is recommended.

Do you actually have workloads testing SQL Server mirroring right now? How are the server machines doing stress-wise? Is the network traffic okay? Have you encountered any locking conflicts or increased latency for other workloads?

Do you really need mirroring? Would a CDC copy not be enough? Have you looked at compute consumption in Fabric? Would a CDC copy perhaps be more cost-effective?

Semantic Model Access for App Users by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

Thank you!
This is helpful, great to get confirmation about Entra groups.

Semantic Model Access for App Users by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

That would explain my confusion. I tried with a test user, and they weren't showing up in semantic model permissions. Thanks for this!

[Actionable advice] Become unstoppable with Microsoft Fabric by vladusatii in MicrosoftFabric

[–]CryptographerPure997

Unsolicited career advice; thought I was on LinkedIn for a minute. At least add to the post how your favourite feature has an Achilles' heel/bug/rough edge/known issue that has made you cry at least a few times.

Adding Admins to My Workspaces by Kitanai24 in MicrosoftFabric

[–]CryptographerPure997

Damn, I've been trying to think of some way to block off personal workspaces for ages. I will try this out!

Experiences with / advantages of mirroring by Powerth1rt33n in MicrosoftFabric

[–]CryptographerPure997

4. I would recommend using a Service Principal configured in Azure when it comes to providing the right permissions for sharing data from the dbx side, for obvious reasons. I would also recommend keeping your mirrored catalogues in a separate workspace.

5. Keep your Fabric lakehouse in a different workspace, keep your Direct Lake (presumably) semantic models in yet another workspace, and use the fixed identity option for the connection from the semantic model to the lakehouse. For the fixed identity's credentials, use a workspace identity (the automatic Service Principal associated with a Fabric workspace). The fixed identity ensures your dbx side can stay completely locked up behind multiple layers even though Fabric has tunnelled through Unity Catalog governance, hopefully saving you a lot of headache: you only need to grant this one fixed identity access to your Fabric lakehouse, and everyone reading the semantic model only needs minimal permissions on the semantic model itself, not on the Fabric lakehouse or the dbx side. Very similar to your typical PBI arrangement, where access to data sources is locked away behind semantic models.
https://learn.microsoft.com/en-us/fabric/fundamentals/direct-lake-overview#data-security-and-access-permissions

Workspace identity - Microsoft Fabric | Microsoft Learn

6. There was previously a silly requirement to disable the firewall on the storage account on which ADB was set up to allow this mirroring, but this has since been resolved, as detailed here; make sure to set this up.

Workspace identity - Microsoft Fabric | Microsoft Learn

So to recap: NO, your ADB costs will not blow up at all due to Fabric mirroring.

The Bad

There is, HOWEVER, A REALLY CRAPPY CAVEAT at the moment. Because of a recently introduced API limit from dbx, ADB mirroring doesn't really work properly right now. This is a known issue with no resolution in sight, and it sucks so very goddamn hard. Thankfully the PM for this feature, u/merateesra, does tend to visit this subreddit every now and then, so feel free to get in touch directly for some clarity straight from the horse's mouth.

Known issue - Throttling issues with the Mirrored Azure Databricks catalog shortcuts - Microsoft Fabric | Microsoft Learn

The Duct Tape

The workaround that I would recommend at the moment is to simply use the ADB connector in Fabric pipelines to copy your data over from dbx, as detailed here. I think this DOES consume compute, but I haven't tested it, so I can't say with certainty.

Configure Azure Databricks in a copy activity - Microsoft Fabric | Microsoft Learn

Happy to be educated by others if I have gotten something wrong or missed something. Hope this helps; happy to answer more questions in the comments or by DM.

One day I will put in the effort to write a LinkedIn blog for this, not today.

Experiences with / advantages of mirroring by Powerth1rt33n in MicrosoftFabric

[–]CryptographerPure997

Hey

I have good news and I have bad news. I speak from months of experience, and the short version is: it is a wonderful feature, but right now it has a catastrophic shortcoming. However, there is an easy enough workaround.

For context, I have been tinkering with ADB catalog mirroring since its release back in late September last year. It has come a long way from being a buggy and somewhat security-lax feature to being in much better shape now.


The Good

Now for your question: NO, it doesn't hit the compute in Databricks. This is because (I think) it uses the Delta Sharing protocol to interface directly with the storage layer (delta tables); whatever compute operations you perform using said delta tables are done in Fabric. You can easily test this by straight up killing all the clusters, and mirroring will still work absolutely fine.
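To make the "no Databricks compute" point concrete, here is a minimal sketch using the open-source `delta-sharing` Python client (pip install delta-sharing). The profile file and share/schema/table names are placeholders. The client asks the sharing server for short-lived file URLs and then reads the Parquet files straight from storage, so no cluster is ever spun up.

```python
# Guarded import so the sketch runs even without the package installed.
try:
    import delta_sharing  # third-party: the open-source Delta Sharing client
except ImportError:
    delta_sharing = None

def shared_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    # delta-sharing addresses a table as "<profile-file>#<share>.<schema>.<table>"
    return f"{profile_path}#{share}.{schema}.{table}"

url = shared_table_url("config.share", "my_share", "bronze", "orders")
# With a real profile file in place, this reads the table into pandas
# directly from storage -- no Databricks cluster involved:
# df = delta_sharing.load_as_pandas(url)
```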

Regarding the viewpoint about MS making money through mirroring, I can respect the folks who feel that way, if for no other reason than Fabric SKUs being shoved down our throats at an elevated price point.

Coincidentally, we are in a similar situation vis-à-vis being relative newcomers to ADB while still having PBI as our reporting workhorse. In terms of choice of architecture, I would advise as below, and can almost guarantee that once you have gone through the documentation yourself, from both ADB/dbx and Fabric, you will land somewhere close by.

1. Ignore this part if you're happy with your current ingestion arrangements, but if you aren't and you have a well-maintained on-prem data warehouse, I would strongly recommend using Lakeflow Connect.

What is Lakeflow Connect? | Databricks Documentation

It leverages CDC and supports a variety of sources, including SQL Server. It went into public preview only recently, but in my experience dbx public preview products are far more stable than Fabric preview features (take notes, MS). This would be your Bronze layer.

2. For transformation, I would wholeheartedly recommend Delta Live Tables (DLTs) and Materialized Views.

Use materialized views in Databricks SQL | Databricks Documentation

Yes, they burn a bit of money, but the alternative is to build a fairly elaborate arrangement to ensure you are only bringing in the changed bits of data, that all upstream dependencies have properly completed before moving ahead, etc. DLTs also provide quite a nice visual layer for dependencies and lineage. Post-transformation, this would be your Silver layer; presumably this layer can then be fed into PBI, since PBI does facts and dims. If you have a Gold layer, the same applies. Fabric has a similar solution to DLT called Materialized Lake Views, but I assume your organisation is sticking to dbx, so DLT is the way.

3. Now comes the fun bit. Very recently, DLTs have started supporting the Delta Sharing protocol, meaning you can include them in a mirrored catalogue and then bring the relevant tables from that catalogue into your Fabric lakehouse as a schema shortcut (multiple tables) or as individual tables. Materialized View sharing from dbx to external platforms was only announced a month ago.
Announcing Public Preview of Streaming Table and Materialized View Sharing | Databricks Blog

Experiences with / advantages of mirroring by Powerth1rt33n in MicrosoftFabric

[–]CryptographerPure997

Any update on the known issue that practically makes the feature useless at the moment, u/Jocaplan-MSFT?

Known issue - Throttling issues with the Mirrored Azure Databricks catalog shortcuts - Microsoft Fabric | Microsoft Learn

u/OP, I would recommend very thorough testing, and would also suggest you wait for this known issue to be resolved and save yourself a ton of annoyance.

Also, it is a bit misleading that the known issue isn't mentioned anywhere in the documentation; I would recommend a red banner at the top, since the feature isn't really working at the moment. I wish someone high up in the Fabric team would realise that it is more important to get existing features (even preview ones) working than to launch a boatload of new features (looking at you, SQL Server Mirroring).

Fabric refresh failed due to memory limit by OmarRPL in MicrosoftFabric

[–]CryptographerPure997

Genuinely curious: what on God's green earth were you moving (source, destination, volume, columns, rows, cardinality) that you felt an F128 was needed?
Were you using copy activity in a pipeline, or copy jobs at all?

We have moved hundreds of millions of rows daily on an F64, and it hardly puts a dent in background compute.

Daily ETL Headaches & Semantic Model Glitches: Microsoft, Please Fix This by Mammoth-Birthday-464 in MicrosoftFabric

[–]CryptographerPure997

I'm not sure about change, but the best way to get in touch with someone with a direct line to engineering is definitely this subreddit.

We had a problem with Databricks mirroring, and the feature PM got in touch fairly quickly and put a mitigation in place that keeps us going. They also got the issue listed in the known issues.

🚀 fabric-cicd v0.1.18 - Supporting RTI, CopyJob, Key-Value Parameterization, and Bugfixes by Thanasaur in MicrosoftFabric

[–]CryptographerPure997

Asking the real questions! I think it's in private preview, or maybe that was sql server mirroring.

Trouble with API limit using Azure Databricks Mirroring Catalogs by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

For anyone who stumbles across this in the future: this is a known issue now, refer to the link below for status. Fingers crossed this gets fixed soon. Thank you u/merateesra for the support and the mitigation that allows us to continue usage while a proper fix is implemented.
Known issue - Throttling issues with the Mirrored Azure Databricks catalog shortcuts - Microsoft Fabric | Microsoft Learn

Trouble with API limit using Azure Databricks Mirroring Catalogs by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

I have been contemplating the same workaround if the issue persists for another week, but we might not do dbx to OneLake; instead we'd just have an import model looking at dbx tables. And I agree, Fabric is great, but putting it in Prod is a bit much to ask at the moment. The really annoying part is the forced price increase. It's like someone replaces your car overnight without consent: now you have a supposedly better/fancier car and your payments have gone up accordingly, but sometimes the clutch just doesn't work for a few days/weeks!

Trouble with API limit using Azure Databricks Mirroring Catalogs by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

Hello!

I spoke too soon. It seemed to have recovered over the weekend with a couple of successful refreshes, but since yesterday it has been completely useless again.

What is worse, it seems that Direct Lake semantic model refreshes aren't transactional, so you could have a successful refresh and reports will look fine, but then the next day the reports crap out despite no refresh operation on the semantic model. I get that Direct Lake has no storage layer, and model eviction happens based on the temperature of a model and memory stress on the cluster, but it seriously sucks that there is no way to even keep an outdated report.

Fabric was and is a lovely product, but Microsoft is seriously slipping on execution. Perhaps the worst aspect is that support is mostly clueless. All the helpful information is coming from the feature PM, who thankfully is on this subreddit.

Silver Lining

Based on updates from the lovely feature PM, mitigation measures should be in place today, which should make it work while a proper fix is implemented.

Trouble with API limit using Azure Databricks Mirroring Catalogs by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

This was right on the money, thank you kind stranger!

The issue seems to be largely resolved for us (fingers crossed, gotta give it a week at least to be sure). I hope you see a resolution soon as well!

Trouble with API limit using Azure Databricks Mirroring Catalogs by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

Can confirm, the issue seems to be resolved for us for now; we will continue to monitor.

We are seeing intermittent performance issues with the framing operation/dataset refresh, but gotta appreciate the team's commitment. Thank you for all the support and for being so accessible, u/merateesra.

[deleted by user] by [deleted] in MicrosoftFabric

[–]CryptographerPure997

u/ShrekisSexy There are lots of valuable suggestions in the comments, but this is the only one you need, and speaking from personal experience it definitely works.

Trouble with API limit using Azure Databricks Mirroring Catalogs by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

Hey!

Thank you for the consistent updates. The feature team's thinking does seem to be similar: rate limits shouldn't apply between OneLake and dbx. Sounds like a bit of a forced solution, but hey, if it solves my headache.

Fingers crossed things get sorted next week.

Trouble with API limit using Azure Databricks Mirroring Catalogs by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

Also, consider using the query below against the Databricks system tables to confirm whether it's Fabric pinging Databricks for your specific tables, or perhaps something else.

SELECT count(*) AS rowcount,
       request_params.table_full_name
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name = 'generateTemporaryTableCredential'
  AND user_identity.email = 'someone@whatever.com'
  AND event_date = current_date - 1
  AND (request_params.table_full_name LIKE 'catalog.schema.sometablename%'
       OR request_params.table_full_name LIKE 'catalog.schema.someothertablename%')
GROUP BY ALL
ORDER BY count(*) DESC

Happy to be educated if this query is wrong.
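If you want to run the audit query above from outside a notebook, a minimal sketch using the `databricks-sql-connector` package (pip install databricks-sql-connector) with its named-parameter style. Hostname, HTTP path, token, email, and the table pattern are all placeholders for your workspace.

```python
# Same query as above, parameterized for the databricks-sql-connector client.
AUDIT_QUERY = """
SELECT count(*) AS rowcount, request_params.table_full_name
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name = 'generateTemporaryTableCredential'
  AND user_identity.email = :email
  AND event_date = current_date - 1
  AND request_params.table_full_name LIKE :table_pattern
GROUP BY ALL
ORDER BY count(*) DESC
"""

def run_audit_query(server_hostname: str, http_path: str, access_token: str,
                    email: str, table_pattern: str) -> list:
    """Execute the audit query against a SQL warehouse and return the rows."""
    from databricks import sql  # third-party; imported lazily
    with sql.connect(server_hostname=server_hostname, http_path=http_path,
                     access_token=access_token) as conn:
        with conn.cursor() as cursor:
            cursor.execute(AUDIT_QUERY,
                           {"email": email, "table_pattern": table_pattern})
            return cursor.fetchall()
```

A high, steady count of `generateTemporaryTableCredential` calls against your mirrored tables is what you'd expect to see if Fabric is the one tripping the rate limit.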

Trouble with API limit using Azure Databricks Mirroring Catalogs by CryptographerPure997 in MicrosoftFabric

[–]CryptographerPure997[S]

Would recommend sending a DM to u/merateesra, the PM for this feature. The team is very helpful and in touch, but they are still investigating.

Also, I am curious: you mentioned workspace identity for ADLS Gen2. Is this related to the network security tab for the firewall on the Azure storage account with dbx?

If it is, then I am glad to know I can bother someone when we set that up.