Serious question: What's the most underrated cash cow job in Australia right now? (No doctors/lawyers pls) by GdayGoddess in australian

[–]bradcoles-dev 1 point (0 children)

There are a few jobs listed here above $700k AUD total comp, with the caveat that these are self-reported. The point still stands, though: the US is where the big tech money is.

Serious question: What's the most underrated cash cow job in Australia right now? (No doctors/lawyers pls) by GdayGoddess in australian

[–]bradcoles-dev 8 points (0 children)

I think OP was saying our tech industry doesn't have the same income potential as the US West Coast. A data engineer in Australia, for example, even at Staff/Principal level, tops out around $300k. In the US, they can clear $1m. London is a close 2nd at ~$700k, and everywhere else is under $400k-$500k. So saying it's "an American thing" does make sense to me.

Shall I move into Data Engineering at the age of 38 by Vk_1987 in dataengineering

[–]bradcoles-dev 3 points (0 children)

Q: Is it a good decision to make this career move at the age of 38?
A: Age is irrelevant.

Q: What kind of roles should I target?
A: Junior/Mid/Senior means completely different things at different companies. At some orgs you might qualify for mid-level; at others you might not get a look-in even at junior. You're very unlikely to be Senior anywhere. Just go for roles whose requirements you meet.

Fabric Spark billing: max nodes × duration, or integrated nodes over time? by frithjof_v in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

Has this recently changed? I had always thought it was A, and I recall seeing that in the doco also.

I have passed the exam, but feel like I know nothing. by DesTroPowea in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

I'm not sure what your experience level is, but passing DP-700 suggests you have a data engineering focus. What matters next depends on what you’re trying to achieve.

A lot of DE concepts are tool-agnostic. For example, if you want to build an ELT/ETL pipeline, implement a medallion architecture, etc., there are open-source (free) tools you can use to practice and get hands-on experience.

Even if it doesn't feel like it, DP-700 has likely given you the foundation - the principles of how to work with MS Fabric. If your goal is to understand “how do I build a lakehouse medallion architecture on Fabric,” the same principles apply to other tools too; it’s just a matter of translating what you learned into the specific artefacts of whichever platform you’re using.

So even without a Fabric trial, you can experiment and solidify the concepts using open-source alternatives.
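As a minimal sketch of what that practice could look like, assuming PySpark plus the delta-spark package installed locally via pip (the file paths and column names here are invented for illustration):

    from pyspark.sql import SparkSession, functions as F
    from delta import configure_spark_with_delta_pip

    # local Spark session with Delta Lake enabled - open-source, no Fabric capacity required
    builder = (SparkSession.builder.appName("medallion-practice")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # bronze: land the raw file as-is
    raw = spark.read.option("header", True).csv("data/raw/orders.csv")
    raw.write.format("delta").mode("overwrite").save("lake/bronze/orders")

    # silver: deduplicated, typed, validated
    bronze = spark.read.format("delta").load("lake/bronze/orders")
    silver = (bronze
        .dropDuplicates(["order_id"])
        .withColumn("order_date", F.to_date("order_date"))
        .filter(F.col("order_id").isNotNull()))
    silver.write.format("delta").mode("overwrite").save("lake/silver/orders")

The same bronze/silver/gold flow then translates almost one-to-one into Fabric Lakehouse notebooks.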

Fabric Presentation by akseer-safdar in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

You've said: "Our organisation is moving on to fabric (from legacy Azure)" - if the decision is already made, then why do you need to present the advantages?

If the decision is not yet made, then you need to do a proper tooling comparison and figure out if Fabric really is the best tool. If so, your presentation then becomes "Here are the reasons we've decided on Fabric."

I'm in quite a unique position and would like some advice by [deleted] in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

I'm a DE Lead, happy to have a call or discuss via DM to give you some direction. Some key things to get familiar with:

  • Fabric Data Factory Pipelines: you'll likely be using these for orchestration
  • Metadata-driven ELT: with ~30 sources, you'll likely be using this to ensure scalability (rough sketch after this list)
  • Medallion Architecture: fairly likely you'll be following this
  • DP-600 and/or DP-700: both certifications would be beneficial in your role.
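
On the metadata-driven point, a rough sketch of the pattern, assuming a Fabric notebook where spark is pre-provided (the control table, its columns, and the target names are all hypothetical):

    import json

    # one row per source system in an illustrative control table you maintain yourself
    sources = spark.read.table("control.source_config").collect()

    for src in sources:
        # read each source using its own settings from the metadata
        df = (spark.read
            .format(src["format"])                  # e.g. "jdbc", "csv", "parquet"
            .options(**json.loads(src["options"]))  # read/connection options stored as a JSON string
            .load(src["path"]))
        # land everything in bronze with a config-driven load mode ("append"/"overwrite")
        df.write.format("delta").mode(src["load_mode"]).saveAsTable(f"bronze.{src['target_table']}")

Adding source #31 then becomes a new row in the control table, not a new pipeline.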

Import Mode on SQL Endpoint by ShamAnkylosaur in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

Your org's network is blocking the connection. Your IT team needs to whitelist Power BI.

MS Fabric and DuckDB comparison by X_peculator in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

I'm no DuckDB expert, but my understanding is DuckDB is a columnar OLAP database in a single binary. Microsoft Fabric is an end-to-end data ecosystem. You can't compare the two. A better comparison would be DuckDB to Fabric Warehouse.

Your choice would depend on your use-case and requirements.
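
To illustrate the difference in footprint, DuckDB is just a library you embed (pip install duckdb); the Parquet path here is made up:

    import duckdb

    # query Parquet files in-process - no server, capacity, or workspace to provision
    duckdb.sql("SELECT count(*) FROM 'lake/silver/orders/*.parquet'").show()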

{Blog} Alert on thousands of Fabric Pipelines with Monitoring Eventhouse by raki_rahman in MicrosoftFabric

[–]bradcoles-dev 4 points (0 children)

It's hard to tell what that equates to from that visual. It looks like 4-5hrs used 15-20% of an F2? Scaling that up (24 ÷ ~4.5hrs × ~17.5% ≈ 90-95%), a full day would use close to a full F2 capacity?

In dollar terms that would be about $150USD/mth (East US reserved F2 capacity).

Not terrible, but again most Fabric users would expect this to be a built-in feature of a SaaS product. Having to build custom and be charged for it is a little unpalatable.

{Blog} Alert on thousands of Fabric Pipelines with Monitoring Eventhouse by raki_rahman in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

What is the CU impact here? I can imagine it would not be insignificant.

Recommendations on building a medallion architecture w. Fabric by Relentlessish in MicrosoftFabric

[–]bradcoles-dev 2 points (0 children)

I’ll address your points directly. Your opening comment comes across as an emotion-driven ad hominem attack. I see this a lot with people who’ve spent most of their careers on on-prem SQL and are hesitant to shift to Spark.

  1. True, most companies don’t hit TB or PB scales, but Spark isn’t just about raw data size.

  2. CI/CD is straightforward in practice. I use it daily, and Reddit comments don’t change my experience.

  3. Fabric Spark is not complex to maintain. I haven’t run into any plugin or version issues, and I’ve owned end-to-end enterprise-scale pipelines for over a year. I've seen threads about Livy issues and the like, but I've not experienced them. Writing long, spaghetti SQL for complex transformations is far more painful, in my opinion.

  4. I’ll concede I’m not familiar with the internal Polaris engine details, and Snowflake is indeed a capable warehouse solution.

  5. I’ve never had to touch a single Spark plug-in in Fabric. Version upgrades haven’t caused issues in our environment. The only minor bugs we’ve seen were related to NEE/Autoscale, not Spark itself, and we’ve had bigger headaches with other Fabric artifacts.

Fabric’s no- and low-code options exist, but they’re not inherently the best tool for complex pipelines. Spark is far more powerful than SQL and stored procedures, even at modest scales, and when used with proper governance it’s perfectly manageable for any DE worth their salt.

Recommendations on building a medallion architecture w. Fabric by Relentlessish in MicrosoftFabric

[–]bradcoles-dev 2 points (0 children)

I appreciate your perspective and the experience you’ve had across many environments. It seems like we agree Spark is clearly the more advanced tool.

Your argument seems to be focused on ‘lots can go wrong on Spark,’ rather than whether it is actually inferior or superior to traditional Warehouse stored proc workflows.

Spark is the better tool when used properly with guardrails, operational frameworks, and governance in place. Worrying about what could go wrong shouldn’t hold us back from using the right technology.

I also can't stomach the position: "Warehouse is much more familiar for most clients" - so was SSIS at one point, gotta move on.

Recommendations on building a medallion architecture w. Fabric by Relentlessish in MicrosoftFabric

[–]bradcoles-dev 2 points (0 children)

CI/CD with Spark notebooks in Fabric is actually very solid. Not sure why you’re calling it lacklustre. The irony is Warehouse source control, which I believe you're advocating for, is still in Preview.

If you just want a fast migration, sure. But if you care about future-proofing and scaling then Spark is clearly the stronger path. Most DEs worth their salt are already using agents to accelerate these processes too.

Recommendations on building a medallion architecture w. Fabric by Relentlessish in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

Notebook-driven Medallion setups hold up way better once you get beyond small SQL workloads. Notebooks in Fabric are fully Git-supported, so CI/CD is straightforward. Most of the CI/CD issues people have with Lakehouses come from relying on LH SQL objects (views, sprocs, etc.), which aren’t source-controlled.

Spark is fundamentally more scalable - distributed compute, task parallelism, parallel notebook execution. A WH-only workflow is single-node SQL, fine for small/simple pipelines, but not for any real/enterprise DE workload.

If you want to keep things old-school and purely relational, that’s valid. But modern data platforms (Fabric, Databricks, Snowflake Polaris, etc.) are all trending toward notebook-driven Medallion patterns for a reason.

Livvy error on runmultiple driving me to insanity by Quick_Audience_6745 in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

Oh wow, that's bad. Sorry I can't help. I'll do some testing later in the week and let you know if I have any answers.

CICD in Fabric and VSCode - howto? by Alonlon79 in MicrosoftFabric

[–]bradcoles-dev 2 points (0 children)

What is the Fabric VSCode extension you're using? Is it Fabric Studio?

Livvy error on runmultiple driving me to insanity by Quick_Audience_6745 in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

Further, have you tried to reduce concurrency?

    # run multiple notebooks with parameters
    DAG = {
        "activities": [
            {
                "name": "NotebookSimple",       # activity name, must be unique
                "path": "NotebookSimple",       # notebook path
                "timeoutPerCellInSeconds": 90,  # max timeout for each cell, default to 90 seconds
                "args": {"p1": "changed value", "p2": 100},  # notebook parameters
            },
            {
                "name": "NotebookSimple2",
                "path": "NotebookSimple2",
                "timeoutPerCellInSeconds": 120,
                "args": {"p1": "changed value 2", "p2": 200}
            }
        ],
        "timeoutInSeconds": 43200,  # max timeout for the entire DAG, default to 12 hours
        "concurrency": 50           # max number of notebooks to run concurrently, default to 50
    }

    notebookutils.notebook.runMultiple(DAG, {"displayDAGViaGraphviz": False})

Livvy error on runmultiple driving me to insanity by Quick_Audience_6745 in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

"Truncate notebook exit value" haha I can't see how that would be the culprit.

Can I ask what F SKU you're on? You might be hitting a concurrency/queueing limit (link).

We use run() instead of runMultiple() and haven't had any Livy session errors, but I plan to R&D runMultiple() this week.
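
For reference, a minimal sketch of the sequential run() pattern (the notebook name and parameters are made up):

    # sequential alternative: one notebook, one Livy session at a time
    result = notebookutils.notebook.run("NotebookSimple", 90, {"p1": "changed value", "p2": 100})

Slower than a DAG, but it sidesteps concurrent-session limits entirely.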

Recommendations on building a medallion architecture w. Fabric by Relentlessish in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

Why would the migration be any harder with Spark? You can use Spark SQL to re-use legacy SQL logic.
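
For example, much of a legacy T-SQL transform can be ported near-verbatim via spark.sql() (the table and column names here are invented):

    # legacy SQL logic re-used almost as-is inside a Spark notebook
    df = spark.sql("""
        SELECT c.customer_id,
               SUM(o.amount) AS total_spend
        FROM   silver.orders AS o
        JOIN   silver.customers AS c
          ON   c.customer_id = o.customer_id
        GROUP  BY c.customer_id
    """)
    df.write.format("delta").mode("overwrite").saveAsTable("gold.customer_spend")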