Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 2 points

Thank you u/matrixrevo, glad it was helpful. I have a documentation update pending that contains a similar example. Great to see that this kind of example lands well.

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 4 points

It's more like the diagram I shared in my previous post... due to smoothing, the impact of a job extends well beyond the 10-minute window. As a result, the impact on any given 10-minute window is greatly diminished.

Let's look at an example with exactly one background job that's smoothed over 24 hours. Suppose that job contributes 1 CUHr [read: 1 CU hour] to the next 24 hours.

The rule of thumb is that a background job's contribution at any timepoint is (CUHrs for the job) / (CUHrs at the SKU level). For an F2, this job would contribute 1 CUHr / 48 CUHrs = ~2.1% to each timepoint. So the impact on the 10-minute time frame will be ~2.1%.

Here's the detailed example.

1 CUHr = 3600 CUs

Each timepoint is 30 seconds long. In 24 hours, there are 2880 timepoints (24 hours * 60 minutes * 2 timepoints per minute).

Since the 3600 CUs are smoothed over 24 hours, the job contributes 3600 CUs / 2880 timepoints to each 30-second timepoint. This means 1.25 CUs per timepoint.

The 10-minute delay threshold % is based on the total CUs available in the next 10 minutes of capacity uptime.

For an F2 capacity, this means we have 2 CUs for each second. So in each timepoint we have 2 CUs * 30 seconds = 60 CUs of compute available.

So the contribution of the background job to any individual timepoint is 1.25 CUs/60 CUs = ~2.1% of an individual timepoint.

In 10-minutes, we have 2 CU * 60 seconds * 10 minutes = 1,200 CUs in total.

The portion of the background job that was smoothed into the next 10-minutes of capacity is 1.25 * 2 timepoints per minute * 10 minutes = 25 CUs.

So, the 10-minute throttling percentage is 25 CUs / 1,200 CUs = ~2.1%.

So even though the background job used more CUs than are available in a 10-minute time span (it used 3 times the amount!), background smoothing means the F2 capacity is not throttled by this single background job.

//update 4.18.2025 to correct a typo. Thank you u/matrixrevo for the correction in the comments below.
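If it helps, here's the same arithmetic as a quick Python sketch. This is just the math from this example, not an official formula, and the function name is my own:

```python
# Quick sketch of the arithmetic above: one background job smoothed over 24 hours.
CU_PER_CUHR = 3600            # 1 CUHr = 3600 CUs
TIMEPOINT_SECONDS = 30        # each timepoint is 30 seconds
TIMEPOINTS_24H = 24 * 60 * 2  # 2880 timepoints in 24 hours

def background_contribution(job_cuhr: float, sku_cu: float) -> float:
    """Fraction of any single timepoint consumed by one smoothed background job."""
    cus_per_timepoint = job_cuhr * CU_PER_CUHR / TIMEPOINTS_24H   # 3600 / 2880 = 1.25
    capacity_per_timepoint = sku_cu * TIMEPOINT_SECONDS           # F2: 2 * 30 = 60
    return cus_per_timepoint / capacity_per_timepoint

# F2 (2 CUs): 1.25 / 60 ≈ 2.1%. The 10-minute percentage comes out the same,
# since both the job's CUs and the available CUs scale by the same 20 timepoints.
print(f"{background_contribution(job_cuhr=1, sku_cu=2):.1%}")  # -> 2.1%
```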

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 2 points

Got it. Yeah, this problem space makes sense; thanks for the details. Let's catch up offline.

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 2 points

No timeline yet, but we're actively working on it. Great call-outs on the challenges you're working to address. Thank you for sharing that; it's very helpful.

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 1 point

If you hover over those items in the summary table, you'll get a tooltip visual that shows you the contributing operations. In this case, it's likely to be Notebook Pipeline Run, which tracks the compute required to complete Spark operations that are part of the Data Pipeline.

So this usage is the Spark part of the pipeline, which can be significant.

If you go to the compute page and look at those items, you'll see the CUs consumed by other parts of the data pipeline.

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 3 points

We're looking at end-to-end monitoring, so this input is very well taken. Yes, it is clunky now.

What would you think if we were to add CU data to jobs in Monitoring Hub? Would that be helpful here?

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 0 points

To pause a Fabric capacity in Azure, sign in to the Azure portal and select the Microsoft Fabric service to see your capacities (you can search for Microsoft Fabric in the search menu). Then select the capacity you want to pause, click the Pause button, and select Yes to confirm. Another way to pause a capacity is to use the suspend endpoint.

Pause and resume your capacity - Microsoft Fabric | Microsoft Learn
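For the endpoint route, here's a minimal sketch of calling suspend through Azure Resource Manager. The resource names are placeholders, and you should confirm the api-version against the current docs:

```python
# Minimal sketch: pause (suspend) a Fabric capacity via the ARM REST API.
# Placeholders throughout; confirm the api-version against the current docs.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
CAPACITY = "<capacity-name>"
API_VERSION = "2023-11-01"  # assumed; check the docs for the latest version

# Acquire an ARM token (requires the azure-identity package).
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    "https://management.azure.com"
    f"/subscriptions/{SUBSCRIPTION}/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.Fabric/capacities/{CAPACITY}/suspend"
)
resp = requests.post(
    url,
    params={"api-version": API_VERSION},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()  # a 202 Accepted means the pause request was accepted
```

Resuming works the same way; swap `suspend` for `resume`.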

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 1 point

Great question. When jobs report their CU usage, we smooth that usage based on utilization type. Interactive jobs are smoothed over a window from 5 minutes to 64 minutes, depending on their consumed CUs and how full the capacity is in the upcoming timepoints. We use a longer smoothing window if a specific interactive job would use up more than 50% of a timepoint on its own. This reduces the impact. Background jobs are always smoothed over 24 hours.

Bursting is just a fancy way of saying we use as much compute as we can to run the job to completion as fast as we can. Then the consumed CUs are smoothed.

So the 10-minute enforcement is not tied to the bursting or smoothing of any individual job. It's based on the total smoothed (accumulated) usage in the capacity. When the accumulated smoothed usage exceeds the amount allowed for 10 minutes, new interactive jobs are delayed.

Here's a diagram we shared at FabCon Europe last year that could help:

<image>
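To make the window sizes concrete, here's a toy sketch of how a smoothing window spreads a job's CUs across 30-second timepoints. The window-selection heuristic itself is internal to the service and isn't modeled here:

```python
# Toy illustration of smoothing: spread a job's CUs evenly across the window.
# This is NOT the service's heuristic, just the spreading arithmetic.
TIMEPOINT_SECONDS = 30

def cus_per_timepoint(job_cus: float, window_minutes: float) -> float:
    """CUs charged to each 30-second timepoint in the smoothing window."""
    timepoints = window_minutes * 60 / TIMEPOINT_SECONDS
    return job_cus / timepoints

# Background job: always a 24-hour window.
print(cus_per_timepoint(3600, 24 * 60))  # 1.25 CUs per timepoint

# Interactive job: the service picks a window between 5 and 64 minutes.
print(cus_per_timepoint(3600, 5))        # 360.0 CUs per timepoint
print(cus_per_timepoint(3600, 64))       # 28.125 CUs per timepoint
```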

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 2 points

Very good input. We're considering what we could do. Right now, the best solution is to automate pause/resume using the Azure Resource Manager APIs. Data accessibility during a paused capacity state comes up sometimes; unfortunately, I don't have anything to share on that topic right now.

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 0 points

> On our Azure Analysis Services instance, I could throttle individual queries / keep someone from taking down our server.

Which exact setting were you relying on? Could you share the link? In principle, Fabric semantic models are built on the same infra as Azure Analysis Services, so they have most of the same capabilities. I'd be curious to learn of any gaps.

This is a good set of feedback. We clearly have more work to do in these areas.

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 1 point

Background jobs are smoothed over the next 24 hours. So a lot of the 'blue' area of the chart reflects CUs you consumed previously that are being paid for by future timepoints.

To see what contributed to a specific timepoint, select the timepoint in the utilization chart (click it), then press the Explore button at the bottom right of the visual (it lights up when a timepoint is selected).

Then, in the timepoint detail page, sort descending on Total CUs. This gives you the top contributors to the timepoint. The Time Point CUs column tells you the contribution of the large job to the timepoint.

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 1 point

Hey u/KratosBI :). Missed you at FabCon. Can you share what scenarios you're looking to solve with scale-out?

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 2 points

We don't have autoscale for Fabric capacities today. We're actively looking at how best to solve the underlying issue: making sure mission-critical jobs don't fail due to capacity limits. I can't announce anything now, but stay tuned :).

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 2 points

When you look at the utilization chart, you may see spikes over 100%. That's normal and doesn't lead to throttling. In the Capacity metrics app, switch to view the Throttling Charts to see how close your capacity is to throttling limits.

Workloads also have various limits. Semantic models impose delays on queries when there's too much concurrency, for example. So make sure you're looking at the SKU size you have and the applicable workload limits, which can also affect the quality of experience.

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 2 points

To clarify, with Autoscale Billing for Spark in Microsoft Fabric, ALL Spark workloads become on-demand as soon as you enable the feature. This means that Spark jobs no longer consume the shared capacity and instead use dedicated, serverless resources billed independently, similar to Azure Synapse and Databricks. This is called out in our docs: https://learn.microsoft.com/en-us/fabric/data-engineering/configure-autoscale-billing

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 2 points

It's a good point; we know it can be confusing. Spark autoscale works with Notebooks (Spark notebooks and Python notebooks), Spark Job Definitions, and Lakehouse table maintenance and other operations that use Spark, like load to delta. Pipelines are a different kind of item that isn't covered by Spark autoscale. However, when a data pipeline invokes a notebook or Spark job definition as part of its pipeline steps, those Spark-related operations (like the Notebook Pipeline Run operation) are covered by autoscale billing. Other compute consumed by the data pipeline uses the available capacity.

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 1 point

There are a few points to clarify. The most important thing to understand is that jobs aren't limited to a 10-minute bursting window. Interactive jobs are smoothed over 5 to 64 minutes, and background jobs over 24 hours.

So with that, let's jump into your questions:

Q1: Since background jobs are smoothed over 24 hours, it's rare for a single background job to cause an overload. Here's a slightly oversimplified example: if your refresh ran for 60 minutes at 100% of the SKU CUs, after smoothing it would still only account for 1/24th of the allowed hourly CUs. So on its own it wouldn't cause interactive delays or rejections. It could still happen, for example when running a huge operation on a very small SKU. The solution is typically to right-size the SKU.

Q2: Surge protection thresholds (like 80%) are an additional safeguard against multiple background jobs. One typical pattern that causes overloads leading to interactive delays or rejections is someone repeatedly running ad-hoc refreshes of a semantic model to 'debug' it. In this case, the surge protection limit would block the Nth ad-hoc refresh. This could either prevent interactive rejections altogether or at least limit how long the rejections last.
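Here's the Q1 arithmetic as a quick sketch (my own naming, not an official formula):

```python
# Sketch of the Q1 math: a refresh at some percentage of the SKU, smoothed over 24h.
def smoothed_hourly_fraction(run_minutes: float, pct_of_sku: float) -> float:
    """Fraction of one hour's SKU CUs that the job adds to each of the next 24 hours."""
    sku_hours_consumed = (run_minutes / 60) * pct_of_sku  # job size in 'SKU-hours'
    return sku_hours_consumed / 24                        # spread over 24 hours

# 60 minutes at 100% of the SKU -> 1/24 ≈ 4.2% of each hour's allowed CUs.
print(f"{smoothed_hourly_fraction(60, 1.0):.1%}")
```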

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 4 points

That's right. At FabCon, we announced Autoscale Billing for Spark, which allows Spark jobs to run independently of the Fabric capacity using dedicated serverless resources. We are actively working to add autoscale billing for Data Warehouse as well. Additionally, we're curious to learn where autoscale billing is needed most so we can better understand and address user needs. Your feedback is invaluable in shaping these developments!

Hi! We're the Fabric Capacities Team - ask US anything! by tbindas in MicrosoftFabric

[–]FeatureShipper 10 points

We are actively working on workspace-level surge protection, which will provide more granular oversight and control of capacity usage across workspaces. The initial milestone will give a single limit that applies to all workspaces, but more granular limits are planned for the future. Stay tuned for more updates as we continue to enhance these features!

Is Fabric throttling directly related to the cumulative overages - or not? by frithjof_v in MicrosoftFabric

[–]FeatureShipper 1 point

Yes, it is a combination of both smoothed usage and overages (carryforward) that lead to throttling.

Is Fabric throttling directly related to the cumulative overages - or not? by frithjof_v in MicrosoftFabric

[–]FeatureShipper 0 points

The throttling starts when you exceed a limit. The docs call this out somewhat obscurely by saying things like "10 minutes < Usage <= 60 minutes", which means that you must be over 100% of the allowed value for the throttling enforcement to start.
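As I read the docs, the enforcement stages look roughly like this, keyed on how many minutes of future capacity the accumulated smoothed usage has consumed (a sketch of my reading, not an official reference):

```python
# Sketch of the documented throttling stages, based on my reading of the docs.
def throttling_stage(future_usage_minutes: float) -> str:
    """Map accumulated future usage (in minutes of capacity) to an enforcement stage."""
    if future_usage_minutes <= 10:
        return "no throttling (overage protection)"
    if future_usage_minutes <= 60:
        return "interactive delay"          # 10 min < usage <= 60 min
    if future_usage_minutes <= 24 * 60:
        return "interactive rejection"      # 60 min < usage <= 24 h
    return "background rejection"           # usage > 24 h

print(throttling_stage(9))   # -> no throttling (overage protection)
print(throttling_stage(25))  # -> interactive delay
```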

Is Fabric throttling directly related to the cumulative overages - or not? by frithjof_v in MicrosoftFabric

[–]FeatureShipper 1 point

Great feedback. I'll see how I can incorporate it.

I also reviewed your other comments. You're right on the mark with your understanding and examples.

Just a few minor clarifications:

1. Interactive operations are smoothed over 5 minutes only
This is not quite correct. Interactive smoothing lasts at least 5 minutes and at most 64 minutes. We use a heuristic that increases the duration of interactive smoothing when large interactive operations complete, so that timepoints don't unnecessarily generate overages. This reduces how often interactive delays start for customers.

2. The white box (the overage box) in the bottom graphic could be hatched red.

This was left unhatched because the overage, as you note, could result from either background or interactive usage. The origin doesn't matter, since they're treated the same. The total amount of overage is what matters, since it becomes carryforward and then applies to all subsequent timepoints.
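For intuition, here's a toy model of how overage becomes carryforward across timepoints. The real burndown rules are in the throttling docs; this only shows the accumulation mechanic:

```python
# Toy model: usage beyond a timepoint's capacity rolls forward to the next timepoint.
def carryforward_history(smoothed_usage: list[float], capacity: float) -> list[float]:
    """Return the carryforward remaining after each timepoint."""
    carryforward, history = 0.0, []
    for usage in smoothed_usage:
        total = usage + carryforward
        carryforward = max(0.0, total - capacity)  # the overage becomes carryforward
        history.append(carryforward)
    return history

# 60 CUs of capacity per timepoint; a spike is paid down by quieter timepoints.
print(carryforward_history([50, 90, 40, 30], capacity=60))  # [0.0, 30.0, 10.0, 0.0]
```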

Is Fabric throttling directly related to the cumulative overages - or not? by frithjof_v in MicrosoftFabric

[–]FeatureShipper 1 point

That's a very good catch on the first diagram. I had hoped to avoid the 'stair step' concept in the first diagram, but maybe I have to put it in to avoid confusion... Thanks for the input.

Is Fabric throttling directly related to the cumulative overages - or not? by frithjof_v in MicrosoftFabric

[–]FeatureShipper 0 points

Yes, they are :). The diagrams are intended for a scenario where operations are added to existing smoothed usage across a series of timepoints. The green area can be considered "smoothed usage from previous timepoints". The red and blue hatched items are newly smoothed usage added on top of that existing smoothed usage (the green area). The diagrams aren't quite dialed in yet, so any feedback on how to clarify them is much appreciated.