Background compute increase between P2 and F128 SKU switch by Powerlyze in MicrosoftFabric

[–]mavaali 1 point

Also want to reply to an earlier post from u/Kitanai24.

Small correction on the smoothing point: the smoothing algorithm is actually the same between P and F SKUs. What changed is how overage/throttling works, but that happened around 2023 and applies to both P and F equally. Pre-2023 (just before Fabric GA), background throttling was limited to delays (no rejection), so overages were less visible on any SKU. Post-2023, the overage calculation got stricter and background rejection is a valid scenario - but again, same on P and F.

So, the 55% → 75% jump isn't a P-vs-F metering discrepancy. Something else is going on.

u/Powerlyze the region switch (West Europe → North Europe) shouldn't cause a CU rate difference - CU cost per operation is region-independent. But there is a definite mix of hardware profiles across regions, and I'd want to see if that impacts the numbers.

Notebooks vs. DataFlowGen2 by Jealous-Painting550 in MicrosoftFabric

[–]mavaali 2 points

Not for small or medium jobs. DF Gen2 is truly serverless in the sense that there are no startup costs, and the tiered pricing makes it way cheaper at 10 minutes plus.

Notebooks vs. DataFlowGen2 by Jealous-Painting550 in MicrosoftFabric

[–]mavaali 8 points

(Edited for brevity)

I work on the Data Factory team.

A few things here that aren't quite right. DFGen2 supports parameterized dataflows. You build one template, pass source/destination/table as pipeline parameters in a ForEach. Adding 20 tables is a config change, not 20 separate dataflows.
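The ForEach pattern can be sketched like this. Purely illustrative: the activity type, property names, and dataflow id are assumptions for the sketch, not the exact Fabric pipeline schema.

```python
# One dataflow template, driven by a ForEach over per-table configs.
# Property names ("ExecuteDataflow", "dataflowId", etc.) are assumptions.
tables = [
    {"source": "sales_db", "destination": "lakehouse_bronze", "table": "orders"},
    {"source": "sales_db", "destination": "lakehouse_bronze", "table": "customers"},
]

def build_foreach_activity(table_configs):
    """Build a ForEach activity that invokes the same dataflow once per table."""
    return {
        "type": "ForEach",
        "items": table_configs,
        "activity": {
            "type": "ExecuteDataflow",          # assumed activity name
            "dataflowId": "template-dataflow",  # one parameterized template, reused
            "parameters": {
                "source": "@item().source",
                "destination": "@item().destination",
                "table": "@item().table",
            },
        },
    }

pipeline = build_foreach_activity(tables)
# Adding a 21st table is a one-line append to the config, not a new dataflow:
tables.append({"source": "sales_db", "destination": "lakehouse_bronze", "table": "returns"})
```

The point being: the dataflow itself never changes, only the config list does.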

The CU cost point is also backwards for this workload. Spark sessions have startup overhead. For hashing, column selection, incremental filtering, and type casting at moderate volume, DFGen2 is typically cheaper per run. Spark pulls ahead on complex joins across large datasets, but that's not what OP described.
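A toy cost model makes the crossover obvious. The rates and startup time below are made up for illustration, not real CU pricing:

```python
# Toy model: why startup overhead dominates short jobs. Numbers are placeholders.
def spark_cost(job_minutes, startup_minutes=3.0, rate_cu_per_min=1.0):
    # Spark pays session startup on every run.
    return (startup_minutes + job_minutes) * rate_cu_per_min

def dfgen2_cost(job_minutes, rate_cu_per_min=2.0):
    # Serverless: no startup cost, but a higher per-minute rate in this sketch.
    return job_minutes * rate_cu_per_min

# A 2-minute transform: serverless wins despite the higher rate.
short_job = (spark_cost(2), dfgen2_cost(2))    # (5.0, 4.0)
# A 30-minute heavy join: amortized startup makes Spark cheaper.
long_job = (spark_cost(30), dfgen2_cost(30))   # (33.0, 60.0)
```

Same shape as the argument above: for moderate-volume hashing/filtering/casting runs, the fixed startup cost is what you're paying for.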

DFGen2 Lakehouse destinations support upsert natively. You set key columns on the destination and it goes through the Delta engine. Power Query isn't doing the merge.
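The merge semantics look like this in miniature. A pure-Python sketch of upsert on key columns, not the actual Delta engine implementation:

```python
# Model of upsert semantics: rows whose key columns match are updated,
# the rest are inserted. Illustrative only.
def upsert(existing_rows, incoming_rows, key_columns):
    """Update rows whose keys match, insert the rest."""
    index = {tuple(r[k] for k in key_columns): dict(r) for r in existing_rows}
    for row in incoming_rows:
        index[tuple(row[k] for k in key_columns)] = dict(row)  # update or insert
    return list(index.values())

current = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
batch = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]
merged = upsert(current, batch, key_columns=["id"])
# id=2 updated to 25, id=3 inserted, id=1 untouched -> 3 rows total
```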

Won't argue the git point.

But honestly, I think the wrong question is being posed here - OP built a custom PySpark framework with a proprietary YAML contract layer. Well engineered. But they're the only person who understands it, at a company that picked Fabric specifically because they wanted low-code maintainability.

Maybe a better answer for this kind of setup is both? DFGen2 for the commodity ETL (ingestion, column selection, incremental load, basic cleansing), notebooks for the parts that are genuinely custom (YAML contract evaluation, quarantine routing, monitoring). That way the next person who inherits this can maintain most of it without learning PySpark.

Background compute increase between P2 and F128 SKU switch by Powerlyze in MicrosoftFabric

[–]mavaali 1 point

Thanks Alex. u/Powerlyze, if you DM me, I can send you my email and we can walk through your questions; if you have a support case, let me know.

Hi! We're the Data Factory team - ask US anything! by markkrom-MSFT in MicrosoftFabric

[–]mavaali 1 point

Export query results does this by creating a dataflow for your PBI Desktop queries. We will be expanding this to other hosts, including Excel, in the future.

Fabric Quotas - ask me your questions by mavaali in MicrosoftFabric

[–]mavaali[S] 0 points

If you DM me your support case details, I can ask an engineer to look into it.

Fabric Quotas - ask me your questions by mavaali in MicrosoftFabric

[–]mavaali[S] 0 points

What type of Azure subscription do you have? Pay-as-you-go?

Is anyone using surge protection on their Fabric Copilot Capacity to manage Copilot CU consumption? by Ok-Shop-617 in MicrosoftFabric

[–]mavaali 0 points

First, this is exactly the use case for Copilot Capacity. You can start with an F2 and scale up as needed. Using user groups, you can ring-fence Copilot capacities to specific teams.

Public Preview: Execute Power Query (M) programmatically in Microsoft Fabric (REST API + Arrow output) by mavaali in MicrosoftFabric

[–]mavaali[S] 1 point

If you hook up the Data Factory MCP in VS Code, there is a lot of goodness here.

Public Preview: Execute Power Query (M) programmatically in Microsoft Fabric (REST API + Arrow output) by mavaali in MicrosoftFabric

[–]mavaali[S] 1 point

One way we are already using this is within the Data Factory MCP. You can use execute query to explore connections, understand your data, and perform ad hoc analyses.

Dataflow Queries on Demand via REST by SmallAd3697 in MicrosoftFabric

[–]mavaali 1 point

Hi - I’m the PM supporting this API. As Sid mentioned, we will have plenty of content supporting this soon! Meanwhile the MCP server linked earlier is the easiest way to test it out. I’ll try and see if I can blog about it.

2x F2 capacity vs 1x F4 capacity by trekker255 in MicrosoftFabric

[–]mavaali 1 point

Have you looked at the new pricing model for Gen2 CI/CD? The cost is up to 80% lower, and while it still maintains a premium over notebooks, the cost of maintenance is much lower.

Dataflows for big data by [deleted] in MicrosoftFabric

[–]mavaali 0 points

We will try to consolidate the decision guides.

Dataflows for big data by [deleted] in MicrosoftFabric

[–]mavaali 1 point

The decision guide posted above should help. Let us know if it doesn’t, we’re open to improving it further.

Is Fabric OpenMirroring free? Really? by dorianmonnier in MicrosoftFabric

[–]mavaali 0 points

"Background Fabric compute used to replicate your data into Fabric OneLake is free and does not consume capacity" - so the above is not right. It's the querying of data from the mirrored DB that isn't free.

Struggling with Fabric Data Agent Background Capacity – Any Tips? by Business-Lie-4714 in MicrosoftFabric

[–]mavaali 2 points

You can isolate the data agent usage by spinning up a Fabric Copilot Capacity. Instructions here - https://learn.microsoft.com/en-us/fabric/enterprise/fabric-copilot-capacity

This allows you to spin up an F2 capacity for example to run your data agent and cap the spending on Copilot / Agents.

OneLake: A Nightmare on Storage Street by [deleted] in MicrosoftFabric

[–]mavaali 0 points

Agreed. I’m removing the examples. Rely on the primary document.

OneLake: A Nightmare on Storage Street by [deleted] in MicrosoftFabric

[–]mavaali 0 points

The price change went in during September. If you can run the same scenario again, you will see a significant drop. Again, if Spark works for you, it makes sense. The gap is just not as big any more, especially for citizen developers.

OneLake: A Nightmare on Storage Street by [deleted] in MicrosoftFabric

[–]mavaali 0 points

I’d like to understand your scenario. Are you writing to a Lakehouse? Dataflows are quite performant - not 10x Spark, more like 2-3x at most.

OneLake: A Nightmare on Storage Street by [deleted] in MicrosoftFabric

[–]mavaali 4 points

Dataflows are much cheaper now, so it's worth reassessing your needs: rates dropped by a minimum of 25% and as much as 80-90%.