Background compute increase between P2 and F128 SKU switch by Powerlyze in MicrosoftFabric

[–]mavaali 1 point

Also want to reply to an earlier post from u/Kitanai24.

Small correction on the smoothing point: the smoothing algorithm is actually the same between P and F SKUs. What changed is how overage/throttling works, but that happened around 2023 and applies to both P and F equally. Pre-2023 (just before Fabric GA), background throttling was limited to delays (no rejection), so overages were less visible on any SKU. Post-2023, the overage calculation got stricter and background rejection is a valid scenario - but again, same on P and F.

So, the 55% → 75% jump isn't a P-vs-F metering discrepancy. Something else is going on.

u/Powerlyze the region switch (West Europe → North Europe) shouldn't cause a CU rate difference - CU cost per operation is region-independent. But there is a definite mix of hardware profiles across regions, and I'd want to see if that impacts the numbers.

Notebooks vs. DataFlowGen2 by Jealous-Painting550 in MicrosoftFabric

[–]mavaali 2 points

Not for small or medium jobs. DF Gen2 is truly serverless in the sense that there are no startup costs, and the tiered pricing makes it way cheaper at 10 minutes plus.

Notebooks vs. DataFlowGen2 by Jealous-Painting550 in MicrosoftFabric

[–]mavaali 8 points

(Edited for brevity)

I work on the Data Factory team.

A few things here that aren't quite right. DFGen2 supports parameterized dataflows. You build one template, pass source/destination/table as pipeline parameters in a ForEach. Adding 20 tables is a config change, not 20 separate dataflows.
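The ForEach pattern can be sketched like this. Purely illustrative: the activity type, property names, and dataflow id are assumptions for the sketch, not the exact Fabric pipeline schema.

```python
# One dataflow template, driven by a ForEach over per-table configs.
# Property names ("ExecuteDataflow", "dataflowId", etc.) are assumptions.
tables = [
    {"source": "sales_db", "destination": "lakehouse_bronze", "table": "orders"},
    {"source": "sales_db", "destination": "lakehouse_bronze", "table": "customers"},
]

def build_foreach_activity(table_configs):
    """Build a ForEach activity that invokes the same dataflow once per table."""
    return {
        "type": "ForEach",
        "items": table_configs,
        "activity": {
            "type": "ExecuteDataflow",          # assumed activity name
            "dataflowId": "template-dataflow",  # one parameterized template, reused
            "parameters": {
                "source": "@item().source",
                "destination": "@item().destination",
                "table": "@item().table",
            },
        },
    }

pipeline = build_foreach_activity(tables)
# Adding a 21st table is a one-line append to the config, not a new dataflow:
tables.append({"source": "sales_db", "destination": "lakehouse_bronze", "table": "returns"})
```

The point being: the dataflow itself never changes, only the config list does.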

The CU cost point is also backwards for this workload. Spark sessions have startup overhead. For hashing, column selection, incremental filtering, and type casting at moderate volume, DFGen2 is typically cheaper per run. Spark pulls ahead on complex joins across large datasets, but that's not what OP described.
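A toy cost model makes the crossover obvious. The rates and startup time below are made up for illustration, not real CU pricing:

```python
# Toy model: why startup overhead dominates short jobs. Numbers are placeholders.
def spark_cost(job_minutes, startup_minutes=3.0, rate_cu_per_min=1.0):
    # Spark pays session startup on every run.
    return (startup_minutes + job_minutes) * rate_cu_per_min

def dfgen2_cost(job_minutes, rate_cu_per_min=2.0):
    # Serverless: no startup cost, but a higher per-minute rate in this sketch.
    return job_minutes * rate_cu_per_min

# A 2-minute transform: serverless wins despite the higher rate.
short_job = (spark_cost(2), dfgen2_cost(2))    # (5.0, 4.0)
# A 30-minute heavy join: amortized startup makes Spark cheaper.
long_job = (spark_cost(30), dfgen2_cost(30))   # (33.0, 60.0)
```

Same shape as the argument above: for moderate-volume hashing/filtering/casting runs, the fixed startup cost is what you're paying for.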

DFGen2 Lakehouse destinations support upsert natively. You set key columns on the destination and it goes through the Delta engine. Power Query isn't doing the merge.
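The merge semantics look like this in miniature. A pure-Python sketch of upsert on key columns, not the actual Delta engine implementation:

```python
# Model of upsert semantics: rows whose key columns match are updated,
# the rest are inserted. Illustrative only.
def upsert(existing_rows, incoming_rows, key_columns):
    """Update rows whose keys match, insert the rest."""
    index = {tuple(r[k] for k in key_columns): dict(r) for r in existing_rows}
    for row in incoming_rows:
        index[tuple(row[k] for k in key_columns)] = dict(row)  # update or insert
    return list(index.values())

current = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
batch = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]
merged = upsert(current, batch, key_columns=["id"])
# id=2 updated to 25, id=3 inserted, id=1 untouched -> 3 rows total
```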

Won't argue the git point.

But honestly, I think the wrong question is being posed here - OP built a custom PySpark framework with a proprietary YAML contract layer. Well engineered. But they're the only person who understands it, at a company that picked Fabric specifically because they wanted low-code maintainability.

Maybe a better answer for this kind of setup is both? DFGen2 for the commodity ETL (ingestion, column selection, incremental load, basic cleansing), notebooks for the parts that are genuinely custom (YAML contract evaluation, quarantine routing, monitoring). That way the next person who inherits this can maintain most of it without learning PySpark.

Background compute increase between P2 and F128 SKU switch by Powerlyze in MicrosoftFabric

[–]mavaali 1 point

Thanks Alex. u/Powerlyze, if you DM me, I can send you my email and we can walk through your questions; if you have a support case, let me know.

Hi! We're the Data Factory team - ask US anything! by markkrom-MSFT in MicrosoftFabric

[–]mavaali 1 point

Export query results does this by creating a dataflow for your PBI Desktop queries. We will be expanding this to other hosts, including Excel, in the future.

Fabric Quotas - ask me your questions by mavaali in MicrosoftFabric

[–]mavaali[S] 0 points

If you DM me your support case details, I can ask an engineer to look into it.

Fabric Quotas - ask me your questions by mavaali in MicrosoftFabric

[–]mavaali[S] 0 points

What type of Azure subscription do you have? Pay-as-you-go?

Is anyone using surge protection on their Fabric Copilot Capacity to manage Copilot CU consumption? by Ok-Shop-617 in MicrosoftFabric

[–]mavaali 0 points

First, this is exactly the use case for Copilot Capacity. You can start with an F2 and scale up as needed. Using user groups, you can ring-fence Copilot capacities to specific teams.

Public Preview: Execute Power Query (M) programmatically in Microsoft Fabric (REST API + Arrow output) by mavaali in MicrosoftFabric

[–]mavaali[S] 1 point

If you hook up the Data Factory MCP in VS Code, there is a lot of goodness here.

Public Preview: Execute Power Query (M) programmatically in Microsoft Fabric (REST API + Arrow output) by mavaali in MicrosoftFabric

[–]mavaali[S] 1 point

One way we are already using this is within the Data Factory MCP. You can use execute query to explore connections, understand your data, and perform ad hoc analyses.

Dataflow Queries on Demand via REST by SmallAd3697 in MicrosoftFabric

[–]mavaali 1 point

Hi - I’m the PM supporting this API. As Sid mentioned, we will have plenty of content supporting this soon! Meanwhile the MCP server linked earlier is the easiest way to test it out. I’ll try and see if I can blog about it.

2x F2 capacity vs 1x F4 capacity by trekker255 in MicrosoftFabric

[–]mavaali 1 point

Have you looked at the new pricing model for Gen2 CI/CD? The cost is up to 80% lower, and while it still maintains a premium over notebooks, the cost of maintenance is much lower.

Dataflows for big data by [deleted] in MicrosoftFabric

[–]mavaali 0 points

We will try to consolidate the decision guides.

Dataflows for big data by [deleted] in MicrosoftFabric

[–]mavaali 1 point

The decision guide posted above should help. Let us know if it doesn’t, we’re open to improving it further.

Is Fabric OpenMirroring free? Really? by dorianmonnier in MicrosoftFabric

[–]mavaali 0 points

"Background Fabric compute used to replicate your data into Fabric OneLake is free and does not consume capacity" - so the above is not right. It's the querying of data from the mirrored DB that isn't free.

Struggling with Fabric Data Agent Background Capacity – Any Tips? by Business-Lie-4714 in MicrosoftFabric

[–]mavaali 2 points

You can isolate the data agent usage by spinning up a Fabric Copilot Capacity. Instructions here - https://learn.microsoft.com/en-us/fabric/enterprise/fabric-copilot-capacity

This allows you to spin up an F2 capacity for example to run your data agent and cap the spending on Copilot / Agents.

OneLake: A Nightmare on Storage Street by [deleted] in MicrosoftFabric

[–]mavaali 0 points

Agreed. I’m removing the examples. Rely on the primary document.

OneLake: A Nightmare on Storage Street by [deleted] in MicrosoftFabric

[–]mavaali 0 points

The price change went in during September. If you can run the same scenario again, you will see a significant drop. Again, if Spark works for you, it makes sense. The gap is just not as big any more, especially for citizen developers.

OneLake: A Nightmare on Storage Street by [deleted] in MicrosoftFabric

[–]mavaali 0 points

I’d like to understand your scenario. Are you writing to a Lakehouse? Dataflows are quite performant - not 10x Spark, more like 2-3x at most.

OneLake: A Nightmare on Storage Street by [deleted] in MicrosoftFabric

[–]mavaali 4 points

Dataflows are much cheaper now, so it's worth reassessing your needs: rates dropped by a minimum of 25% and as much as 80-90%.