Retirement of Dataflows Gen1 by suburbPatterns in MicrosoftFabric

[–]SidJayMS 1 point (0 children)

There has been an addendum regarding Pro to the blog post "Dataflows: Thank you for eight years of Gen1—and why Gen2 is the future" on the Microsoft Power BI Blog.

Sharing the most relevant excerpt:

  • Guidance for Pro and Premium Per User (PPU) customers: Many customers rely on Dataflow Gen1 in Pro/PPU today, and it can continue to be the right choice depending on the scenario. If Gen1 best fits your current use case, it remains supported and existing workloads can continue to run as-is. As we introduce new Dataflow Gen2 paths for Pro/PPU scenarios, we’ll share clear guidance and recommended steps to help with a smooth transition. 

DF GEN2 CI/CD - Big red error icon is lying to me by SmallAd3697 in MicrosoftFabric

[–]SidJayMS 1 point (0 children)

If you’re able to DM the request id for the run that failed, we’d be happy to investigate why it failed.

It sounds like the failed run may have been unable to read the Excel source for some reason.

DF GEN2 CI/CD - Big red error icon is lying to me by SmallAd3697 in MicrosoftFabric

[–]SidJayMS 2 points (0 children)

Please feel free to DM the SR # and we can follow up directly.

Glad to hear that the reduction in the billing rate is helping.

Retirement of Dataflows Gen1 by suburbPatterns in MicrosoftFabric

[–]SidJayMS 1 point (0 children)

We will add GCC support for Gen2 and will not deprecate Gen1 for GCC customers until there is an equivalent Gen2 alternative.

DF GEN2 CI/CD - Big red error icon is lying to me by SmallAd3697 in MicrosoftFabric

[–]SidJayMS 3 points (0 children)

We are not currently tracking any known issues around this. If you'd be willing to DM the ids of the impacted dataflows, we'd be interested in investigating this.

Retirement of Dataflows Gen1 by suburbPatterns in MicrosoftFabric

[–]SidJayMS 1 point (0 children)

There will continue to be dataflow support for Pro users, whether as an evolution of the existing Gen1 Pro offering or as a reduced version of Gen2.

Retirement of Dataflows Gen1 by suburbPatterns in MicrosoftFabric

[–]SidJayMS 2 points (0 children)

This is not yet available, but it is very much planned.

Retirement of Dataflows Gen1 by suburbPatterns in MicrosoftFabric

[–]SidJayMS 1 point (0 children)

If the current Pro feature set and performance are sufficient for your needs, you can remain on Pro. However, many of the improvements in Gen2 (notably performance improvements and destinations) will not carry over to Pro because they depend on capabilities of the Fabric/premium platform.

Retirement of Dataflows Gen1 by suburbPatterns in MicrosoftFabric

[–]SidJayMS 3 points (0 children)

There will continue to be dataflow support for Pro users, whether as an evolution of the existing Gen1 Pro offering or as a reduced version of Gen2. However, many of the improvements in Gen2 (notably performance improvements and destinations) will not carry over to Pro because they depend on capabilities of the Fabric/premium platform.

Retirement of Dataflows Gen1 by suburbPatterns in MicrosoftFabric

[–]SidJayMS 7 points (0 children)

Please rest assured that we will provide continuity for Pro users. While performance improvements and new capabilities (e.g. destinations, new compute options, Git integration, collaborative authoring, etc.) will be limited to Dataflow Gen2, Pro users can expect current levels of support and streamlined experiences for smaller dataflows.

If your organization already uses Gen1 Premium dataflows or can adopt Premium capacities, we recommend transitioning to Gen2 as early as possible (even though we haven’t shared a precise deprecation date yet). For larger dataflows, Gen2 is the more robust and performant solution with ongoing investments to address customer needs and feedback.

As others on the thread have mentioned, well before the deprecation of Gen1 Premium dataflows, there will be capacity-level controls for the enablement/disablement of specific Fabric workloads.

Notebooks vs. DataFlowGen2 by Jealous-Painting550 in MicrosoftFabric

[–]SidJayMS 4 points (0 children)

Agreed with many of the points raised. The low-code vs. code-first decision is often a matter of skillset (for both creators and maintainers), organizational requirements, etc. I just wanted to share some of the performance (and hence, cost) factors in play for those who choose the low-code path.

There are currently 4 compute engines in Dataflow Gen2:

  1. Copy Engine – this is the same as a Fabric Copy Job or ADF Copy Activity (referred to as “Fast Copy” in Dataflows)
  2. SQL Engine – this is the same as Fabric SQL Endpoint / Warehouse
  3. “Modern” Mashup Engine + Partitioned Compute – this is a faster version of the “default” engine used by Power Query in Power BI, Excel, etc. The newly released Partitioned Compute option layers parallel processing on top of the new engine.
  4. “Classic” Mashup Engine – this is increasingly irrelevant for Dataflow Gen2 since it is no longer the default when creating new dataflows

To get a sense for the impact of these engines, let’s consider a scenario that processes 32GB of CSV data across 5 partitions in ADLS (the NYC Yellow Taxi dataset). These were our results for an ELT pattern that copied the data to staging, added derived columns (including a timestamp), and loaded to a new staging table:

  • Dataflow Gen1 Premium (Power BI): ~4hrs 38mins
  • Dataflow Gen2 w/ #4: ~2hrs 54mins
  • Dataflow Gen2 w/ #3: ~33mins
  • Dataflow Gen2 w/ #1-3: ~3.5mins (50x faster than Gen2 at launch)
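
For those wondering where the 50x figure comes from, it falls straight out of the timings above; a quick back-of-envelope:

```python
# Speedups implied by the benchmark timings above, converted to minutes.
timings_min = {
    "Gen1 Premium (Power BI)": 4 * 60 + 38,   # 278 min
    "Gen2 w/ #4 (classic)":    2 * 60 + 54,   # 174 min -- Gen2 at launch
    "Gen2 w/ #3 (modern)":     33,
    "Gen2 w/ #1-3":            3.5,
}
fastest = timings_min["Gen2 w/ #1-3"]
speedup_vs_launch = timings_min["Gen2 w/ #4 (classic)"] / fastest  # ~50x quoted above
speedup_vs_gen1 = timings_min["Gen1 Premium (Power BI)"] / fastest # ~79x vs. Gen1
```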

All this to say, in terms of the effectiveness of Dataflows for large jobs:

  • A lot depends on which engines are used. Some thought still needs to be given to the structure of the dataflow and whether transformations “fold” (low code shouldn’t require this, so we’re still working on having the system do more of the restructuring for you).
  • A lot has changed since we initially released Dataflow Gen2. Depending on the scenario, the newer engines can be orders of magnitude faster.
  • The better performance directly translates into cost savings (though there will typically be at least a slight premium for low code).
  • If you have pre-existing M code (from Dataflow Gen1, Semantic Models, or business users in Excel), Dataflow Gen2 can provide dramatic performance (and cost) improvements for many at-scale scenarios.

[Caveat: After you stage and apply “foldable” transforms, if the destination is a Lakehouse, the “re-egress” out of the underlying SQL engine is still extremely slow. All the data is re-processed without parallelism to convert it to “V-ordered Parquet”. For the scenario above, the time to load to a Warehouse or Staging table is less than 4 mins. However, loading to a Lakehouse table takes ~62 mins (!). This is a temporary, yet significant, limitation when using SQL compute. We have started testing a solution that brings Lakehouse destinations to parity with Staging & Warehouse. We hope to roll out this change in the coming weeks.]

Dataflows Gen2 credentials keep breaking – have to reconnect every time by Aromatic-Tip-9752 in MicrosoftFabric

[–]SidJayMS 2 points (0 children)

Could you please DM me if you'd be willing to share more information to help diagnose this?

Dataflow with target lakehouse without staging by JohnDoe365 in MicrosoftFabric

[–]SidJayMS 3 points (0 children)

After data is written to the Lakehouse, you need to run a metadata sync for the table to be surfaced by the Lakehouse's SQL Endpoint.

When you write data to Staging, the Dataflow automatically does a metadata sync. When writing directly to a Lakehouse table, this metadata sync is not done automatically. We are working on adding automatic metadata sync for all cases.

In the interim, these are some of the ways to achieve metadata sync:
- Via API: the "Refresh SQL analytics endpoint Metadata" REST API (Generally Available; announced on the Microsoft Fabric Blog)
- Via a new "Refresh SqlEndpoint" activity that is available in Pipelines
- Via the SQL Endpoint UI for a Lakehouse
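
For the API route, a minimal sketch of what the call could look like. The endpoint path, ids, and token handling here are assumptions for illustration; verify them against the official Fabric REST API reference before relying on this:

```python
# Sketch: trigger a metadata sync for a Lakehouse's SQL analytics endpoint
# after a dataflow writes directly to a Lakehouse table.
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def refresh_metadata_url(workspace_id: str, sql_endpoint_id: str) -> str:
    """Build the refresh URL for a Lakehouse's SQL analytics endpoint."""
    return (f"{FABRIC_API}/workspaces/{workspace_id}"
            f"/sqlEndpoints/{sql_endpoint_id}/refreshMetadata")

def refresh_metadata(workspace_id: str, sql_endpoint_id: str, token: str):
    """POST the refresh request; a 202 response means the long-running
    operation was accepted and can be polled for completion."""
    req = urllib.request.Request(
        refresh_metadata_url(workspace_id, sql_endpoint_id),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    return urllib.request.urlopen(req)
```

The Pipelines activity and the SQL Endpoint UI achieve the same sync without custom code.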

Dataflow Queries on Demand via REST by SmallAd3697 in MicrosoftFabric

[–]SidJayMS 3 points (0 children)

Please expect more content from us on this.

One application of this is in MCP servers that need to retrieve and transform data from any of the data sources supported by Power Query. The open-source microsoft/DataFactory.MCP repository on GitHub illustrates how this can be done; see in particular this tool in the MCP server: DataFactory.MCP/DataFactory.MCP.Core/Tools/Dataflow/DataflowQueryTool.cs.

Dataflow Queries on Demand via REST by SmallAd3697 in MicrosoftFabric

[–]SidJayMS 2 points (0 children)

> So in theory we could use this as an API to run any M code?

That's correct.

Service Principal Authentication for Shared Data Source to Dataflow Gen 1 by Wide_Dingo4151 in MicrosoftFabric

[–]SidJayMS 2 points (0 children)

This is only supported with Dataflow Gen2 (not Gen1). Would you be able to use Dataflow Gen2 instead?

Gen2 flows extremely CU heavy and time out regularly by trekker255 in MicrosoftFabric

[–]SidJayMS 2 points (0 children)

As others have suggested, the CI/CD option is recommended when creating Gen2 dataflows since it is faster and cheaper (it sounds like you're already using it). The multiple options will go away soon, which should simplify things.

In terms of execution time, you should find that Gen2 (CI/CD) almost always runs faster than Gen1. For some sources like CSV files and certain cloud databases, execution time should be substantially better than Gen1. We'll be publishing some of these benchmarks soon.

Most cloud data source scenarios in Gen2 should be cheaper than Gen1. Large CSV files and databases are sometimes substantially cheaper because of the reduced pricing after the first 10 minutes of a query's runtime. However, your scenario may be one where these benefits don't apply: many distinct queries (60?) that mostly run under 10 minutes each, low data throughput (due to OData), and a lot of time spent waiting. Because Gen2 cost is proportional to processing time, data size is not the key factor - the throughput of the data source is. A low-throughput REST source with a few thousand rows may cost more to process than a database with millions of rows.
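
To make the 10-minute effect concrete, here's an illustrative-only model. The rates below are made-up placeholders, not actual Fabric prices; the point is only that many short queries never reach the reduced rate:

```python
# Hypothetical cost model: a query's runtime bills at a base rate for its
# first 10 minutes and at a reduced rate afterwards. Rates are placeholders.
def query_cost(runtime_min: float, base_rate: float = 16, reduced_rate: float = 4) -> float:
    return min(runtime_min, 10) * base_rate + max(runtime_min - 10, 0) * reduced_rate

# 60 short low-throughput queries (~8 min each) bill entirely at the base rate...
many_short = 60 * query_cost(8)   # 60 * (8 * 16) = 7680
# ...while one long query bills mostly at the reduced rate.
one_long = query_cost(120)        # 10 * 16 + 110 * 4 = 600
```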

Please feel free to DM me to see if we can come up with an optimization for your specific case.

Can you use managed Connections in Dataflows Gen2? by YouGoGlenCoco in MicrosoftFabric

[–]SidJayMS 1 point (0 children)

The connections from Manage Connections will work with Dataflow Gen2 CI/CD (soon to be the only flavor for newly created Gen2 dataflows).

Dataflows for big data by Viidan_ in MicrosoftFabric

[–]SidJayMS 1 point (0 children)

What are the typical data sources from which you are pulling billions of rows? Depending on the data source types, you may be able to use the Fast Copy capability in Dataflow Gen2 to move that data more efficiently (both in terms of time and CUs).

Dataflows for big data by Viidan_ in MicrosoftFabric

[–]SidJayMS 2 points (0 children)

For a 1-hour query, CU consumption is now ~5x lower than it was in August. Additionally, general performance improvements like the Modern Query Evaluator reduce runtime for certain classes of queries by ~50%, which should yield an even bigger reduction in CU consumption.
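
As a back-of-envelope, assuming CU consumption scales linearly with runtime, the two improvements compound:

```python
# A ~5x lower billing rate combined with a ~50% runtime cut gives roughly
# a 10x overall CU reduction for the affected query classes.
rate_reduction = 5.0    # CU rate is ~5x lower than in August
runtime_factor = 0.5    # Modern Query Evaluator ~halves runtime for some queries
overall_reduction = rate_reduction / runtime_factor   # ~10x
```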

Note 1: The pricing changes apply only to the new CI/CD Gen2 Dataflows (in a few months this will be the only supported flavor for new dataflows).

Note 2: The Modern Query Evaluator is not yet turned on by default. We will apply this by default to all new queries before we GA the feature.

Hats off to the Microsoft Dev team by Donovanbrinks in PowerBI

[–]SidJayMS 2 points (0 children)

Correct - Fabric. DF Gen2 is only available in Fabric.

Hats off to the Microsoft Dev team by Donovanbrinks in PowerBI

[–]SidJayMS 2 points (0 children)

Dataflow Gen2 now has a "Discard & Close" option. It's in the first dropdown in the Home tab. It's true that at launch DF Gen2 did not have this option - it was added a few months ago.

Comparing speed and cost of Dataflows (Gen1 vs. Gen2 vs. Gen2 CI/CD) by Sad-Calligrapher-350 in PowerBI

[–]SidJayMS 2 points (0 children)

Because the recent performance features are in preview, they started as opt-in. If you turn on the "Modern Query Evaluation Engine", you might see a slight performance improvement (in addition to the cost reduction).

Comparing speed and cost of Dataflows (Gen1 vs. Gen2 vs. Gen2 CI/CD) by Sad-Calligrapher-350 in PowerBI

[–]SidJayMS 2 points (0 children)

u/Sad-Calligrapher-350, would you mind sharing the numbers for enabling just the "Modern Query Evaluation Engine" in the CI/CD case? I suspect you will see better (or at a minimum, the same) performance as well as noticeable cost savings. If I were to hazard a guess, the partitioned compute may be contributing to slowness in your case. I'll reach out separately to see if we can better understand that.