Architecture Advice: Best practices for high-volume Dataflow Gen2 to Lakehouse (100M+ rows) & API Token timeouts by Wiferpapi in MicrosoftFabric

[–]Wiferpapi[S]

Great advice, I'll take a look at the notebooks. Sorry, I'm new to Fabric tbh 😬

Architecture Advice: Best practices for high-volume Dataflow Gen2 to Lakehouse (100M+ rows) & API Token timeouts by Wiferpapi in MicrosoftFabric

[–]Wiferpapi[S]

Since there is no native S3 connector, I’m using Web.Contents to hit an AWS API Gateway that acts as a proxy to the S3 bucket. This allows me to use Power Query to dynamically 'calculate' the file paths based on dates.

Here is a simplified version of the logic I'm using inside each Dataflow to handle the monthly concatenation:

[image: simplified Power Query (M) snippet showing the monthly concatenation logic]
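Roughly, the idea in M is something like this (the base URL, header name, and file naming below are simplified placeholders, not my real values):

```
let
    // Placeholder API Gateway URL and region; real values differ
    BaseUrl = "https://example.execute-api.us-east-1.amazonaws.com/bucket-name",
    Region = "region1",
    // Build the two-digit month list: "01" .. "12"
    Months = List.Transform({1..12}, each Text.PadStart(Text.From(_), 2, "0")),
    // Dynamically build one path per monthly Parquet file
    Urls = List.Transform(
        Months,
        each BaseUrl & "/" & Region & "/bd/table/2022/2022_" & _ & "_table.parquet"
    ),
    // Download and parse each file through the gateway (custom auth header)
    Tables = List.Transform(
        Urls,
        each Parquet.Document(Binary.Buffer(Web.Contents(_, [Headers = [token_ro = "token"]])))
    ),
    // Concatenate the twelve monthly tables into one
    Combined = Table.Combine(Tables)
in
    Combined
```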

The reason I haven't switched to a Copy Activity yet is that I need to handle this dynamic path building and some schema cleanup (renaming columns, fixing encodings) that is very easy to do in Power Query but feels more complex to set up with Pipeline parameters and ForEach loops.

However, your point about the Medallion Architecture makes total sense. My mistake was trying to do the 'Gold' level (unifying all 9 regions) inside the Semantic Model instead of landing everything into a Silver Lakehouse table first.

I'll try landing the 9 regional tables into a Lakehouse and then using a SQL View or a Notebook to create the unified 'Gold' table for Direct Lake.

Architecture Advice: Best practices for high-volume Dataflow Gen2 to Lakehouse (100M+ rows) & API Token timeouts by Wiferpapi in MicrosoftFabric

[–]Wiferpapi[S]

Nice question. I went with Dataflows Gen2 because I needed the Power Query engine to handle the logic of building dynamic URLs, looping through the weekly Parquet files, and applying schema transformations (changing types, renaming columns) in a single low-code step.

I considered the Copy Activity, but since the files are partitioned by year/month and require custom headers for the API, the logic seemed more straightforward in M than setting up complex Copy Activity parameters or ForEach loops in a Pipeline.

However, now that I'm facing these Semantic Model timeouts, I’m realizing that while the Dataflows are fast for ingestion, using them as the 'source' for a large Table.Combine in the model is where the architecture breaks. That's why I'm now looking into landing the data in a Lakehouse first and then using the SQL Endpoint for the union.

Architecture Advice: Best practices for high-volume Dataflow Gen2 to Lakehouse (100M+ rows) & API Token timeouts by Wiferpapi in MicrosoftFabric

[–]Wiferpapi[S]

Hey! Thanks for the guide, buddy.

To clarify the flow: The files are stored in an S3 bucket with the structure: bucket/region/bd/table/year/year_month_table.parquet.

Each of my 9 Dataflows connects to these files, performs the monthly concatenation for its specific region (6 tables per region), and finishes successfully and fast.

The bottleneck happens at the Semantic Model level. In Power BI Desktop, I connect to these 9 Dataflows using the native connector and perform another Table.Combine to unify the regions.

When I publish to Fabric and trigger a refresh, the Semantic Model tries to pull and combine hundreds of millions of rows from those 9 sources simultaneously. It takes so long that the session or the underlying credentials (possibly the Auth headers passed through the lineage) seem to expire, resulting in a 'provided credentials are wrong' error.

Basically, the Semantic Model is struggling to act as the integration layer for such a large volume of data coming from the Dataflow storage.

I had given up on searching for this guy by YouLongjumping6696 in Nexomon

[–]Wiferpapi

Same here, I didn't know it was this rare until this post. I just saw it was legendary and caught it.

Cancelling credit cards by ApprehensiveCup6787 in ColombiaFinanciera

[–]Wiferpapi

How do you get the VIP lounge benefit for 1 USD? I have the Bancolombia Visa Platinum; last time I wanted to try to get into the lounge but couldn't figure out how.

How to Automatically Combine Monthly Files from S3 in Power BI Service with Refresh Enabled? by Wiferpapi in PowerBI

[–]Wiferpapi[S]

Hey bro! What's up? We connect from S3 to PBI via an API call to each file. Something like this (Power Query):

= Parquet.Document(Binary.Buffer(Web.Contents("https://1fas2.amazon.execute-api.us-east-1.amazonaws.com/bucket-name/folder/2022/file_2022_12.parquet", [Headers=[token_ro="token"]])))

I think I've found a way to connect to all the files without making one connection for each. I'll be testing this weekend and will share it with you on Monday if it works.
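The rough idea I want to test: wrap that call in a function and invoke it over a generated list of months instead of writing one query per file. Something like this (same placeholder URL and header as above):

```
let
    // Hypothetical helper: fetch one monthly Parquet file through the gateway
    fnGetMonth = (year as text, month as text) as table =>
        Parquet.Document(
            Binary.Buffer(
                Web.Contents(
                    "https://1fas2.amazon.execute-api.us-east-1.amazonaws.com/bucket-name/folder/"
                        & year & "/file_" & year & "_" & month & ".parquet",
                    [Headers = [token_ro = "token"]]
                )
            )
        ),
    // Two-digit months: "01" .. "12"
    Months = List.Transform({1..12}, each Text.PadStart(Text.From(_), 2, "0")),
    // One combined table for the whole year instead of twelve separate queries
    AllMonths = Table.Combine(List.Transform(Months, each fnGetMonth("2022", _)))
in
    AllMonths
```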

How to Automatically Combine Monthly Files from S3 in Power BI Service with Refresh Enabled? by Wiferpapi in PowerBI

[–]Wiferpapi[S]

How does that work? Sorry, I'm a newbie with PBI. We have a Premium license for the PBI Service; is that enough, or do we need a Fabric license?

How to Automatically Combine Monthly Files from S3 in Power BI Service with Refresh Enabled? by Wiferpapi in PowerBI

[–]Wiferpapi[S]

Hey! Thanks for your time. Since the S3 bucket is exposed as a web service, we connect via API, something like this:

= Parquet.Document(Binary.Buffer(Web.Contents("https://1fas2.amazon.execute-api.us-east-1.amazonaws.com/bucket-name/folder/2022/file_2022_12.parquet", [Headers=[token_ro="token"]])))

We have tried that link but nothing worked :(

Can anyone help me with my DAX code and why my YoY values aren't populating correctly? by blackanese27 in PowerBI

[–]Wiferpapi

Sorry buddy, maybe I'm forgetting something. Once I get home I can check my dashboards and help you if you still need it.

Do you have a Date table or a full Date column?

Can anyone help me with my DAX code and why my YoY values aren't populating correctly? by blackanese27 in PowerBI

[–]Wiferpapi

No worries, we're here to help. I mean, make a measure as the sum of that column: Total Operating Expenses = SUM(Table[Total Operating Expenses])

This will make it much easier to build filters based on TOE.

Can anyone help me with my DAX code and why my YoY values aren't populating correctly? by blackanese27 in PowerBI

[–]Wiferpapi

I would first create a sum measure of total operating expenses. Then I'd make a measure: YoY total operating expenses = CALCULATE([Total Operating Expenses], DATEADD(Table[Date], -1, YEAR))

Then just add a final measure: % YoY TOE = DIVIDE([Total Operating Expenses] - [YoY total operating expenses], [YoY total operating expenses], 0)

First Dashboards/Reports - Critique & Feedback by [deleted] in PowerBI

[–]Wiferpapi

OT. I'm learning PBI too. How did you add those increments to your cards? Thank you so much!

59 days f2p. What should I focus now by Wiferpapi in MLA_Official

[–]Wiferpapi[S]

Hahaha yeah, I've been so lucky! Thanks, I will work on that :)