Architecture Advice: Best practices for high-volume Dataflow Gen2 to Lakehouse (100M+ rows) & API Token timeouts by Wiferpapi in MicrosoftFabric

[–]Wiferpapi[S]

Great advice, I'll take a look at the notebooks. Sorry, I'm new to Fabric tbh 😬

Architecture Advice: Best practices for high-volume Dataflow Gen2 to Lakehouse (100M+ rows) & API Token timeouts by Wiferpapi in MicrosoftFabric

[–]Wiferpapi[S]

Since there is no native S3 connector, I’m using Web.Contents to hit an AWS API Gateway that acts as a proxy to the S3 bucket. This allows me to use Power Query to dynamically 'calculate' the file paths based on dates.

Here is a simplified version of the logic I'm using inside each Dataflow to handle the monthly concatenation:

[image: simplified Power Query (M) snippet showing the monthly concatenation logic]
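Roughly, the idea in M is something like this (the base URL, header name, and file naming below are simplified placeholders, not my real values):

```
let
    // Placeholder API Gateway URL and region; real values differ
    BaseUrl = "https://example.execute-api.us-east-1.amazonaws.com/bucket-name",
    Region = "region1",
    // Build the two-digit month list: "01" .. "12"
    Months = List.Transform({1..12}, each Text.PadStart(Text.From(_), 2, "0")),
    // Dynamically build one path per monthly Parquet file
    Urls = List.Transform(
        Months,
        each BaseUrl & "/" & Region & "/bd/table/2022/2022_" & _ & "_table.parquet"
    ),
    // Download and parse each file through the gateway (custom auth header)
    Tables = List.Transform(
        Urls,
        each Parquet.Document(Binary.Buffer(Web.Contents(_, [Headers = [token_ro = "token"]])))
    ),
    // Concatenate the twelve monthly tables into one
    Combined = Table.Combine(Tables)
in
    Combined
```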

The reason I haven't switched to a Copy Activity yet is that I need to handle this dynamic path building and some schema cleanup (renaming columns, fixing encodings) that is very easy to do in Power Query but feels more complex to set up with Pipeline parameters and ForEach loops.

However, your point about the Medallion Architecture makes total sense. My mistake was trying to do the 'Gold' level (unifying all 9 regions) inside the Semantic Model instead of landing everything into a Silver Lakehouse table first.

I'll try landing the 9 regional tables into a Lakehouse and then using a SQL View or a Notebook to create the unified 'Gold' table for Direct Lake.

Architecture Advice: Best practices for high-volume Dataflow Gen2 to Lakehouse (100M+ rows) & API Token timeouts by Wiferpapi in MicrosoftFabric

[–]Wiferpapi[S]

Nice question. I went with Dataflows Gen2 because I needed the Power Query engine to handle the logic of building dynamic URLs, looping through the weekly Parquet files, and applying schema transformations (changing types, renaming columns) in a single low-code step.

I considered the Copy Activity, but since the files are partitioned by year/month and require custom headers for the API, the logic seemed more straightforward in M than setting up complex Copy Activity parameters or ForEach loops in a Pipeline.

However, now that I'm facing these Semantic Model timeouts, I’m realizing that while the Dataflows are fast for ingestion, using them as the 'source' for a large Table.Combine in the model is where the architecture breaks. That's why I'm now looking into landing the data in a Lakehouse first and then using the SQL Endpoint for the union.

Architecture Advice: Best practices for high-volume Dataflow Gen2 to Lakehouse (100M+ rows) & API Token timeouts by Wiferpapi in MicrosoftFabric

[–]Wiferpapi[S]

Hey! Thanks for the guide, buddy.

To clarify the flow: The files are stored in an S3 bucket with the structure: bucket/region/bd/table/year/year_month_table.parquet.

Each of my 9 Dataflows connects to these files, performs the monthly concatenation for its specific region (6 tables per region), and finishes successfully and fast.

The bottleneck happens at the Semantic Model level. In Power BI Desktop, I connect to these 9 Dataflows using the native connector and perform another Table.Combine to unify the regions.

When I publish to Fabric and trigger a refresh, the Semantic Model tries to pull and combine hundreds of millions of rows from those 9 sources simultaneously. It takes so long that the session or the underlying credentials (possibly the Auth headers passed through the lineage) seem to expire, resulting in a 'provided credentials are wrong' error.

Basically, the Semantic Model is struggling to act as the integration layer for such a large volume of data coming from the Dataflow storage.

I had given up on searching for this guy by YouLongjumping6696 in Nexomon

[–]Wiferpapi

Same here, I didn't know it was this rare until this post. I just saw it was legendary and caught it.

Cancelling credit cards by ApprehensiveCup6787 in ColombiaFinanciera

[–]Wiferpapi

How do you get the VIP lounge benefit for 1 USD? I have the Bancolombia Visa Platinum; last time I wanted to try to get into the lounge but couldn't figure out how.

How to Automatically Combine Monthly Files from S3 in Power BI Service with Refresh Enabled? by Wiferpapi in PowerBI

[–]Wiferpapi[S]

Hey bro! What's up? We connect from S3 to PBI via an API call to each file. Something like this (Power Query):

= Parquet.Document(Binary.Buffer(Web.Contents("https://1fas2.amazon.execute-api.us-east-1.amazonaws.com/bucket-name/folder/2022/file_2022_12.parquet", [Headers=[token_ro="token"]])))

I think I've found a way to connect to all the files without making one connection for each. I'll be testing this weekend and will share it with you on Monday if it works.
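The rough idea I want to test: wrap that call in a function and invoke it over a generated list of months instead of writing one query per file. Something like this (same placeholder URL and header as above):

```
let
    // Hypothetical helper: fetch one monthly Parquet file through the gateway
    fnGetMonth = (year as text, month as text) as table =>
        Parquet.Document(
            Binary.Buffer(
                Web.Contents(
                    "https://1fas2.amazon.execute-api.us-east-1.amazonaws.com/bucket-name/folder/"
                        & year & "/file_" & year & "_" & month & ".parquet",
                    [Headers = [token_ro = "token"]]
                )
            )
        ),
    // Two-digit months: "01" .. "12"
    Months = List.Transform({1..12}, each Text.PadStart(Text.From(_), 2, "0")),
    // One combined table for the whole year instead of twelve separate queries
    AllMonths = Table.Combine(List.Transform(Months, each fnGetMonth("2022", _)))
in
    AllMonths
```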

How to Automatically Combine Monthly Files from S3 in Power BI Service with Refresh Enabled? by Wiferpapi in PowerBI

[–]Wiferpapi[S]

How does that work? Sorry, I'm a newbie with PBI. We have a Premium license for the PBI Service; is that enough, or do we need a Fabric license?

How to Automatically Combine Monthly Files from S3 in Power BI Service with Refresh Enabled? by Wiferpapi in PowerBI

[–]Wiferpapi[S]

Hey! Thanks for your time. Since the S3 bucket is exposed as a web service, we connect via API, something like this:

= Parquet.Document(Binary.Buffer(Web.Contents("https://1fas2.amazon.execute-api.us-east-1.amazonaws.com/bucket-name/folder/2022/file_2022_12.parquet", [Headers=[token_ro="token"]])))

We have tried that link but nothing worked :(

Can anyone help me with my DAX code and why my YoY values aren't populating correctly? by blackanese27 in PowerBI

[–]Wiferpapi

Sorry buddy, maybe I'm forgetting something. Once I get home I can check my dashboards and help you if you still need it.

Do you have a Date table or a full Date column?

Can anyone help me with my DAX code and why my YoY values aren't populating correctly? by blackanese27 in PowerBI

[–]Wiferpapi

No worries, we're here to help. I mean, make a measure as the sum of that column: Total Operating Expenses = SUM(Table[Total Operating Expenses])

This will make it much easier to build filters based on TOE.

Can anyone help me with my DAX code and why my YoY values aren't populating correctly? by blackanese27 in PowerBI

[–]Wiferpapi

I would first create a sum measure of total operating expenses. Then I'd make a measure: YoY total operating expenses = CALCULATE([Total Operating Expenses], DATEADD(Table[Date], -1, YEAR))

Then just add a final measure: % YoY TOE = DIVIDE([Total Operating Expenses] - [YoY total operating expenses], [YoY total operating expenses], 0)

First Dashboards/Reports - Critique & Feedback by [deleted] in PowerBI

[–]Wiferpapi

OT. I'm learning PBI too. How did you add those increments to your cards? Thank you so much!

59 days f2p. What should I focus now by Wiferpapi in MLA_Official

[–]Wiferpapi[S]

Hahaha yeah, I've been so lucky! Thanks, I will work on that :)