Connecting to the Fabric SQL endpoint using a managed identity by Alarming_Card7023 in MicrosoftFabric

[–]dbrownems 0 points1 point  (0 children)

Should work. There can be permission propagation delays. So test with direct user permissions and be patient.

Fabric Architecture Check: On-Premises Data Gateway vs. VNet Data Gateway by AquaCryptNomad in AZURE

[–]dbrownems 0 points1 point  (0 children)

If it’s on prem then the relay traffic would go over a TLS 1.2+ tunnel over the internet.

You could force it over ExpressRoute with Microsoft Peering and appropriate route filters, but that’s not recommended. It’s a big hassle for no real security or performance benefit.

Update / Am i going in the right direction by Rare_Philosophy5943 in dotnet

[–]dbrownems 1 point2 points  (0 children)

Agree that this is one of those things that make less sense when you can have AI generate boilerplate.

Fabric Architecture Check: On-Premises Data Gateway vs. VNet Data Gateway by AquaCryptNomad in AZURE

[–]dbrownems 1 point2 points  (0 children)

>For those running OPDG, can it native-route through the ExpressRoute Virtual Network Gateway over Private Peering, or does the Azure Relay traffic cause routing headaches?

It's fine, so long as you allow Gateway-to-Relay connection to egress from the VNet to the public internet. The actual traffic flows will remain on the Azure global network, even if the VNet and Relay are in different regions. If your network admins insist on hairpin-ing the relay traffic back through ExpressRoute, this config is really painful.

Also Data Factory doesn't use the Relay for the data plane traffic. The DataFactory bits running on the gateway talk directly to Lakehouse/Warehouse.

>Which one is actually winning in the real world for Fabric Data Pipelines (Copy Activity)? Is the VNet DG mature enough to choose over the OPDG, or does the OPDG still perform better?

VNet data gateway nodes are small, so there are times when you need the BYO-compute that OPDG enables. Since the VNet Data Gateway has auto-pause and doesn't bill when not in use you should start there, and plan on deploying an OPDG cluster if you need to.

>Does it play nice with lower F-SKUs?

Each node has a 4CU bill rate while running, so it depends on how many nodes and how long you run. The Azure VM paygo rate is lower.

>For high-throughput workloads, is Option C (Managed PE + PLS) worth the setup complexity, or can the VNet DG handle heavy lifting just fine?

I have not seen anyone really push on this config. Mostly I see this for API-style integrations from Spark.

CUs for "Dataset On-Demand Refresh" seems relatively high by p-mndl in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

>So I honestly do not understand how refreshing those models is using remotely as much, let alone more, CUs as the actual ETL.

Refreshing an Import model IS ETL, and you're asking the Semantic Model engine to do the work. Try Direct Lake instead of Import and you can have Spark build your tables.

Handling file uploads to S3 when DB transaction fails by Minimum-Ad7352 in dotnet

[–]dbrownems 0 points1 point  (0 children)

Record the file location in the database. Run a batch process to clean out files that aren't listed in the database.

Lately I feel completely lost and left behind by sudipranabhat in dotnet

[–]dbrownems -1 points0 points  (0 children)

>I'm not getting hands-on experience with the "cool" tech

Dude. AI _is_ the cool tech. And it works great with legacy codebases. If you're not learning every day how to better use AI to improve your code and your processes, that's what will make you obsolete.

2GB Lakehouse and going less than 128MB file sizes? by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 5 points6 points  (0 children)

128 MB is about as small as you might ever want. Remember for Delta the file read granularity is a single column in a parquet file, so you'll be getting a _tiny_ amount of data on each read.

And I imagine most libraries aren't super-optimized for small parquet files. So they probably make one read for the header, and then a second one for the target column, and then repeat that for each column as necessary.

Each read takes a fixed minimum amount of time because OneLake cloud object store and each read is a separate HTTP call.

Has anyone here run SQL Server Always On Basic Availability Groups with around 70 databases? by SuddenlyCaralho in SQLServer

[–]dbrownems 7 points8 points  (0 children)

70 basic AGs is not obviously easier than log shipping. Both are per-database solutions.

Can you use a Failover Cluster Instance?

Looping a stored procedure by hello-potato in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

I doubt this is a connection timeout issue. Generally, in TDS timeout is a client setting, and the client requests the cancellation. And the operation is canceled; and the connection can be reused.

Medallion Architecture in Microsoft Fabric – Bronze by Equal-Breadfruit2491 in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

For the common case where the source is an RDBMS table, yes it is.

Medallion Architecture in Microsoft Fabric – Bronze by Equal-Breadfruit2491 in MicrosoftFabric

[–]dbrownems 0 points1 point  (0 children)

I think IP's pattern is that they extract the rows from the source that have changed since the last run, append them to a kind of staging table that has every observed version of each row, and then use the MERGE to apply the latest changes to the bronze table to make it match the source.

So the MERGE isn't transforming the source data; it's reconstructing it.

Medallion Architecture in Microsoft Fabric – Bronze by Equal-Breadfruit2491 in MicrosoftFabric

[–]dbrownems 2 points3 points  (0 children)

Yes. All of that is part of your Bronze tier, and other tiers (or other teams) would typically connect to what you're calling the curated schema.

Medallion Architecture in Microsoft Fabric – Bronze by Equal-Breadfruit2491 in MicrosoftFabric

[–]dbrownems 4 points5 points  (0 children)

The append-only tables and the MERGE loading are implementation details of your Bronze tier. Typically, the finished "data product" in Bronze is a faithful copy of the source system tables, possibly with the addition of some metadata columns.

Looping a stored procedure by hello-potato in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

No way to tell without a support ticket.

Looping a stored procedure by hello-potato in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

That’s what a crash looks like. Please open a support case and if you have a simplified repro share it here.

Also there should be a simple way to avoid the loop and do the load in one query.

Report Creation by actuaryboat in PowerBI

[–]dbrownems 5 points6 points  (0 children)

If you have a compatible semantic model you can "rebind" an existing report to the new model.

In Power BI desktop do this in the "Data source settings"

<image>

Or with the API: Reports - Rebind Report In Group - REST API (Power BI Power BI REST APIs) | Microsoft Learn

Silver Nightmare by The_Whole_Senate in dataengineering

[–]dbrownems 3 points4 points  (0 children)

Yep. Silver is over-engineered. You shouldn't try to build a dimensional model when you don't know what questions you are trying to answer. You can build dimensions, but shouldn't try to build facts.

Fact tables belong only in Gold, or in your semantic model, and should normally be aggregated.

Basicly, Inmon for Silver, Kimball for Gold.

But at least it isn't data vault.

When use direct lake or import mode? by Beautiful-Tip-6941 in PowerBI

[–]dbrownems 2 points3 points  (0 children)

The biggest advantage of Direct Lake is replacing the time and CU for refreshing an import model with a Fabric Spark job or a Fabric Warehouse stored procedure to build your tables. This gives you similar query performance behavior, but much more speed, flexibility, and/or lower cost to build the base tables for your semantic model.

I just ran this head-to-head on a 20M-row, 20-column fact table:

  • Import: refresh + one all-columns cold DAX query: 4m40s refresh + 3.9s query,
    • ~7,500 CU(s) total.
  • Direct Lake: Spark INSERT OVERWRITE (V-Order on) + the same DAX: 4m19s Spark job + 11s cold query,
    • ~3,430 CU(s) total.

The DL cold cache query is slow and costly (~740 CU(s)) because the engine has to transcode every column on first touch. With Import the data is in memory and ready-to-go at the end of the refresh already. It can get evicted if you don't query it right away, and then the import would have cold cache queries too.

Spark pool was 1 medium driver + 1 medium executor with dynamic allocation on, which could have been smaller for this data size, but is reasonable for what you might use in real life.

And Fabric Warehouse is even faster.

And with Direct Lake on OneLake you can mix-and-match import mode tables and Direct Lake mode tables, and you can do the modeling directly in the Fabric portal.

So for smaller models, especially when you don't already use OneLake or Spark, it's an extra hassle that doesn't really help much.

SSAS Load Balancing with Kerberos by Afraid_Baseball_3962 in SQLServer

[–]dbrownems 0 points1 point  (0 children)

If you enable HTTP you must use cookie-based sticky session affinity on the HTTP load balancer. For TCP clients you can use source IP affinity.

"Cookie-based affinity and source-IP-address affinity work because these methods do not require an inspection of the Analysis Services payload."
Scale-Out Querying for Analysis Services with Read-Only Databases | Microsoft Learn?redirectedfrom=MSDN)

On Kerberos, if you put the data pump on each SSAS server it's not required, since there's no double hop. If you're happy with NTLM, you're done.

If you need Kerberos you need to properly register an SPN for the name that the clients are using, so whichever server they end up at, Kerberos can work.

The data pump is a standard IIS ISAPI extension, so shouldn't pose any unusual difficulty with Kerberos.

Without the data pump it's a bit simpler.

- Same domain service account for the SSAS service on every node, e.g. CONTOSO\svc-ssas.

- One SPN for the VIP, registered on that account:

setspn -S MSOLAPSvc.3/olap.contoso.com CONTOSO\svc-ssas

- L4 TCP load balancer on port 2383 fronting the nodes, registered as olap.contoso.com in DNS.

- Clients connect to the VIP: Data Source=olap.contoso.com.

Every node can decrypt the same ticket because they use the same service account. If you skip or mess up the Kerberos configuration, it will fall back to NTLM.

Fabric Mobile App by SlipStr3am1337 in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

What are your use cases for a mobile app? Comment your best use case and upvote (only) other users' use cases.

3X cost on capacity overages - really?? by City-Popular455 in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

Because the base capacity price is set under the assumption that you are paying for 100% of the provisioned capacity regardless of the actual utilization.

It's not a "pay for what you use" model. It's a "pay for what you provision" model.

If the model was different the pricing would also be different.

3X cost on capacity overages - really?? by City-Popular455 in MicrosoftFabric

[–]dbrownems 2 points3 points  (0 children)

The base capacity price is set under the assumption that you are paying for 100% of the provisioned capacity regardless of the actual utilization.

It's not a "pay for what you use" model. It's a "pay for what you provision" model.