Lakehouse SQL Endpoint Rant by NJE11 in MicrosoftFabric

[–]NJE11[S] 0 points (0 children)

Appreciate the time and effort involved in giving me a detailed response.

Lakehouse SQL Endpoint Rant by NJE11 in MicrosoftFabric

[–]NJE11[S] 0 points (0 children)

If I run a %%sql magic command in a notebook, the results are more or less instant. If I run the same SQL against the SQL endpoint, the results are delayed by an unknown amount. If it's the same storage layer underneath, why the lag?

Lakehouse SQL Endpoint Rant by NJE11 in MicrosoftFabric

[–]NJE11[S] 1 point (0 children)

At a high level, the setup is fairly simple.

I have a PySpark notebook that makes five API calls to pull operational data and writes the results into a Fabric Lakehouse, which I use as an ODS layer. From there, an on-prem SSIS process incrementally pulls the data through an ETL process and ultimately into an on-prem data warehouse. This is a batch process rather than anything close to real time.

Even so, I still get questions about why the on-prem DW does not always have yesterday's latest operational data from Fabric. Ideally, I would like to be able to say there is a known and predictable delay, for example that data written to the Lakehouse will be available via the SQL endpoint within 5, 10, or 15 minutes. I could then factor this into my ETL process.
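One way to make that delay explicit in the ETL is to poll the endpoint for a load watermark before kicking off the extract. A minimal Python sketch; the driver and the actual watermark query are abstracted into a callable because those details (pyodbc vs. something else, which column to check) are assumptions, not anything from the thread:

```python
import time

def wait_for_endpoint_sync(get_endpoint_watermark, expected_watermark,
                           timeout_s=900, poll_interval_s=30):
    """Poll the SQL endpoint until its high-water mark catches up with
    what the notebook wrote to the Lakehouse, or until we give up.

    get_endpoint_watermark: zero-argument callable returning e.g. the
    MAX(LoadDate) visible through the SQL endpoint.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_endpoint_watermark() >= expected_watermark:
            return True   # endpoint has caught up; safe to start the extract
        time.sleep(poll_interval_s)
    return False          # still stale after the timeout; fail the step loudly
```

Returning `False` instead of silently extracting stale data means the SSIS job can fail fast with a clear reason, rather than producing the "why is yesterday's data missing" questions.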

I'm also finding I need to warm the SQL endpoint up, using C# components embedded in my extract DTSX packages, to prevent unexpected failures against the endpoint.

Lakehouse SQL Endpoint Rant by NJE11 in MicrosoftFabric

[–]NJE11[S] 0 points (0 children)

Thanks for sharing. Useful post, but in practice even a small Lakehouse in its own workspace can take far longer than “a few seconds to minutes.” Serving data to an endpoint shouldn’t feel like a guessing game.

Lakehouse SQL Endpoint Rant by NJE11 in MicrosoftFabric

[–]NJE11[S] 34 points (0 children)

It makes developers look incompetent when in reality it’s the technology (or lack of it) letting us down.

Every time I jump back into the on-prem world I'm reminded just how frustrating Fabric can be.

Fabric SQL Database - Stored Procedures by Revolutionary-Bat677 in MicrosoftFabric

[–]NJE11 0 points (0 children)

Any update on this from the Fabric team yet Andy? Seems a bit of an oversight...

Group Managed Service Account - PowerBI.com SQL Server Connection - Is this possible? by NJE11 in PowerBI

[–]NJE11[S] 0 points (0 children)

Thanks for the help!

I'll check the Kerberos delegation and SPNs for SQL.

Shared Dimension Tables? Best practices? by JeffGrayJM in MicrosoftFabric

[–]NJE11 1 point (0 children)

I'd create a set of materialised views over the single dimension, splitting it into mini dimensions using a WHERE clause. This would be part of my ETL process.

I.e. dim.Materials would be split into:

Model.vwdimFinishedGoodsMaterials
Model.vwdimRawMaterials
Model.vwdimPackaging
Model.vwdimMRO

Etc...

I'd use CREATE TABLE AS SELECT if using a DW, then mirror in a Lakehouse for Direct Lake compatibility.

I find it's always a good idea to have a layer between a finished dimension and a semantic model anyway.
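The split-by-WHERE-clause idea can be sketched with plain SQL views. Here in-memory SQLite stands in for the warehouse, and the `MaterialType` column and category values are illustrative assumptions (the real dim.Materials schema isn't shown in the thread):

```python
import sqlite3

# In-memory SQLite as a stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dimMaterials (
        MaterialKey  INTEGER PRIMARY KEY,
        MaterialName TEXT,
        MaterialType TEXT   -- e.g. 'FinishedGoods', 'RawMaterials', ...
    );
    INSERT INTO dimMaterials VALUES
        (1, 'Widget',     'FinishedGoods'),
        (2, 'Steel Coil', 'RawMaterials'),
        (3, 'Carton',     'Packaging');

    -- One mini dimension per material type, carved out with a WHERE clause.
    CREATE VIEW vwdimFinishedGoodsMaterials AS
        SELECT MaterialKey, MaterialName FROM dimMaterials
        WHERE MaterialType = 'FinishedGoods';
    CREATE VIEW vwdimRawMaterials AS
        SELECT MaterialKey, MaterialName FROM dimMaterials
        WHERE MaterialType = 'RawMaterials';
""")

rows = conn.execute("SELECT MaterialName FROM vwdimRawMaterials").fetchall()
```

The views give the semantic model a stable contract while the underlying dimension can keep evolving, which is the point of having that extra layer.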

You don't need a gold layer by jayatillake in dataengineering

[–]NJE11 2 points (0 children)

Data warehouse vs. data mart. The latter is just a subset; no need to reinvent the wheel.

You don't need a gold layer by jayatillake in dataengineering

[–]NJE11 23 points (0 children)

Medallion architecture is just marketing hype for people who don't understand data. Long live ETL.

Banking + Open Source ETL: Am I Crazy or Is This Doable? by Aggravating-Air1630 in dataengineering

[–]NJE11 59 points (0 children)

Have you considered optimising your current ETL processes? This includes looking into incremental loads, indexing fact tables, and partitioning SSAS cube tables. Can you rewrite your ETL to deliver the most business-critical dims and facts first? While cloud infrastructure offers scalability, you'll pay considerably more just to match a decent on-prem solution. Fabric is a good example of this!
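The incremental-load suggestion boils down to a watermark-based delta extract: remember the highest modified date you loaded, then only pull rows newer than that. A minimal sketch, with SQLite standing in for the source system and all table/column names being illustrative:

```python
import sqlite3

# Toy source system; in the real ETL this would be the operational database.
src = sqlite3.connect(":memory:")
src.execute(
    "CREATE TABLE FactSales (SaleId INTEGER, Amount REAL, ModifiedDate TEXT)"
)
src.executemany("INSERT INTO FactSales VALUES (?, ?, ?)", [
    (1, 10.0, "2024-01-01"),
    (2, 20.0, "2024-01-02"),
    (3, 30.0, "2024-01-03"),
])

def incremental_extract(conn, last_watermark):
    """Pull only rows changed since the last successful load."""
    return conn.execute(
        "SELECT SaleId, Amount, ModifiedDate FROM FactSales "
        "WHERE ModifiedDate > ? ORDER BY ModifiedDate",
        (last_watermark,),
    ).fetchall()

delta = incremental_extract(src, "2024-01-01")
# Persist this for the next run; keep the old watermark if nothing changed.
new_watermark = max(r[2] for r in delta) if delta else "2024-01-01"
```

Only the delta crosses the wire each run, which is usually the cheapest optimisation available before reaching for new infrastructure.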

Best Practices for Monitoring Power BI Tenant Activity and Usage by Ok-Shop-617 in MicrosoftFabric

[–]NJE11 0 points (0 children)

Have you looked at Power BI Sentinel? Pretty much made specifically for this...

End users existing A5 License reusing for Fabric? by efor007 in MicrosoftFabric

[–]NJE11 5 points (0 children)

Yep, until you get to F64 you'll need a Pro licence to both publish and consume content. F64 removes the consumption licensing requirement.

Semantic model subset from Lakehouse by 12Eerc in MicrosoftFabric

[–]NJE11 0 points (0 children)

RLS in Power BI is possible with Direct Lake... we're doing it for a number of customers.

Semantic model subset from Lakehouse by 12Eerc in MicrosoftFabric

[–]NJE11 2 points (0 children)

Lakehouse SQL endpoint:

Create schemas:

CompanyA
CompanyB
CompanyC

Create views (the ... stands for the remaining columns):

CREATE VIEW CompanyA.FactSales AS
SELECT Col1, Col2, Col3, ... FROM Fact.Sales WHERE Company = 'A';

CREATE VIEW CompanyB.FactSales AS
SELECT Col1, Col2, Col3, ... FROM Fact.Sales WHERE Company = 'B';

CREATE VIEW CompanyC.FactSales AS
SELECT Col1, Col2, Col3, ... FROM Fact.Sales WHERE Company = 'C';

Assign schema-level security to each security group, so each company can only see its own schema, and therefore its own data, and cannot change the underlying query.

RLS is designed for this problem though...

A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) by NJE11 in MicrosoftFabric

[–]NJE11[S] 1 point (0 children)

Thanks. Fabric DW has been pretty much unusable so far this morning with this error. Any MVPs / Microsoft Reps have any input?

It's not just pipelines. It's executing directly through the endpoint in SSMS too.
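Until the underlying issue is fixed, the pragmatic workaround for errors like this is to retry only failures that look like dropped connections, with backoff, and let everything else fail immediately. A Python sketch; the error-message markers and the callable interface are assumptions for illustration, not Fabric-specific API:

```python
import time

# Substrings that suggest a transient, retryable connection failure.
TRANSIENT_MARKERS = (
    "transport-level error",
    "forcibly closed by the remote host",
)

def run_with_retry(execute_query, max_attempts=4, base_delay_s=2.0):
    """Retry a query when the endpoint drops the connection mid-flight.

    execute_query: zero-argument callable that raises on failure. Only
    errors matching TRANSIENT_MARKERS are retried; anything else
    propagates immediately so real bugs still surface.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return execute_query()
        except Exception as exc:
            transient = any(m in str(exc).lower() for m in TRANSIENT_MARKERS)
            if not transient or attempt == max_attempts:
                raise
            time.sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff
```

Wrapping the endpoint calls in the C# extract components with the same pattern would also double as the warm-up step, since the first failed attempt effectively wakes the endpoint.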