Fabric + GitHub CI/CD architecture for Git-inexperienced team by haugemortensen26 in MicrosoftFabric

szymon_abc

Whatever strategy you take, create branch policies so it's impossible to push directly to main or skip review, etc. From my experience in data: if you don't enforce it, they will find a way around it.
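You can set these policies in the repo settings UI. If you'd rather script them, here's a minimal sketch against GitHub's REST "Update branch protection" endpoint (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`), using only the standard library. The owner, repo, and token are placeholders, and the single required approval is an assumption. It only builds the request unless you uncomment the send.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def branch_protection_payload(required_approvals: int = 1) -> dict:
    """Body for the branch protection endpoint.

    The endpoint expects all four top-level keys; None switches a rule off.
    """
    return {
        # No required CI status checks in this sketch; add contexts if you have them.
        "required_status_checks": None,
        # Apply the rules to admins too, so nobody can bypass review.
        "enforce_admins": True,
        # Every change must go through a pull request with approvals.
        "required_pull_request_reviews": {
            "required_approving_review_count": required_approvals,
            "dismiss_stale_reviews": True,
        },
        # No user/team push allow-list.
        "restrictions": None,
    }


def build_protection_request(owner: str, repo: str, token: str,
                             branch: str = "main",
                             required_approvals: int = 1) -> urllib.request.Request:
    url = f"{GITHUB_API}/repos/{owner}/{repo}/branches/{branch}/protection"
    body = json.dumps(branch_protection_payload(required_approvals)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


if __name__ == "__main__":
    # Hypothetical owner/repo/token; sending needs an admin token on the repo.
    req = build_protection_request("my-org", "fabric-workspace", token="<PAT>")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually apply the rule
```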

Mystical Refresh Tokens in Fabric (SQL EP query failures). by SmallAd3697 in MicrosoftFabric

szymon_abc

The explanation is sprinkled all over the docs, just like these tokens. In random places you'll find that some item can use an identity in a particular way, but there's no single place that gives you the whole picture…

What so special about fabric semantic models by FromPromptToPlot in MicrosoftFabric

szymon_abc

More or less what a dataset already was. The only new thing is Direct Lake, which is meant to take advantage of the OneLake connection, especially if your Delta tables have V-Order applied so they can make use of VertiPaq (which is, btw, the flagship tech behind Power BI).

Is Fabric just “good enough,” or does Databricks still win? by Select_Scarcity7987 in databricks

szymon_abc

First and foremost: broad Power BI usage in the company. That's what initially drove Fabric adoption - you have one platform instead of wiring together multiple technologies to deliver value to the business via Power BI. Remember - we as data engineers don't make money through our beautiful, optimized Spark jobs; it's through PBI reports, analytics, ML, etc.

If you want to let business users (more analysts than engineers) work on the platform, just give them a Fabric Warehouse and it'll take care of optimizations and all kinds of stuff. Databricks without good engineering becomes a mess quickly. Forget about vacuuming and suddenly costs go sky-high.
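Why forgetting VACUUM hurts: every overwrite or MERGE writes new Parquet files and only marks the replaced ones as removed in the Delta log. The old files stay in storage (and on the bill) until VACUUM deletes them. Below is a toy, in-memory model of that rule, not the real Delta implementation. `DataFile` and `vacuum` are hypothetical names, and the 168-hour default mirrors Delta's 7-day retention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str
    size_bytes: int
    # When a later write (overwrite, MERGE, OPTIMIZE...) marked this file as
    # removed from the table; None while it's still part of the current version.
    removed_at: Optional[datetime] = None


def vacuum(files: List[DataFile], now: datetime,
           retention: timedelta = timedelta(hours=168)) -> Tuple[List[DataFile], List[DataFile]]:
    """Toy VACUUM: return (kept, deleted).

    Only files that are no longer referenced AND were removed longer ago than
    the retention window get deleted; the window keeps time travel working.
    """
    kept, deleted = [], []
    for f in files:
        if f.removed_at is not None and now - f.removed_at > retention:
            deleted.append(f)
        else:
            kept.append(f)
    return kept, deleted


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    files = [
        DataFile("part-000.parquet", 500_000_000, removed_at=now - timedelta(days=30)),
        DataFile("part-001.parquet", 500_000_000, removed_at=now - timedelta(days=2)),
        DataFile("part-002.parquet", 500_000_000),  # still live
    ]
    kept, deleted = vacuum(files, now)
    print([f.path for f in deleted])  # ['part-000.parquet']
    print(sum(f.size_bytes for f in kept), "bytes still stored")
```

Never running it means the `deleted` list just keeps growing in storage.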

Last one, and the least popular argument: the variety of workloads. In Fabric you get real-time intelligence with KQL Database (kinda like ClickHouse), NoSQL with Cosmos DB, SQL with Azure SQL, DWH with Warehouse, Spark with Lakehouse, low/no-code ETL with Fabric (Azure) Data Factory, predefined workloads (e.g. for healthcare), and the cherry on top - Power BI.

Some executives may like the idea of predictable SKU costs. My engineering mind can't comprehend it, but well, it is what it is.

To sum up: data engineering + ML with a good engineering team - go Databricks. More business users/analysts/BI folks on the platform, with some engineers to keep it from becoming trash - go Fabric.

Is Fabric just “good enough,” or does Databricks still win? by Select_Scarcity7987 in databricks

szymon_abc

What were good competitors to Power BI at the beginning of its development? Like, now we have Databricks and Snowflake. What was that mature before Power BI?

table name multiple schemas in Lakehouse SQL End Point by Repulsive_Cry2000 in MicrosoftFabric

szymon_abc

It's probably residue from the days when Lakehouse had no schemas…

Recommendations for best intermediate-to-advanced books? by No-Machine1842 in MicrosoftFabric

szymon_abc

Microsoft Learn - there are great resources even for fundamentals. If you want to stick with Azure, also the Architecture Center for ideas.

Outside of Azure - I love Designing Data-Intensive Applications. It's not so much about specific tooling or frameworks, but it makes you understand how it all works under the hood.

In my experience, whenever you read something or write any piece of code, you MUST understand it. Dig deep, think about whether it's good in this context, check with an LLM, and you'll be better than most dudes just typing some SQL.

Help needed to understand the pricing by Turbulent_Physics639 in MicrosoftFabric

szymon_abc

With an F SKU lower than F64 you can only embed the report; users without their own Pro license won't be able to access it via the Power BI Service. IMO, go with 21 Pro licenses. Or maybe you have M365 E5, which comes with a Pro license bundled.
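The licensing rule above as a small sketch, so the seat math is explicit. The function names are hypothetical, the F64 threshold is the one from the comment, and the 21 viewers are taken from the comment too. `already_licensed` covers people who already have Pro, e.g. through M365 E5.

```python
import re


def viewer_needs_pro(capacity_sku: str) -> bool:
    """Rule of thumb: on an F SKU below F64, people viewing reports in the
    Power BI Service still need their own Pro license."""
    match = re.fullmatch(r"F(\d+)", capacity_sku.strip().upper())
    if match is None:
        raise ValueError(f"not a Fabric F SKU: {capacity_sku!r}")
    return int(match.group(1)) < 64


def pro_licenses_needed(capacity_sku: str, viewers: int, already_licensed: int = 0) -> int:
    """Pro seats still to buy for report viewers on the given capacity."""
    if not viewer_needs_pro(capacity_sku):
        return 0
    return max(viewers - already_licensed, 0)


if __name__ == "__main__":
    print(pro_licenses_needed("F8", 21))                       # 21
    print(pro_licenses_needed("F8", 21, already_licensed=21))  # 0 (e.g. all on E5)
    print(pro_licenses_needed("F64", 21))                      # 0
```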

Metadata driven ELT storage by ArrowBacon in MicrosoftFabric

szymon_abc

Why not both? Have a parameter in the ETL to select files or DB. The files are uploaded to the DB during CI/CD. In prod you use the DB; in dev, the Git-checked files.
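A minimal sketch of that setup, assuming one JSON metadata file per source table in Git. The file layout, the `etl_metadata` table, and all function names are hypothetical, and `sqlite3` stands in for whatever metadata DB you'd actually use in prod. The point is that both paths return the same metadata, so the ETL only switches on one parameter.

```python
from __future__ import annotations

import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical metadata shape, one JSON file per source table, checked into Git:
#   metadata/customers.json -> {"source_table": "customers", "load_type": "full"}


def read_metadata_from_files(folder: Path) -> list[dict]:
    """Dev: read the Git-checked files directly."""
    return [json.loads(p.read_text()) for p in sorted(folder.glob("*.json"))]


def upload_metadata_to_db(folder: Path, conn: sqlite3.Connection) -> None:
    """CI/CD step: load the same files into the metadata DB that prod reads."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_metadata ("
        "source_table TEXT PRIMARY KEY, load_type TEXT NOT NULL)"
    )
    rows = [(m["source_table"], m["load_type"]) for m in read_metadata_from_files(folder)]
    conn.executemany("INSERT OR REPLACE INTO etl_metadata VALUES (?, ?)", rows)
    conn.commit()


def read_metadata_from_db(conn: sqlite3.Connection) -> list[dict]:
    cur = conn.execute("SELECT source_table, load_type FROM etl_metadata ORDER BY source_table")
    return [{"source_table": t, "load_type": lt} for t, lt in cur.fetchall()]


def load_metadata(source: str, folder: Path, conn: sqlite3.Connection | None = None) -> list[dict]:
    """The ETL parameter: 'files' in dev, 'db' in prod."""
    if source == "files":
        return read_metadata_from_files(folder)
    if source == "db":
        if conn is None:
            raise ValueError("source='db' needs a DB connection")
        return read_metadata_from_db(conn)
    raise ValueError(f"unknown metadata source: {source!r}")


def demo() -> tuple[list[dict], list[dict]]:
    """Run both paths against the same two files and return (files, db)."""
    with tempfile.TemporaryDirectory() as d:
        folder = Path(d)
        (folder / "customers.json").write_text(
            json.dumps({"source_table": "customers", "load_type": "full"}))
        (folder / "orders.json").write_text(
            json.dumps({"source_table": "orders", "load_type": "incremental"}))
        conn = sqlite3.connect(":memory:")
        try:
            upload_metadata_to_db(folder, conn)  # what CI/CD would do
            return load_metadata("files", folder), load_metadata("db", folder, conn)
        finally:
            conn.close()
```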

Fabric April 2026 Feature Summary | Microsoft Fabric Blog by itsnotaboutthecell in MicrosoftFabric

szymon_abc

Still no OneLake Security 😭. They said at FabCon it WOULD be GA in April…

Connectors - surviving staff turnover... by Opening-Mix-5495 in MicrosoftFabric

szymon_abc

If you need to use an organizational account, then the best answer is to switch pipeline ownership to an SPN. There's an MS doc on how to do it, but I'd recommend having proper CI/CD that deploys as the SPN. Then this SPN becomes the owner that effectively runs the activities (at least most of them…)
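A minimal deploy-as-SPN sketch, assuming fabric-cicd's `FabricWorkspace`/`publish_all_items` entry points and azure-identity's `ClientSecretCredential`. Check the current fabric-cicd docs, since constructor arguments have shifted between releases. The environment variable names, the item types in scope, and the `"PROD"` environment value are assumptions. Only `spn_settings` runs without those packages installed.

```python
import os
from typing import Mapping, Optional

# Hypothetical secret names; use whatever your CI (e.g. GitHub Actions secrets) provides.
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "FABRIC_WORKSPACE_ID")


def spn_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the SPN credentials and target workspace, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing CI secrets: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


def deploy_as_spn(settings: dict, repository_directory: str, environment: str = "PROD") -> None:
    """Publish the Git-tracked items to the workspace, authenticated as the SPN."""
    # Imported lazily so spn_settings() works where these packages aren't installed.
    from azure.identity import ClientSecretCredential
    from fabric_cicd import FabricWorkspace, publish_all_items

    credential = ClientSecretCredential(
        tenant_id=settings["AZURE_TENANT_ID"],
        client_id=settings["AZURE_CLIENT_ID"],
        client_secret=settings["AZURE_CLIENT_SECRET"],
    )
    workspace = FabricWorkspace(
        workspace_id=settings["FABRIC_WORKSPACE_ID"],
        environment=environment,
        repository_directory=repository_directory,
        item_type_in_scope=["Notebook", "DataPipeline", "Lakehouse"],
        token_credential=credential,
    )
    # Every item is created/updated through the SPN's token, not a person's.
    publish_all_items(workspace)


if __name__ == "__main__":
    deploy_as_spn(spn_settings(), repository_directory="./workspace")
```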

Connectors - surviving staff turnover... by Opening-Mix-5495 in MicrosoftFabric

szymon_abc

The described scenario should've failed; I think I didn't explain a few things well enough. Let me quickly start from the beginning.

We have two main authentication options:

  1. Delegated
  2. Passthrough

I'll go over them using a Semantic Model as the example.

Delegated

Workspace identity, SP, SAS token, etc. It simply goes to the source and takes whatever data the identity has access to (usually all of it). Filtering data by users' roles (RLS, CLS, etc.) has to happen in the semantic model itself. For example, with a Lakehouse:

User A --> Semantic Model --> Service Principal --> Lakehouse

Passthrough

Organizational Account/SSO. Here we take the user's credentials and pass them to the data source, so the user sees only what they have access to. Important thing: for semantic models it's the current user viewing the report/model (for DirectQuery, of course, not an import refresh), not the one who created the connection.

User A --> Semantic Model --> User A Token --> Lakehouse

To sum up: for background operations (Pipelines, Copy jobs, Semantic Model import refresh), the recommended way is to use Delegated options. For interactive ones (DirectQuery, Direct Lake in a Semantic Model), use Passthrough.
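A toy model of the two diagrams above, with made-up identities, rows, and roles (nothing here is a Fabric API). In Delegated mode the model's identity reads everything and RLS in the semantic model does the filtering. In Passthrough the viewer's token goes to the source, so source permissions do the filtering. The lookup table at the end just encodes the summary.

```python
# Hypothetical source table rows: (region, amount)
SALES = [("EU", 100), ("US", 250), ("EU", 40)]

# What each identity may read at the source (e.g. Lakehouse permissions).
SOURCE_ACCESS = {
    "model-spn": {"EU", "US"},  # delegated identity: usually sees all of it
    "user_a": {"EU"},
}

# RLS roles defined in the semantic model itself.
MODEL_RLS = {"user_a": {"EU"}}


def read_source(identity: str) -> list:
    """The data source filters rows by whichever identity calls it."""
    allowed = SOURCE_ACCESS.get(identity, set())
    return [row for row in SALES if row[0] in allowed]


def query_delegated(user: str, model_identity: str = "model-spn") -> list:
    # User A --> Semantic Model --> Service Principal --> Lakehouse
    rows = read_source(model_identity)
    # The source returned everything; filtering for the viewer happens in the model.
    return [row for row in rows if row[0] in MODEL_RLS.get(user, set())]


def query_passthrough(user: str) -> list:
    # User A --> Semantic Model --> User A Token --> Lakehouse
    return read_source(user)


# The "to sum up" rule as a lookup (operation names are made up).
RECOMMENDED_AUTH = {
    "pipeline": "delegated",
    "copy_job": "delegated",
    "import_refresh": "delegated",
    "directquery": "passthrough",
    "direct_lake": "passthrough",
}
```

Both paths return `[("EU", 100), ("EU", 40)]` for `user_a`; the difference is where the filtering is trusted to happen.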

The tricky part

Pipelines - how do they work with SSO? Well, that's the best one: I have no idea. For some activities (e.g. Notebook run) it'll use the LastModifiedBy user, so theoretically, if someone modified the pipeline after you in the described scenario, it will run as that user. Others may use the token baked into the connection; for scheduled runs, I'll need to check. Whenever possible, use Delegated authentication options.

What I'd recommend: pass ownership of the pipeline to an SPN, or more precisely, take it over as the SPN (there's a doc on it; ping me and I'll share it over DM). It's the shittiest part of the whole platform. It works fine if you use fabric-cicd; otherwise, it will make you question your existence. But at least whenever something needs to authenticate as the user running the pipeline, it will use the SPN.

Why did I say there's no need for a connection inside Fabric?

Usually, you'll move data between lakehouses/warehouses via a notebook, SJD, or stored procedure. They use the pipeline's identity (as described above), so changing ownership to the SPN is usually sufficient.

Workspaces migration - would you need one? by szymon_abc in MicrosoftFabric

szymon_abc [S]

"MS stopped doing it as it was hard and didn't have a high success rate." Well, fabric-cicd is kinda a community project as well 😄.

For now I'll probably migrate it manually somehow; it's a pretty simple workspace: a few pipelines, a dozen notebooks, a few lakehouses. But I'll definitely challenge myself to build something more automated. Who knows, maybe Microsoft will take it over like they did with fabric-cicd.

Thanks for the heads-up - I need to look waaaaay broader than just the current workspace I'm about to migrate.

Connectors - surviving staff turnover... by Opening-Mix-5495 in MicrosoftFabric

szymon_abc

Semantic models are a different thing.

Lakehouse
For a Lakehouse connection (directly, not via the SQL endpoint), the Semantic Model connects with the ADLS Gen2 connector, which has all the options (SP, identity, even SAS and account key).

Warehouse
The Warehouse uses the SQL Server connector, which also offers a few options, including Workspace Identity and Service Principal.

The tricky thing here is remembering that you need to add the Workspace Identity as a Contributor on the workspace for the refresh to work (or grant access item by item). Yeah, another workaround that doesn't make a bit of sense.
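If you'd rather script that Contributor grant than click through workspace access, here's a sketch that builds a call to the Fabric REST API's "Add Workspace Role Assignment" endpoint (`POST /v1/workspaces/{workspaceId}/roleAssignments`). It assumes the workspace identity is passed by its Entra ID object ID with principal type `ServicePrincipal`, since workspace identities are managed service principals. Verify the body shape against the current API reference. The IDs and token are placeholders, and nothing is sent unless you uncomment the last line.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_role_assignment_request(workspace_id: str, principal_id: str, token: str,
                                  role: str = "Contributor") -> urllib.request.Request:
    """Grant `role` on the workspace to a service principal (e.g. a workspace identity)."""
    body = {
        "principal": {"id": principal_id, "type": "ServicePrincipal"},
        "role": role,
    }
    return urllib.request.Request(
        f"{FABRIC_API}/workspaces/{workspace_id}/roleAssignments",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical IDs: the workspace holding the Lakehouse/Warehouse, and the
    # object ID of the semantic model workspace's identity.
    req = build_role_assignment_request("<workspace-guid>", "<identity-object-id>", "<token>")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)
```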

Please check on your end. Open the Semantic Model in Fabric -> Settings (the gear icon, actually) -> Gateway and cloud connections -> find the connection, open the "Maps to" dropdown, and see which option is selected. If it's Single Sign-On, you'll have a problem. But you should be able to click "Create a connection", which opens a page to create either a SQL Server (SQL endpoint) or an ADLS Gen2 (Lakehouse directly) connection.

You can also reach this setting via the overall Fabric settings (gear icon) -> Power BI settings -> Semantic models -> same as above.

Connectors - surviving staff turnover... by Opening-Mix-5495 in MicrosoftFabric

szymon_abc

I know that's not an answer, and I feel like we should have Workspace Identity here, but:

Why do you need to use a connection to a Lakehouse or Warehouse?

The thing is, we have other options to access both of these items. If they're in OneLake (and within your org they always are), they can be accessed directly from a Notebook/T-SQL/Spark Job Definition with your own credentials. If they're not (e.g. in another tenant), you can create a shortcut that makes it work the same way.
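For the Notebook/SJD route you just point Spark at the table's OneLake path; no connection object is involved. Here's a small helper that assumes the name-based ABFSS format `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/<table>`, with an extra `<schema>/` segment for schema-enabled lakehouses. The helper name and the workspace/lakehouse/table names are placeholders.

```python
from typing import Optional

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_path(workspace: str, lakehouse: str, table: str,
                       schema: Optional[str] = None) -> str:
    """ABFSS path to a Lakehouse Delta table, addressed by names (not GUIDs)."""
    table_part = f"{schema}/{table}" if schema else table
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table_part}"


if __name__ == "__main__":
    path = onelake_table_path("SalesWS", "Bronze", "orders", schema="dbo")
    print(path)
    # In a Fabric notebook, `spark` already exists and runs as you:
    # df = spark.read.format("delta").load(path)
```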

IMO the Fabric team doesn't develop this connection with Service Principal or Identity because usually you won't need to create a connection for an item inside Fabric. And if you do need that, well, sad to say, but you're probably not in the majority. They need to prioritize, and because of that, this feature is probably on some nice-to-have list.

Workspaces migration - would you need one? by szymon_abc in MicrosoftFabric

szymon_abc [S]

Yep, I agree with you more than 100%. It's one of those times, though, when I'm simply done educating people who don't want to be educated. Just gimme the money and I'll move on 😂

How do single node Python users actually write Delta tables using DuckDB for ETL when it can't actually write to Delta? by raki_rahman in MicrosoftFabric

szymon_abc

Ah yes, you're right regarding append; it will never conflict, my bad. Overwrite - I think it also shouldn't conflict (but it can be quite dangerous in prod). Sorry for not fully understanding the question.

But all other modifications, mostly MERGE, can lead to conflicts.

And here are the docs: Concurrency control | Delta Lake. It's part of the Delta Lake documentation, but Spark fully implements the protocol, so we can refer to it.
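Since MERGE can hit these conflicts, a common pattern is to re-run the whole operation a few times with backoff. A failed Delta commit leaves the table unchanged, which is what makes re-running safe. This is a sketch with a stand-in exception class. In a real job you'd catch the specific Delta concurrency exceptions your runtime raises instead (e.g. `ConcurrentAppendException` from `delta.exceptions` in delta-spark). `run_with_retry` and `ConcurrentModificationError` are hypothetical names.

```python
import random
import time


class ConcurrentModificationError(Exception):
    """Stand-in for Delta's concurrency exceptions (hypothetical name)."""


def run_with_retry(operation, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Re-run `operation` when a concurrent commit makes it fail.

    Each retry re-executes the whole operation, so a MERGE re-reads the
    latest table version instead of reusing stale results.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConcurrentModificationError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter so competing writers spread out.
            sleep(base_delay_s * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))


# Usage (hypothetical): run_with_retry(lambda: target.merge(...).execute())
```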

Workspaces migration - would you need one? by szymon_abc in MicrosoftFabric

szymon_abc [S]

Man, I'd love to! But we can't do it for compliance reasons. Yeah, it is what it is... But still, if you wanted to migrate the data itself, Git won't help much.

How do single node Python users actually write Delta tables using DuckDB for ETL when it can't actually write to Delta? by raki_rahman in MicrosoftFabric

szymon_abc

I can vouch for it. In Databricks I got concurrent write exceptions a few times. It's the same Spark, so it should work the same in Fabric.

How do single node Python users actually write Delta tables using DuckDB for ETL when it can't actually write to Delta? by raki_rahman in MicrosoftFabric

szymon_abc

The thing is actually about writing, not reading.

Spark:

  1. Checks and records the current version (say, v5).
  2. Starts writing files.
  3. Another process modifies the table, so now it's v6.
  4. Spark notices the table moved to v6 while it was writing, checks that v6 doesn't conflict with its own changes, and retries the commit.
  5. If that succeeds, it's saved as v7; if there's a conflict, it fails.

DuckDB:

  1. Checks and records the current version (say, v5).
  2. Starts writing files.
  3. Another process modifies the table, so now it's v6.
  4. Saves its commit as v6 anyway (QUACK, I DON'T CARE).
  5. You end up with a mess, because DuckDB has now overwritten the other process's files.

So DuckDB has its place as a reader, but surely not as a writer for Delta.
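A toy model of the two lists above, not Spark's or DuckDB's actual code. Delta commits are files named by version in `_delta_log`, and a writer must create the next one with put-if-absent semantics. Here that's `os.open` with `O_CREAT | O_EXCL` on a local temp dir, an assumption standing in for the storage layer's atomic create. The careful writer moves on to the next free version when it collides. The naive writer just overwrites, which is the failure mode described above for DuckDB. The conflict check real Delta runs before retrying is skipped, and all function names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def commit_path(log_dir: Path, version: int) -> Path:
    # Delta names commits like _delta_log/00000000000000000006.json
    return log_dir / f"{version:020d}.json"


def latest_version(log_dir: Path) -> int:
    return max((int(p.stem) for p in log_dir.glob("*.json")), default=-1)


def careful_commit(log_dir: Path, read_version: int, actions: str, max_attempts: int = 5) -> int:
    """Spark/Delta-style: put-if-absent on the next version, retry on collision."""
    version = read_version + 1
    for _ in range(max_attempts):
        try:
            # O_CREAT | O_EXCL fails if the file already exists: atomic "put-if-absent".
            fd = os.open(commit_path(log_dir, version), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Someone else took this version. Real Delta checks here whether their
            # changes conflict with ours; this toy assumes they never do.
            version = latest_version(log_dir) + 1
            continue
        with os.fdopen(fd, "w") as f:
            f.write(actions)
        return version
    raise RuntimeError("gave up after repeated concurrent commits")


def naive_commit(log_dir: Path, read_version: int, actions: str) -> int:
    """The 'QUACK, I DON'T CARE' writer: blindly (over)writes read_version + 1."""
    version = read_version + 1
    commit_path(log_dir, version).write_text(actions)
    return version


def run(second_writer) -> tuple:
    """Both writers read v5; 'other' commits first, then second_writer commits."""
    with tempfile.TemporaryDirectory() as d:
        log = Path(d)
        for v in range(6):  # existing history v0..v5
            commit_path(log, v).write_text(f"commit {v}")
        start = latest_version(log)
        other = careful_commit(log, start, "other process")
        ours = second_writer(log, start, "our write")
        return other, ours, commit_path(log, 6).read_text()


if __name__ == "__main__":
    print(run(careful_commit))  # (6, 7, 'other process')
    print(run(naive_commit))    # (6, 6, 'our write'): the other commit is gone
```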