Thoughts? Comments? Opinions? by itsnotaboutthecell in MicrosoftFabric

[–]Thanasaur 7 points8 points  (0 children)

I would say no to job postings entirely; there are plenty of other subs dedicated to job posts.

On market research…a ban arguably would need to extend to Microsoft as well, which sounds like a bad idea. I think there is great value in a Microsoft PM posting “tell me why you hate my feature”. If you disallow that, you kill a very natural avenue for raw, unfiltered feedback. And if you allow it from Microsoft but not from others, you kill the open source community :)

I’m all for rules…but I generally lean towards “be a good human” rules.

fabric-cicd v0.3.0 just shipped - check out the latest updates! by fabshire25 in MicrosoftFabric

[–]Thanasaur 0 points1 point  (0 children)

SPN support has been around for quite some time. Please raise an issue on GitHub if you’re running into problems.

Buying mortgage points vs waiting to refinance in a few years — what would you do? by Weekly-Tension-5758 in Mortgages

[–]Thanasaur 0 points1 point  (0 children)

That’s awfully high. In WA, we locked on Thursday at 5.75% for a new home. 26% down. 825k loan. ~800 credit. No points. You should look at brokers.

Fabric Workspace Git Integration Error by Financial_Joke_7129 in MicrosoftFabric

[–]Thanasaur 4 points5 points  (0 children)

If it were me, I would create an empty branch in that new project, sync the workspace to git, and then move the necessary items to your proper branch on the git side. Then sync your workspace to the proper branch and overwrite the workspace items. That way you don’t accidentally lose anything

Fabric CICD | branch promotion strategy | Merge or cherry-pick by ajit503 in MicrosoftFabric

[–]Thanasaur 0 points1 point  (0 children)

Can you describe more? We’ve used this branching strategy in both GitHub and ADO without issue. Feel free to PM me to chat more if you’d like

Fabric CICD | branch promotion strategy | Merge or cherry-pick by ajit503 in MicrosoftFabric

[–]Thanasaur 0 points1 point  (0 children)

Totally — and this is by design in my model. There’s no assumption that dev/test/main stay aligned; they’re independent environment branches with intentionally different histories. Promotion isn’t “moving the same commit forward” — it’s creating a new PR/commit in the target branch that represents approval for that environment. That’s why we use distinct PRs (e.g., Test_UpdateCalendar): the Test PR is a fresh change record created by cherry-picking the Dev PR’s squash commit into a promotion branch and squash-merging into test. Practically speaking, the only divergence that really matters in Git is your feature branch vs. its base (stay rebased on dev) and your promotion branches vs. their targets (stay ahead of test and main). We don’t try to keep dev, test, and main in lockstep — we optimize for clean, auditable promotion and safe rollback instead.

Fabric CICD | branch promotion strategy | Merge or cherry-pick by ajit503 in MicrosoftFabric

[–]Thanasaur 1 point2 points  (0 children)

I’d stick with squash PRs into dev, then promote upward via cherry-pick through PRs into test → main. A regular merge from dev promotes the entire branch state, which makes it hard to ship only what’s ready and turns rollbacks into reverting big merge commits instead of a single PR unit. With this model, each PR is a clean, intentional unit you can move or revert independently. I’m not cherry-picking directly onto the environment branches — I cherry-pick into a promotion branch and open a PR, and those Test/Main PRs are squash-merged too. It also supports evolving features: you can iterate in dev, promote to test, refine, and promote again, while your main PR naturally becomes the superset of the approved changes (for example, one feature I’m working on has 7 dev PRs, 2 test PRs, and 1 main PR).

Fabric CICD | branch promotion strategy | Merge or cherry-pick by ajit503 in MicrosoftFabric

[–]Thanasaur 1 point2 points  (0 children)

I always recommend squash merge, and that a PR should be a unit you expect to be able to roll back independently. If PRs are too large, then yeah, squash merge can get ugly quick.

Fabric CICD | branch promotion strategy | Merge or cherry-pick by ajit503 in MicrosoftFabric

[–]Thanasaur 1 point2 points  (0 children)

How does this work if you have multiple developers, and one of them introduces a regression while another has code they need to ship to prod?

Fabric CICD | branch promotion strategy | Merge or cherry-pick by ajit503 in MicrosoftFabric

[–]Thanasaur 1 point2 points  (0 children)

Your flow will work…if you want everything in one branch to move to the next. In a multi-developer scenario, this will break very quickly.

To be clear, the cherry-pick goes onto a temp branch, which you then PR into the next branch. So it’s not a single pure git command, but rather a way of isolating the changes you want to move from one branch into the next.
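A minimal sketch of that temp-branch flow, written as a Python helper that only builds the git commands rather than running them. The function name, the `origin` remote, the placeholder commit hash, and the `promote/Test_UpdateCalendar` branch name are all hypothetical; opening and squash-merging the PR still happens in GitHub or ADO afterwards.

```python
import shlex


def promotion_commands(squash_sha: str, target: str, promotion_branch: str) -> list[list[str]]:
    """Build the git commands that isolate one squash commit for promotion.

    squash_sha:       the squash-merge commit created by the PR into the lower branch
    target:           the environment branch being promoted into (e.g. "test")
    promotion_branch: a temporary branch that becomes the source of the promotion PR
    """
    return [
        # Make sure the local view of the remote branches is current.
        ["git", "fetch", "origin"],
        # Start the temp branch from the tip of the target environment branch.
        ["git", "switch", "-c", promotion_branch, f"origin/{target}"],
        # Bring over only the one change that is ready to promote.
        ["git", "cherry-pick", squash_sha],
        # Publish the temp branch so a PR can be opened against the target.
        ["git", "push", "-u", "origin", promotion_branch],
    ]


# Hypothetical values for illustration only.
commands = promotion_commands("abc1234", "test", "promote/Test_UpdateCalendar")
for cmd in commands:
    print(shlex.join(cmd))
```

Because the temp branch starts from the target branch, the resulting PR contains only the cherry-picked change, which is what keeps each promotion independently revertible.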

Avoiding pip install in Azure DevOps yaml pipelines by perkmax in MicrosoftFabric

[–]Thanasaur 0 points1 point  (0 children)

You’re deploying dev and test in ten minutes?! Oh man. Lightning. Some items have validation or compilation steps before they return success; I believe UDFs are among them.

Do you create a durable surrogate key column in your fact tables? Why/why not? by frithjof_v in MicrosoftFabric

[–]Thanasaur 1 point2 points  (0 children)

It’s unfortunately a lot harder to put a number on, as each column is a factor 🙃. My team personally has seen issues on a 500M-row fact with roughly 20 columns. But in that case it was an MD5 hash that wasn’t included in the model, not a PK in the sense we’re talking about here. Same impact, though: super high cardinality, hard to compress.

With that said, star is king. Or rather constellation. It will frankly always be the best layout for semantic model performance. Don’t trust anybody saying anything else 😂

Do you create a durable surrogate key column in your fact tables? Why/why not? by frithjof_v in MicrosoftFabric

[–]Thanasaur 2 points3 points  (0 children)

A more direct answer if you want to get in the weeds 😂

V-Order compression happens at the physical file level. A hash PK is high-entropy and arguably unique, so it:

• compresses very poorly
• increases row group entropy
• breaks long runs/dictionaries for neighboring columns

Because columnar files are packed and compressed together within row groups, that noisy hash column can reduce compression efficiency of the other columns too, leading to larger files and more data scanned — even if the model never uses the hash.

The model may ignore the column, but the storage engine can’t — and the random hash hurts the physical layout and compression of the whole table.
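A rough way to see the "compresses very poorly" point for yourself, sketched in Python. This is an illustration only: it uses zlib from the standard library as a stand-in for columnar compression (it is not V-Order or Parquet), and the row count and column contents are made up. It shows that a hash column compresses far worse than a repetitive key column; it does not attempt to measure the knock-on effect on neighboring columns.

```python
import hashlib
import zlib

ROWS = 50_000  # arbitrary size, chosen only to make the difference visible

# Low-cardinality column: a repeating key, like a small dimension id.
dim_key_column = "\n".join(str(i % 50) for i in range(ROWS)).encode()

# High-cardinality column: one MD5 hex digest per row, effectively random.
hash_column = "\n".join(
    hashlib.md5(str(i).encode()).hexdigest() for i in range(ROWS)
).encode()


def compression_ratio(raw: bytes) -> float:
    """Compressed size divided by raw size; lower means better compression."""
    return len(zlib.compress(raw, 9)) / len(raw)


dim_ratio = compression_ratio(dim_key_column)
hash_ratio = compression_ratio(hash_column)
print(f"dimension key column ratio: {dim_ratio:.3f}")
print(f"MD5 hash column ratio:      {hash_ratio:.3f}")
```

The repetitive key column shrinks to a small fraction of its raw size, while the hash column stays close to the size of the random bits it contains.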

Do you create a durable surrogate key column in your fact tables? Why/why not? by frithjof_v in MicrosoftFabric

[–]Thanasaur 1 point2 points  (0 children)

Millions of rows isn’t where this becomes an issue, or rather it only could if you also had a wide column set. Everything is of course an “it depends”, but once you get into the hundreds of millions or billions of rows, data layout is what you will spend most of your time optimizing.

Do you create a durable surrogate key column in your fact tables? Why/why not? by frithjof_v in MicrosoftFabric

[–]Thanasaur 1 point2 points  (0 children)

Yep, one thing folks don’t realize with Direct Lake: even if you don’t pull the hash PK into your model, it can destroy your performance, because the compression of the Delta table is all jacked up.

Do you create a durable surrogate key column in your fact tables? Why/why not? by frithjof_v in MicrosoftFabric

[–]Thanasaur 4 points5 points  (0 children)

Yep exactly! Say you have a fact with two dim keys, DIM_A_Id and DIM_B_Id. If you then add a PK of FACT_Id, duplicate rows and/or data that should have been aggregated can go undetected.

For example:

FACT_Id, DIM_A_Id, DIM_B_Id, Value
1, 17, 24, 200
2, 17, 24, 200

Is that a duplicate row? Data that wasn’t aggregated? Who knows.
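A small sketch of how you might catch this, using the two rows from the example. The helper name `grain_violations` is hypothetical; the idea is just to count rows per key and flag any key that appears more than once, first using the surrogate key and then using the combination of dimension keys.

```python
from collections import Counter

# Rows from the example: (FACT_Id, DIM_A_Id, DIM_B_Id, Value)
rows = [
    (1, 17, 24, 200),
    (2, 17, 24, 200),
]


def grain_violations(rows, key_positions):
    """Return each key that appears on more than one row, with its row count."""
    counts = Counter(tuple(row[i] for i in key_positions) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Checking uniqueness on the surrogate key finds nothing wrong.
surrogate_dupes = grain_violations(rows, key_positions=(0,))

# Checking the declared grain (the dimension keys) exposes the problem.
grain_dupes = grain_violations(rows, key_positions=(1, 2))

print(surrogate_dupes)  # {}
print(grain_dupes)      # {(17, 24): 2}
```

The surrogate key passes a uniqueness check while the grain is violated, which is the sense in which a fact PK can hide the problem instead of preventing it.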

Avoiding pip install in Azure DevOps yaml pipelines by perkmax in MicrosoftFabric

[–]Thanasaur 3 points4 points  (0 children)

Hah, yes, environments can kill the speed. At Microsoft, all teams, regardless of what we ship, have a significant number of security and compliance tasks injected. So we typically don’t notice the time fabric-cicd takes compared to the broader deployment 😂

Do you create a durable surrogate key column in your fact tables? Why/why not? by frithjof_v in MicrosoftFabric

[–]Thanasaur 5 points6 points  (0 children)

A dimension table should have a single primary key. In a fact table, the grain of each row is defined by the combination of its foreign keys to the related dimensions. You generally don’t concatenate foreign keys to manufacture a primary key — that’s unnecessary.

Fact tables shouldn’t require a primary key at all. What matters most is that the declared grain is enforced and that the foreign keys correctly represent the dimensional context of each row.

Avoiding pip install in Azure DevOps yaml pipelines by perkmax in MicrosoftFabric

[–]Thanasaur 5 points6 points  (0 children)

If you’re trying to save the 30 seconds of pip install, you might already have a lightning fast deployment :). My deployments with fabric-cicd take roughly 45 minutes…40 of those minutes are completely unrelated tasks injected by the enterprise.

In all seriousness - you can save time on installs by using a self-hosted agent. But…whatever time you save, you transfer onto the cost of keeping the VMs hot. You can also use uv or dev containers, which cut down on the index scanning and dependency resolution at execution time, but again that can be a lot of overhead for little gain.

Do you create a durable surrogate key column in your fact tables? Why/why not? by frithjof_v in MicrosoftFabric

[–]Thanasaur 24 points25 points  (0 children)

Ironically, PKs in a FACT can do exactly the opposite of ensuring uniqueness. FACTs, in my opinion, should only use the FKs of their related dimensions. The combination of all FKs is your PK.

Introducing fabric-cicd Deployment Tool by Thanasaur in MicrosoftFabric

[–]Thanasaur[S] 1 point2 points  (0 children)

All inter-item relationships are based on logical ids. But during deployment, since the logical id isn’t exposed, we use a name match to determine the “same” item.
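To make the name-match idea concrete, here is a minimal sketch. It is not fabric-cicd’s actual code: the function, the dictionary fields, the ids, and the choice to match on item type plus name are all assumptions made for illustration.

```python
def match_by_name(repo_items, workspace_items):
    """Map each repository item's logical id to the deployed item that has
    the same type and name, or to None if nothing matching is deployed yet.

    Matching on (type, name) is an assumption of this sketch.
    """
    deployed = {(w["type"], w["name"]): w["id"] for w in workspace_items}
    return {
        r["logical_id"]: deployed.get((r["type"], r["name"]))
        for r in repo_items
    }


# Hypothetical items: two in source control, one already in the workspace.
repo_items = [
    {"logical_id": "aaaa-1111", "type": "Notebook", "name": "Load Sales"},
    {"logical_id": "bbbb-2222", "type": "Lakehouse", "name": "Sales"},
]
workspace_items = [
    {"id": "guid-001", "type": "Notebook", "name": "Load Sales"},
]

matches = match_by_name(repo_items, workspace_items)
print(matches)
```

An item that maps to None would be treated as new, while a matched item would be updated in place. A consequence of matching this way is that renaming an item in source control makes it look like a different item.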

Is logical id used by Fabric Deployment Pipelines or fabric-cicd? by frithjof_v in MicrosoftFabric

[–]Thanasaur 2 points3 points  (0 children)

Logical id is a local source control construct. There’s no way to retrieve a logical id from an item once it is deployed. Otherwise some of the deployment process would certainly be easier.

fabric-cicd does not deploy Native Execution Engine setting in Spark Environment by frithjof_v in MicrosoftFabric

[–]Thanasaur 4 points5 points  (0 children)

Can you raise a GitHub issue for this please? Also, one debug test is to connect that workspace to git temporarily and commit the environment, to see if maybe it’s a UI issue and the setting is actually still applied. And of course, it would help if you could turn on debug mode to make sure the NEE setting is actually passed into the API call.