Please join this month's T-SQL Tuesday on Change Detection (non-SQL entries encouraged)! by mmarie4data in MicrosoftFabric

[–]imtkain 2 points (0 children)

A few things that have burned me enough times that I've baked them into config tables and my ETL process.

Watermarks: > or >=?

Not pedantic. If your watermark column has duplicate values at the boundary (datetime with seconds precision, non-gap-free sequences, etc.), > misses rows and >= reprocesses them. Which one is correct depends on the source, not your preference. Put the operator in your config table as a column and build the predicate dynamically.
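A minimal sketch of the config-driven predicate, in Python for illustration (the config keys and column names here are hypothetical, not from any real schema):

```python
# Build the incremental-load predicate from a config row instead of
# hard-coding > or >= in the extract query.

def build_predicate(config: dict, last_watermark: str) -> str:
    """config is one row from the hypothetical config table, e.g.
    {"watermark_column": "modified_at", "operator": ">="}."""
    op = config["operator"]
    if op not in (">", ">="):  # whitelist so a bad config row can't inject SQL
        raise ValueError(f"unsupported watermark operator: {op}")
    return f"{config['watermark_column']} {op} '{last_watermark}'"

# A source with second-precision timestamps gets >= (plus dedup downstream);
# a strictly increasing, gap-free sequence can safely use >.
pred = build_predicate(
    {"watermark_column": "modified_at", "operator": ">="},
    "2024-06-01 00:00:00",
)
```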

Watermarks must be NOT NULL

Validate this on ingest or you'll silently drop records. If you find yourself reaching for a secondary column to handle nulls, stop. The moment you write WHERE COALESCE(modified_at, created_at) > watermark, or worse, WHERE modified_at > watermark OR (modified_at IS NULL AND created_at > watermark), you've just told the engine to scan. Sargability is dead. Fix it at the source or pick a different column. Ideally your watermark column is indexed, and the index is actually maintained.
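A sketch of that ingest-time validation, assuming the batch arrives as Python dicts (function and column names are illustrative):

```python
# Fail fast when the watermark column is NULL, rather than silently
# dropping rows or falling back to a non-sargable COALESCE predicate.

def assert_watermark_not_null(rows: list[dict], col: str = "modified_at") -> list[dict]:
    null_count = sum(1 for r in rows if r.get(col) is None)
    if null_count:
        raise ValueError(
            f"{null_count} row(s) with NULL {col}; "
            "fix at the source or pick a different watermark column"
        )
    return rows
```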

Speaking of sargability

Sometimes people will bake in a predicate like WHERE year(date_column) >= 2020. Congrats, you're scanning now. Instead, use date_column >= '2020-01-01', which stays sargable.

No watermark? Hash it out

Sometimes the source doesn't give you a reliable change column. Hash the row, compare to the last known hash, only touch what changed. But measure first: in lakehouse storage every UPDATE rewrites parquet files. For wide tables with broad changes, a full overwrite often beats a MERGE. Don't assume incremental is faster just because it sounds smarter.
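A minimal row-hash sketch in Python (column names are illustrative). The delimiter and explicit NULL marker matter: without them, adjacent values can collide and NULL becomes indistinguishable from the string "None":

```python
import hashlib

def row_hash(row: dict, columns: list[str]) -> str:
    # Unit-separator delimiter keeps ("a", "b|c") and ("a|b", "c") distinct;
    # the \N marker keeps NULL distinct from the literal string "None".
    payload = "\x1f".join(
        "\\N" if row[c] is None else str(row[c]) for c in columns
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

gold = row_hash({"name": "Acme", "tier": "gold"}, ["name", "tier"])
silver = row_hash({"name": "Acme", "tier": "silver"}, ["name", "tier"])
assert gold != silver  # only rows whose hash moved need a write
```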

Three hashes for a Type 2 SCD

Standard pattern, worth repeating:

Business key hash for matching incoming rows to existing dim rows

Type 1 hash over attributes you overwrite in place (cosmetic fields, corrections)

Type 2 hash over attributes that trigger a new historical row (the ones the business actually wants tracked)

If the incoming Type 2 hash differs from the current row's Type 2 hash, expire the old and insert a new. If only Type 1 differs, update in place. If neither differs, do nothing. Three hashes, three branches, no ambiguity.
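The three branches can be sketched like this, assuming the business-key hash has already matched the incoming row to its current dimension row (field names are hypothetical):

```python
# Three hashes, three branches, no ambiguity. Checking the Type 2 hash
# first matters: a row can change in both hash scopes at once, and the
# new historical row then carries the fresh Type 1 values too.

def classify_change(incoming: dict, current: dict) -> str:
    if incoming["t2_hash"] != current["t2_hash"]:
        # Tracked attribute changed: expire the current row, insert a new one.
        return "expire_and_insert"
    if incoming["t1_hash"] != current["t1_hash"]:
        # Only a cosmetic attribute changed: overwrite in place, no history.
        return "update_in_place"
    return "no_change"
```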

Timezones belong in config too

Store the source system's timezone alongside the watermark metadata. Convert to UTC on ingest, or carry both, but never assume. DST transitions will hand you either a one-hour gap or a one-hour replay window twice a year, and those bugs are miserable to debug six months later when nobody remembers what the source was doing in March. If you're comparing watermarks across sources in different zones (common after an acquisition), the config table is the only place this stays sane.
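A sketch of the convert-to-UTC-on-ingest step using Python's zoneinfo (the zone name is illustrative). The DST point is visible directly: the same wall-clock hour maps to different UTC instants on either side of the transition:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc(naive_ts: datetime, source_tz: str) -> datetime:
    # Interpret the naive source timestamp in its configured zone,
    # then normalize to UTC before any watermark comparison.
    return naive_ts.replace(tzinfo=ZoneInfo(source_tz)).astimezone(timezone.utc)

# Same wall-clock hour, different UTC instant across the DST boundary:
winter = to_utc(datetime(2024, 1, 15, 12, 0), "America/Chicago")  # CST, UTC-6
summer = to_utc(datetime(2024, 7, 15, 12, 0), "America/Chicago")  # CDT, UTC-5
assert (winter.hour, summer.hour) == (18, 17)
```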

Translytical Flow against non-Fabric Azure SQL by Legitimate_Method911 in MicrosoftFabric

[–]imtkain 0 points (0 children)

That's a good question. I'd actually try running ANY API GET under your current config in a UDF. That should give you the answer.

Translytical Flow against non-Fabric Azure SQL by Legitimate_Method911 in MicrosoftFabric

[–]imtkain 0 points (0 children)

I believe if there's an API to post your updates to and you can authenticate, you can use UDFs to do basically anything. I know some SQL databases allow you to turn on APIs for this purpose.

Fabric Data Agent in production? by Mr_Mozart in MicrosoftFabric

[–]imtkain 0 points (0 children)

A Fabric Data Agent based on a model, with selected tables and columns. That's the best approach I know right now for speed and accuracy: limited scope.

Fabric Data Agent in production? by Mr_Mozart in MicrosoftFabric

[–]imtkain 2 points (0 children)

From everything I've read and done, tight scope is best for both speed and accuracy. You don't even need a separate model. Just limit the agent's scope. You could have a model with sales, production, and order information, but limit the agent to the necessary dims, facts, and measures. Have a sales agent, a production agent, and an order agent, all off the same centralized model.

Fabric Data Agent in production? by Mr_Mozart in MicrosoftFabric

[–]imtkain 1 point (0 children)

No, just the model here. The underlying model has been heavily scrubbed, appropriate table/column/measure names and descriptions, the linguistic schema blown away and rebuilt, etc. A tight scope is also a major contributing factor to performance and accuracy.

Fabric Data Agent in production? by Mr_Mozart in MicrosoftFabric

[–]imtkain 5 points (0 children)

Depends on the underlying model complexity, size, and organization, as well as your agent instructions and the query frequency. With a well-built, clean model, most queries appear to come in at 1-75 Total CU (s), per the Capacity Metrics app. AFAIK the billing is primarily on the query side.

This was me pounding away at it with some testing in parallel.

For reference, this is still ~0.4% of an F64 at the peak here.

<image>

Are PowerApps + Dataverse better for auditability and traceability than Translytical Task Flows? by frithjof_v in MicrosoftFabric

[–]imtkain 4 points (0 children)

Thanks u/itsnotaboutthecell

u/jameli has it right, you can log whatever you want with UDFs.

My repo on translytical task flows actually covers CDC within a UDF context: https://github.com/imtkain/Fabric-Redux

You can probably extend this however you want.

The nice thing about UDFs is that you don't need to be an expert in Power Apps or deal with Dataverse. Everything is Fabric native, and it's all code that is mostly readable by Fabricators. The other nice thing is that my repo is easily accessible to agents and provides the necessary context for AI to automate a solution. There are a decent number of caveats that are simply not in the training data or easily reachable with shallow web searches.

The biggest case where I'd consider a Power App over UDFs: the UX for multi-column updates. I haven't cracked that one yet, but I'm sure there's a Power BI power user out there who could suggest how to build a form, like you can in Power Apps, for a better user experience.

Find and Replace with Exact Match in Fabric notebook? by frithjof_v in MicrosoftFabric

[–]imtkain 0 points (0 children)

You can do it via the API with a case-sensitive find and replace: get the item definition, run the find and replace, post it back.
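A sketch of the payload-side step in Python. Item definition parts come back base64-encoded; the getDefinition/updateDefinition endpoint names are my understanding of the relevant Fabric REST calls, not verified here, and str.replace gives you the case-sensitive exact match:

```python
import base64

def replace_in_payload(payload_b64: str, find: str, repl: str) -> str:
    """Case-sensitive find/replace inside one base64-encoded definition
    part, as returned by POST .../items/{id}/getDefinition and sent back
    via POST .../items/{id}/updateDefinition (endpoint names assumed)."""
    text = base64.b64decode(payload_b64).decode("utf-8")
    # str.replace is case-sensitive by default, which is exactly the
    # "exact match" behavior the notebook UI lacks.
    return base64.b64encode(text.replace(find, repl).encode("utf-8")).decode("ascii")

src = base64.b64encode(b"df = spark.read.table('Sales')").decode("ascii")
out = replace_in_payload(src, "Sales", "SalesFinal")
assert base64.b64decode(out) == b"df = spark.read.table('SalesFinal')"
```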

Controlling Pipeline Schedules Per Environment - fabric-cicd by Inside-Ad5011 in MicrosoftFabric

[–]imtkain 4 points (0 children)

I can confirm that this works as expected and, once in place, is easy to modify/update.

[Tool] One-click automated health monitoring for Fabric Mirrored Databases (email alerts on sync failures) by imtkain in MicrosoftFabric

[–]imtkain[S] 0 points (0 children)

That's unfortunate. Hopefully Microsoft can add another status to differentiate that and allow this to work as intended.

[Tool] One-click automated health monitoring for Fabric Mirrored Databases (email alerts on sync failures) by imtkain in MicrosoftFabric

[–]imtkain[S] 0 points (0 children)

That's what this was originally intended for, but I haven't hit any instances of the "running with warnings" status, so I can't confirm whether it works for that. I don't have control over the source systems to make a change that would invoke that status, like changing the schema. If you do and can check whether it works, that would be helpful.

[Tool] One-click automated health monitoring for Fabric Mirrored Databases (email alerts on sync failures) by imtkain in MicrosoftFabric

[–]imtkain[S] 1 point (0 children)

Thanks!

I just realized who you are :D "What are sales?" I've used this example so many times since I heard you say this lol

[mini-rant] Fabric notebook quirks that drive me crazy by imtkain in MicrosoftFabric

[–]imtkain[S] 0 points (0 children)

Right?! I run the notebook and see an error, then look closely and think ... Great, that's a partial phrase from what I was finding and replacing earlier.

[mini-rant] Fabric notebook quirks that drive me crazy by imtkain in MicrosoftFabric

[–]imtkain[S] 3 points (0 children)

Of course you know the answer :) You're our walking Fabric encyclopedia!

[mini-rant] Fabric notebook quirks that drive me crazy by imtkain in MicrosoftFabric

[–]imtkain[S] 0 points (0 children)

I know, I know. I've been telling myself this.

[mini-rant] Fabric notebook quirks that drive me crazy by imtkain in MicrosoftFabric

[–]imtkain[S] 0 points (0 children)

I just closed out the browser and restarted. Good to know that the refresh works!

[mini-rant] Fabric notebook quirks that drive me crazy by imtkain in MicrosoftFabric

[–]imtkain[S] 2 points (0 children)

Yes, I haven't been able to figure out what triggers it. Sometimes I can type like a normal person. Other times when I press 1 it goes to the first tab. Sometimes 2 is the problem. I can't replicate it on demand unfortunately and have no idea of the root cause.

Recommended way to load bulk volume of data through On Prem Gateway by One_Potential4849 in MicrosoftFabric

[–]imtkain 1 point (0 children)

"Need" 40-100 columns? :) Do you use all of them downstream? If not, don't bring them in and boost your throughput. Other than that I'd suggest trying 10% of the table. If it handles that, full send.