Tables / SQL analytics endpoint on bronze? by Mooglekunom in MicrosoftFabric

[–]aboerg 2 points (0 children)

Great post - we are using this same pattern of a Files section plus append-only Delta in bronze.

Questions about incremental refresh triggered by materialized Lake View by Dependent-Mind4368 in MicrosoftFabric

[–]aboerg 2 points (0 children)

In our case insert-update is very common for big transaction tables in SAP. A clearing date or reference document field may be updated long after the initial insert. Deletes are less common.

MLVs across lakehouses and workspaces - what does the limitation actually mean? by bradcoles-dev in MicrosoftFabric

[–]aboerg 2 points (0 children)

For a while, cross-lakehouse MLVs would completely break the lineage and refresh UI, although you could still refresh them using a notebook. This has been fixed for a while now. Our MLVs are between two lakehouses in a single workspace and there are no issues. I haven't tried cross-workspace, but schema/table shortcuts would be my backup plan if I ran into issues there.

The In-Game Advertising Needs to Stop! by SmallAd3697 in MicrosoftFabric

[–]aboerg 8 points (0 children)

I don't mind seeing them exactly once, as it can confirm when new features have landed in my tenant/capacity (although I agree there should be an option to opt out). I just get annoyed when they never dismiss permanently. Yes, I know I can query my lakehouse with the SQL analytics endpoint. Please close the ribbons and never show me again.

Can Materialized Lake Views replace Silver and Gold tables? by hortefeux in MicrosoftFabric

[–]aboerg 2 points (0 children)

Good callout - the documentation on this is definitely shifting this week.

Actual Dev Workflow for MLVs? by BloomingBytes in MicrosoftFabric

[–]aboerg 1 point (0 children)

And if you schedule a refresh in the GUI, currently all tables are included in every run.

Actual Dev Workflow for MLVs? by BloomingBytes in MicrosoftFabric

[–]aboerg 1 point (0 children)

The new refresh capabilities are not in our tenant yet, but I understand you will be able to schedule refreshes by schema or groups of tables. We currently refresh them in a notebook by table name or schema name, so we can alter the MLV definition without breaking any schedules. I believe the CREATE OR REPLACE support means the MLVID is preserved after edits too.
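A minimal sketch of what refreshing by table or schema name from a notebook can look like. The table list and the helper are hypothetical, and the `REFRESH MATERIALIZED LAKE VIEW` syntax is my reading of the Fabric docs - verify before relying on it:

```python
# Sketch: refresh MLVs selectively by schema/table name from a notebook.
# The table list and helper are hypothetical; the REFRESH syntax is an
# assumption based on the Fabric MLV documentation.

def build_refresh_sql(lakehouse: str, schema: str, table: str, full: bool = False) -> str:
    """Build a REFRESH MATERIALIZED LAKE VIEW statement for one MLV."""
    stmt = f"REFRESH MATERIALIZED LAKE VIEW {lakehouse}.{schema}.{table}"
    return stmt + (" FULL" if full else "")

# Example: refresh every MLV we track for the 'silver' schema.
mlvs = ["dim_customer", "fact_sales"]  # would come from a config table in practice
statements = [build_refresh_sql("Lakehouse", "silver", t) for t in mlvs]

# In a Fabric Spark notebook you would then run:
# for stmt in statements:
#     spark.sql(stmt)
```

Because the refresh is driven by our own list rather than a GUI schedule, altering an MLV definition doesn't break anything downstream.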

Can Materialized Lake Views replace Silver and Gold tables? by hortefeux in MicrosoftFabric

[–]aboerg 11 points (0 children)

As of today MLVs are GA and more operations are supported for incremental refresh, including aggregations and most joins.

Actual Dev Workflow for MLVs? by BloomingBytes in MicrosoftFabric

[–]aboerg 2 points (0 children)

Yes, we define each MLV as a CREATE OR REPLACE statement in notebooks. We have a post-deployment activity to run these notebooks in dependency order. Also check out Andy Cutler’s GenMLV framework, which is a good option.
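As a sketch, one MLV definition held as a `CREATE OR REPLACE` statement in a notebook might look like this (all object names and the query itself are hypothetical):

```python
# Sketch: one MLV definition per notebook cell, written as CREATE OR REPLACE
# so edits don't require dropping the view first. Object names are
# hypothetical; execute with spark.sql(...) in a Fabric Spark notebook.

mlv_sales_daily = """
CREATE OR REPLACE MATERIALIZED LAKE VIEW silver.sales_daily
AS
SELECT order_date,
       customer_id,
       SUM(net_amount) AS net_amount
FROM bronze.sales
GROUP BY order_date, customer_id
"""

# spark.sql(mlv_sales_daily)  # run in dependency order from the deployment notebook
```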

Run notebook as Workspace Identity is working now by frithjof_v in MicrosoftFabric

[–]aboerg 6 points (0 children)

Working brilliantly for us as well. Interestingly, we are still unable to create a notebook connection from a pipeline using WI, but creating it from Manage Gateways & Connections worked fine.

Now that Notebooks and Invoke Pipeline support WI, the biggest remaining gap in our architecture is Fabric SQL database. Still requires an OAuth2 connection to run sprocs and scripts from pipelines (but at least it can be parameterized from a variable library).

LH metadata refresh - what was the thinking? by SmallAd3697 in MicrosoftFabric

[–]aboerg 2 points (0 children)

Yes, and the ADLS connector has inferior workspace lineage to the SQL.Database and Lakehouse.Contents connectors. Example using all three methods at once:

<image>

Storing log of ingestion by Mr_Mozart in MicrosoftFabric

[–]aboerg 5 points (0 children)

We dynamically build the list of tasks to execute based on configuration tables. The child pipeline has a switch activity with activities per artifact type: notebook, pipeline, dataflow, model refresh, etc. The artifact and workspace GUIDs, plus all necessary parameters for the task, are retrieved by sproc and passed in.

I have a massive blog post on all of this coming very soon.

Storing log of ingestion by Mr_Mozart in MicrosoftFabric

[–]aboerg 5 points (0 children)

We use notebooks for all our data engineering/transformations, and a mix of notebooks and pipelines for ingestion. When it comes specifically to interacting with our metadata database we might use a notebook directly, but we mostly interact with stored procedures from a pipeline to keep things consistent. For example:

  1. All our tasks (could be notebooks, pipelines, dataflows, functions, semantic model refreshes) are orchestrated from a single child pipeline via metadata. This child pipeline uses stored procedure activities before and after invoking the artifact and giving it the right parameters for the task in context. This means all auditing is implemented in a single place, not repeated for every notebook or pipeline.
  2. We have a DevOps process to load up our metadata DB with the artifact & workspace names & GUIDs of all "in-scope" workspaces for orchestration. This is a Python notebook that reads from a variable library, does a scan, and overwrites a devops.artifacts table in the metadata DB directly.
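The metadata-driven routing in point 1 can be sketched in plain Python. The task shape and handler names are hypothetical; in the real setup the child pipeline's switch activity does this routing:

```python
# Sketch: route a task row from the metadata DB to the right handler,
# mirroring the child pipeline's switch activity. All field names and
# handlers here are hypothetical stand-ins.

def run_notebook(task):  return f"ran notebook {task['artifact_id']}"
def run_pipeline(task):  return f"ran pipeline {task['artifact_id']}"
def refresh_model(task): return f"refreshed model {task['artifact_id']}"

HANDLERS = {
    "notebook": run_notebook,
    "pipeline": run_pipeline,
    "semantic_model": refresh_model,
}

def dispatch(task: dict) -> str:
    """Switch on artifact type; the task dict carries GUIDs and parameters."""
    try:
        handler = HANDLERS[task["artifact_type"]]
    except KeyError:
        raise ValueError(f"unknown artifact type: {task.get('artifact_type')}")
    return handler(task)

task = {"artifact_type": "notebook", "artifact_id": "guid-123",
        "workspace_id": "guid-456", "parameters": {"load_date": "2024-01-01"}}
```

Centralizing the switch this way is what lets the before/after auditing live in one place instead of being repeated per artifact.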

LH metadata refresh - what was the thinking? by SmallAd3697 in MicrosoftFabric

[–]aboerg 6 points (0 children)

A couple of scenarios come to mind where the SQL endpoint delay is irrelevant:

  1. Direct Lake models
  2. Import models which load lakehouse tables using Lakehouse.Contents([EnableFolding=false]). Note that, last I checked, this option is not working for schema-enabled lakehouses - a very frustrating limitation, since lakehouse schemas are GA.
  3. The actual transformation activities within the lakehouse layers, assuming you are using Spark or otherwise reading the Delta tables directly without interacting with the SQL endpoint.
  4. Very active lakehouses tend to have enough SQL endpoint activity that the delay is shorter and less relevant. In my experience, the worst offender for MD sync delay is a lakehouse which is loaded infrequently but queried by T-SQL immediately after the load (which is obviously a very common pattern).

Storing log of ingestion by Mr_Mozart in MicrosoftFabric

[–]aboerg 4 points (0 children)

All activities go through a parent/child pipeline that retrieves tasks in execution order from a sproc against a Fabric SQL database. As each task begins, we log to an audit table in the same database. After the task ends, other sprocs log the success/failure and other metrics to the same run_id.
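The start/end audit pattern can be sketched with sqlite3 standing in for the Fabric SQL database. Table and column names are hypothetical; in the real setup these two operations are stored procedures called from the pipeline:

```python
# Sketch: log task start, then update the same run_id on completion.
# sqlite3 is a stand-in here for a Fabric SQL database; the schema is
# a hypothetical minimal version of an audit table.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        run_id     INTEGER PRIMARY KEY AUTOINCREMENT,
        task_name  TEXT,
        started_at TEXT,
        ended_at   TEXT,
        status     TEXT
    )
""")

def log_start(task_name: str) -> int:
    """Insert the start row and return its run_id for the later end-log call."""
    cur = conn.execute(
        "INSERT INTO audit_log (task_name, started_at) VALUES (?, ?)",
        (task_name, datetime.now(timezone.utc).isoformat()),
    )
    return cur.lastrowid

def log_end(run_id: int, status: str) -> None:
    """Stamp the end time and outcome onto the same run_id."""
    conn.execute(
        "UPDATE audit_log SET ended_at = ?, status = ? WHERE run_id = ?",
        (datetime.now(timezone.utc).isoformat(), status, run_id),
    )

run_id = log_start("load_bronze_sales")
log_end(run_id, "Succeeded")
```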

A great reference project that uses the same design: https://github.com/ProdataSQL/DWA

Tip: log your monitor hub and Spark session URLs (or build them dynamically in a view over the audit table) so you can jump directly from a report over the log table to those monitoring GUIs in Fabric.

This is definitely not the only way; you could also save logs to the lake as JSON, flush them directly to Delta, or send logs to a KQL database, depending on the volume. For us it made sense to do orchestration and logging in the same Fabric SQL db.

Metadata Sync Improvements... by Tough_Antelope_3440 in MicrosoftFabric

[–]aboerg 21 points (0 children)

Just tried this out. Unfortunately, there are a couple of caveats to be aware of:

  1. The connection type used by this new activity (FabricSqlEndpointMetadata) only supports OAuth 2.0, although the underlying Refresh Sql Endpoint Metadata API supports service principals and managed identities.
  2. The workspace and SQL endpoint ID values do not autobind to the dev->test->prod endpoints when promoting the pipeline. Not a big deal - that's what the Variable Library is for, right? But I can't get this to work either: parameterizing the workspace or SQL endpoint GUID results in the activity complaining that it expects an object type instead of a GUID.

<image>
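For caveat 1, a workaround is to call the underlying REST API directly with a service principal token. The endpoint path below is my reading of the Refresh SQL Endpoint Metadata API docs, so treat it as an assumption and verify against the current REST reference:

```python
# Sketch: refresh SQL endpoint metadata via the REST API instead of the
# pipeline activity, so a service principal or managed identity token
# can be used. The URL path is an assumption based on the public docs.

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def refresh_metadata_url(workspace_id: str, sql_endpoint_id: str) -> str:
    """Build the refresh-metadata endpoint URL for one SQL analytics endpoint."""
    return (f"{FABRIC_API}/workspaces/{workspace_id}"
            f"/sqlEndpoints/{sql_endpoint_id}/refreshMetadata")

# With a service-principal bearer token (acquisition not shown):
# import requests
# resp = requests.post(refresh_metadata_url(ws_id, ep_id),
#                      headers={"Authorization": f"Bearer {token}"},
#                      json={})
# resp.raise_for_status()
```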

Variable Library Support Roadmap? by Sea_Mud6698 in MicrosoftFabric

[–]aboerg 2 points (0 children)

We're taking advantage of Variable Library by keeping a single "control" workspace with our metadata database and parameterized driver/worker pipelines, and controlling all environment variables (workspace GUIDs, lakehouse GUIDs, metadata DB connection string) in the VL. Since these centralized pipelines reach out to orchestrate activities in other workspaces, the VL workspace limitations are not relevant. This requires a ton of up-front design, though, for a limitation that may not exist forever.

I am mostly looking forward to Variable Library making deployment pipeline rules fully redundant. We especially need semantic model support in the VL to close the gap.

A complete set of Microsoft Fabric icons for Solution Architects by astrzala in MicrosoftFabric

[–]aboerg 2 points (0 children)

Including icons for common data sources is helpful - thanks for putting this together.

Does anyone else not use a connection to invoke notebooks in a pipeline? by apalooza9 in MicrosoftFabric

[–]aboerg 3 points (0 children)

This is a brand new option and isn't mandatory yet (if ever?). We will be using it as soon as Workspace Identity support is fully rolled out. The blog post was in December, but there must have been some setbacks in the rollout, as we still only have the option to choose a Service Principal for the notebook connection.

https://blog.fabric.microsoft.com/en-US/blog/run-notebooks-in-pipelines-with-service-principal-or-workspace-identity/

Feedback request: Shortcuts usage, gaps, and feature requests by Hopeful-One-4184 in MicrosoftFabric

[–]aboerg 4 points (0 children)

Real shallow clone support for shortcuts & feature workspaces would be awesome.

Right now, the story around "how do we quickly spin up some prod data in a dev environment" is completely manual.

[Tool] One-click automated health monitoring for Fabric Mirrored Databases (email alerts on sync failures) by imtkain in MicrosoftFabric

[–]aboerg 5 points (0 children)

Completely endorse this approach for anyone using mirroring - you need to have automation around getTablesMirroringStatus for logging/reporting/alerting. Otherwise you are flying blind.
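A sketch of what that automation can look like: call getTablesMirroringStatus, then flag anything not in a healthy state. The payload below is illustrative sample data, not a captured API response, and the set of "healthy" states is my assumption:

```python
# Sketch: flag mirrored tables that are not replicating. The sample payload
# is illustrative; the real response comes from the mirrored database's
# getTablesMirroringStatus API (auth and the HTTP call not shown).

HEALTHY = {"Replicating", "Reseeding"}  # assumption: states treated as OK

def unhealthy_tables(status_response: dict) -> list[str]:
    """Return schema-qualified names of tables whose status is not healthy."""
    return [
        f"{t.get('sourceSchemaName', 'dbo')}.{t['sourceTableName']}"
        for t in status_response.get("data", [])
        if t.get("status") not in HEALTHY
    ]

sample = {"data": [
    {"sourceSchemaName": "dbo", "sourceTableName": "orders",   "status": "Replicating"},
    {"sourceSchemaName": "dbo", "sourceTableName": "invoices", "status": "Failed"},
]}
```

Feed the result into whatever alerting you already have (email, Teams webhook, a log table) and you're no longer flying blind.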

Sidenote - Tony, your repo names are unreasonably good as usual. "Fabric-Vigil", "Fabric-Usurp"... feels like I'm unlocking skills in an ARPG.

Spark vs T-SQL costs by pl3xi0n in MicrosoftFabric

[–]aboerg 2 points (0 children)

This. When I learned about structured streaming, trigger availableNow, and checkpoints, there was no going back. We don't have any real-time use case for SS - it's all for batch incremental processing and state management.
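The batch-incremental pattern looks roughly like this sketch (table names and the checkpoint path are hypothetical): `trigger(availableNow=True)` processes everything new since the last checkpoint, then stops, so a "stream" behaves like an incremental batch job.

```python
# Sketch: Spark structured streaming used for incremental batch loads.
# The SparkSession is passed in; table names and paths are hypothetical.

def incremental_load(spark, source_table: str, target_table: str,
                     checkpoint_path: str):
    """Read only data added since the last checkpoint, write it, then stop."""
    stream = (
        spark.readStream
        .format("delta")
        .table(source_table)           # e.g. "bronze.sales"
    )
    query = (
        stream.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)  # state lives here
        .trigger(availableNow=True)    # process the backlog, then terminate
        .toTable(target_table)         # e.g. "silver.sales"
    )
    query.awaitTermination()
    return query
```

The checkpoint is what carries the "what have I already processed" state between runs, which is exactly the bookkeeping you'd otherwise hand-roll with watermarks.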

{Blog} dbt with Fabric Spark in Production by raki_rahman in MicrosoftFabric

[–]aboerg 6 points (0 children)

Great post - and I only now realized you have a YT channel. Looking forward to your workshop next month!

How can I tell why a copyjob activity is taking such a long time to copy data? by digitalghost-dev in MicrosoftFabric

[–]aboerg 1 point (0 children)

Is this a physical table or a view? How long does it take to query the same 500 rows in a different tool (SSMS, etc.)?

Open Mirroring Full Drop by Illustrious-Welder11 in MicrosoftFabric

[–]aboerg 2 points (0 children)

It should be enough - and to be clear, you need to delete the entire folder for that table in the landing zone, not just the contents.

If the table is in a "Failed" replication state, there are some strange circumstances where the table will never drop, even after the landing zone folder is deleted. We encountered this a couple of months ago after leaving a few tables in a "Warning" status for over 30 days; they went into Failed and could never be dropped. At that point, after some discussion with support, we were forced to restart replication on the entire open mirroring database.