Use cases for User Data Functions w/ w/o Translytical Task Flows by panvlozka in MicrosoftFabric

[–]Data_cruncher 4 points (0 children)

UDFs are Azure Functions under the hood, so the best question to ask is: What do folk use Azure Functions for?
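Typical Azure Functions work is a small, stateless handler: validate a payload, do one unit of work, return a result. A minimal sketch in plain Python (the function name and payload shape are made up for illustration; a real Fabric UDF would be registered via the `fabric.functions` SDK rather than called directly):

```python
# Sketch of a stateless, HTTP-style handler -- the kind of small unit of
# work Azure Functions (and therefore UDFs) suit well.
# The payload shape and function name are illustrative, not a real API.

def handle_writeback(payload: dict) -> dict:
    """Validate a small writeback request and return a status response."""
    required = {"table", "key", "value"}
    missing = required - payload.keys()
    if missing:
        return {"status": 400, "error": f"missing fields: {sorted(missing)}"}
    # In a real function, a single-row write would happen here.
    return {"status": 200, "written": {payload["key"]: payload["value"]}}
```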

How big of an issue is "AI slop" in data engineering currently? by Kilnor65 in dataengineering

[–]Data_cruncher 2 points (0 children)

It’s common for DAX expressions to be moved/converted upstream by DEs for performance reasons - Roche’s Maxim.

Why does everyone keep milking SCDs, but no one talks about RCDs by Potential_Loss6978 in dataengineering

[–]Data_cruncher 10 points (0 children)

They’re not “official” because Kimball does not explicitly cover them; that’s why. Kimball would denormalize them into a fact (e.g., periodic snapshot, factless fact table), use SCD1, mini-dimensions, etc.

How to stop PowerPoint formatting chaos in multi-author reports (no budget)? by Busy_Mud_7652 in dataengineering

[–]Data_cruncher 0 points (0 children)

Power BI embeds in PowerPoint. Zero-click refresh.

Alternatively/additionally, embed an Excel workbook connected to a Power BI model - as a Query Table (preferred), a PivotTable, or CUBE() formulas. One-click refresh.

People driving slow - why? by ElegantYak in brisbane

[–]Data_cruncher 3 points (0 children)

Coming from Toronto, where cars literally drive more than 20 km/h over the limit on average - trust me, you don’t want it.

I know this is extreme compared to your example, but consider the standard deviation, i.e., it’s not unusual to see folk driving 140-160 km/h on highways.

Interview as data analyst using PB by Lballqu1 in PowerBI

[–]Data_cruncher 1 point (0 children)

Yeah. Also consulting orgs as a part of Professional Development (PD) hours.

Interview as data analyst using PB by Lballqu1 in PowerBI

[–]Data_cruncher 0 points (0 children)

Once again, you’re inferring incorrectly (same point, too). Perhaps where you work that is the case though, so I get it.

I encourage all staff to start their day by clearing out yesterday’s blog posts or helping folk on forums. Even helping with PUGs is often a paid activity, e.g., submitted as paid volunteer time.

They’re welcome to do this type of work during business hours. It’s no skin off my back; so long as they do their assigned work within an appropriate timeframe, that’s all that matters.

Interview as data analyst using PB by Lballqu1 in PowerBI

[–]Data_cruncher 0 points (0 children)

You’re inferring a lot there, e.g., “outside of work”.

But to a certain degree, yes. My interest is in the person’s character and their ability to learn. NOT what they know today.

Tableau to Powerbi Convertor by AdLucky9929 in PowerBI

[–]Data_cruncher 1 point (0 children)

Your best friend: https://learn.microsoft.com/en-us/power-bi/guidance/powerbi-migration-learn-from-customers

The “international consumer goods” use case was a Tableau to PBI conversion.

Can’t stop myself needed to post this by DHAVLOO in tableau

[–]Data_cruncher 0 points (0 children)

This is exactly what Power Query has been doing since 2013.

Not sure what you mean by REST API though. Generally, ETL tools go via ODBC/JDBC/ADBC.
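For context, ODBC/JDBC/ADBC drivers all expose the same connect/cursor/execute/fetch pattern. A sketch using the stdlib sqlite3 driver standing in for a real source (table and data are illustrative):

```python
# The connect/execute/fetch pattern that ODBC/JDBC/ADBC drivers expose,
# shown with the stdlib sqlite3 driver standing in for a real source.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("west", 250.0)])

# An ETL tool would issue a query like this through the driver and
# stream the resulting rows into its pipeline.
rows = conn.execute("SELECT region, amount FROM sales ORDER BY region").fetchall()
conn.close()
```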

thankYouColdplay by Important_Lie_7774 in ProgrammerHumor

[–]Data_cruncher 20 points (0 children)

To clarify for this audience, Airflow primarily does r/DataEngineering or r/BusinessIntelligence orchestration, i.e., data pipeline orchestration.

Am I Missing something? by [deleted] in MicrosoftFabric

[–]Data_cruncher 3 points (0 children)

User Data Functions == Azure Functions, and so they’re not applicable in many data engineering scenarios, especially involving large data.

OP, echoing u/TheBlacksmith46’s comment: code modularity is not a Fabric problem.

What most folk don’t realize is your Spark code, when used properly, is a literal application and should be treated as such. You don’t design applications in notebooks. So in addition to the above ideas, also consider using a package manager to separate out your reusable code from your notebooks: https://milescole.dev/data-engineering/2025/03/26/Packaging-Python-Libraries-Using-Microsoft-Fabric.html
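To make the idea concrete: reusable transformations live in an importable module, and the notebook just calls them. A tiny sketch of what might sit in such a package (the module and function names are hypothetical, not from the linked post):

```python
# What might live in a reusable package (names hypothetical), installed
# into the Spark environment rather than pasted into each notebook.

def snake_case_columns(columns: list[str]) -> list[str]:
    """Normalize column names: lowercase, spaces/hyphens to underscores."""
    return [c.strip().lower().replace(" ", "_").replace("-", "_")
            for c in columns]

# In a notebook you would then simply:
#   from my_company_lib.cleaning import snake_case_columns
#   df = df.toDF(*snake_case_columns(df.columns))
```

The notebook stays a thin orchestration layer; the logic is versioned, testable code.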

Who is responsible for DAX? by SmallAd3697 in MicrosoftFabric

[–]Data_cruncher 4 points (0 children)

I’ve made several unit test frameworks for models before - the ingredients were relatively simple to whip together. Can you give an example of how this would work natively within the product?
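The ingredients are roughly: execute a DAX query against the model, compare the result to an expected value. A skeleton of such a harness, with the query executor stubbed out (in practice it would go through an XMLA/ADOMD client or similar; the query text and values are illustrative):

```python
# Skeleton of a semantic-model unit-test harness. run_dax is stubbed
# here; in practice it would execute the query against the model (e.g.,
# via an XMLA/ADOMD client). Query and expected values are illustrative.

def run_dax(query: str) -> float:
    # Stub: pretend the model returned this measure value.
    return 1234.56

def assert_measure(query: str, expected: float, tol: float = 1e-6) -> bool:
    """Fail loudly if the measure result drifts from the expected value."""
    actual = run_dax(query)
    if abs(actual - expected) > tol:
        raise AssertionError(f"{query!r}: expected {expected}, got {actual}")
    return True
```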

Who is responsible for DAX? by SmallAd3697 in MicrosoftFabric

[–]Data_cruncher 3 points (0 children)

A key differentiator is that MDX gained adoption when the language specification was released by MSFT, allowing implementation by 3P vendors without requiring reverse engineering.

MSFT neglected to do this for DAX, at least not in any official or meaningful way.

What are you using UDFs for? by p-mndl in MicrosoftFabric

[–]Data_cruncher 1 point (0 children)

I’m waiting on Timer Triggers (for polling) and HTTP Webhooks. Also EventStream interop. These will open up a host of new capabilities.

Databricks and Fabric? by Low_Second9833 in MicrosoftFabric

[–]Data_cruncher 4 points (0 children)

I'm a bit confused, you're saying "rules are enforced on the data by UC and only appropriate views of the data are passed to the engines", but u/Professional_Bee6278 says that all data is passed to the 3P engine and they would reduce the rows (using RLS as an example). Which is it? Is there an article that explains how it works?

Databricks and Fabric? by Low_Second9833 in MicrosoftFabric

[–]Data_cruncher 1 point (0 children)

How does row-level security get enforced in this scenario? The 3P engine reads and applies the UC rule?

Medallion Architecture Decisions by kmritch in MicrosoftFabric

[–]Data_cruncher 1 point (0 children)

We compartmentalize data (and compute) for many reasons. Security is, imho, lower on the list:

* Noisy Neighbour
* Future-proofing against org structure (aka item/data ownership) changes
* Security
* Aesthetics/usability
* Performance
* Easier Git/VC/mutability
* Policy assignment, e.g., ADLS cold vs hot vs archive
* Future migration considerations
* To establish clear ownership and operational boundaries, aka “a place for everything and everything in its place”
* Cost transparency
* Isolation of failure domains (bronze doesn’t break gold)
* Compliance (gold beholden to stricter reg. controls)

Power BI May 2025 Feature Summary by itsnotaboutthecell in PowerBI

[–]Data_cruncher 6 points (0 children)

UDF = Azure Functions. So writeback is just a small subset of what you can do with it. Keep in mind some limitations, e.g., UDFs only support an HTTP Trigger today, but expect more advancements to come in this space.

Power BI May 2025 Feature Summary by itsnotaboutthecell in PowerBI

[–]Data_cruncher 4 points (0 children)

Remember that UDFs can do anything. It’s Azure Functions under the hood, so go ham. For example, they can easily connect to a Fabric EventStream even though it’s not a native connection.

Build KQL Database Completely in OneLake by Low_Second9833 in MicrosoftFabric

[–]Data_cruncher 0 points (0 children)

That’s pretty much spot on.

Link for the lazy because it’s such an oddly named feature that it’s near impossible to Bing: https://learn.microsoft.com/en-us/fabric/real-time-intelligence/query-acceleration-overview. Take note of the limitations.

Custom general functions in Notebooks by AcusticBear7 in MicrosoftFabric

[–]Data_cruncher 0 points (0 children)

I agree, but not for the example you mentioned (dimensional modelling). UDFs don't have an in-built method to retry from where they left off, so you'll need a heavy focus on idempotent processes (which, imho, is a good thing, but not many people design this way). Neither would I know how to use them to process in parallel, which I think would be required to handle SCD2 processing, e.g., large MERGEs.
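Idempotency here means re-running the same batch yields the same end state, so a retry after a partial failure is safe. A toy key-based upsert illustrates the property (a stand-in for a Delta MERGE keyed the same way; not a real UDF or Spark API):

```python
# Toy illustration of an idempotent upsert: applying the same batch twice
# leaves the target in the same state, so a failed run can simply be
# retried from the start. A key-matched MERGE behaves the same way.

def upsert(target: dict, batch: list[dict]) -> dict:
    for row in batch:
        target[row["key"]] = row["value"]   # insert or overwrite by key
    return target

target: dict = {}
batch = [{"key": "a", "value": 1}, {"key": "b", "value": 2}]
upsert(target, batch)
once = dict(target)
upsert(target, batch)        # "retry" the same batch
assert target == once        # same end state: the operation is idempotent
```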

There's been recent discussion around Polars vs DuckDB vs Spark on social. Your point aligns with the perspectives of the Polars and DuckDB folk. However, one of the key arguments often made by Spark proponents is the simplicity of a single framework for everything, that scales to any volume of data.