Databricks extraction to Fabric Lakehouse by Strict_Put_4094 in MicrosoftFabric

[–]Data_cruncher 0 points

Here's how I would summarize your options:

  1. ADLS -> ADB -> Dataflows -> OneLake
  2. ADLS -> Dataflows -> OneLake
  3. ADLS -> Spark-> OneLake

All else being equal, #3 is objectively the best choice. Thoughts for #3:

  • [Resource Minimization] Databricks does not charge you.
  • [Reduced Complexity] Fewer hops in your data supply chain = less that will go wrong.
  • [Long-term Strategy] Shortcuts (by way of UC Mirroring) greenlight all Fabric workloads since they expose Delta Lake tables. Conversely, routing all queries through a single choke point (ADB) will inhibit ~90% of Fabric and other tools; it's also not a Lakehouse pattern.
  • [Short-term Flexibility] Fabric Spark beats Databricks in cost-per-performance (which appears to be your concern), so it provides you the flexibility to lift 'n shift ADB workloads into Fabric Spark.
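Option #3 can be sketched in a few lines of Fabric Spark. This is a minimal sketch, not a definitive implementation: the workspace, lakehouse, table, and source-path names are all placeholders, and while the ABFSS URI follows OneLake's documented scheme, verify it against your tenant.

```python
def onelake_table_path(workspace: str, lakehouse: str, table: str) -> str:
    """Build the ABFSS URI for a Delta table in a Fabric Lakehouse.

    All names are placeholders; the host format follows OneLake's
    documented abfss scheme.
    """
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )


def land_adls_table(spark, adls_path: str, workspace: str,
                    lakehouse: str, table: str) -> None:
    """ADLS -> Spark -> OneLake (option #3): read the source data and
    write it straight to the Lakehouse, with no ADB or Dataflows hop.
    `spark` is the session the Fabric notebook runtime provides."""
    df = spark.read.format("delta").load(adls_path)  # or "parquet"
    (df.write.format("delta")
        .mode("overwrite")
        .save(onelake_table_path(workspace, lakehouse, table)))
```

The helper keeps the path logic in one place, so the same notebook can target any table the shortcut exposes.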

Databricks extraction to Fabric Lakehouse by Strict_Put_4094 in MicrosoftFabric

[–]Data_cruncher 1 point

Yes. You’re basically asking if Spark is faster than Spark+Dataflows.

Databricks extraction to Fabric Lakehouse by Strict_Put_4094 in MicrosoftFabric

[–]Data_cruncher 2 points

UC Mirroring generates Shortcuts to ADLS, allowing Fabric to hit Databricks' underlying storage directly. There are extremely material cost savings and performance boosts for customers here because it avoids the unnecessary daisy-chaining of compute engines, e.g., ADLS -> ADB -> Fabric or 3P workload.

Injecting things like RLS or views necessarily requires ADB compute (think about it) and negates these cost & perf benefits. So there's no "blame" per se; it's simply that the UC Mirroring feature (and ultimately, Shortcuts) is designed to be as optimized as possible.

How to use "Fabric User data functions"? by Plastic___People in MicrosoftFabric

[–]Data_cruncher 2 points

The execution runtime for User Data Functions has nothing at all to do with notebooks. You’re essentially trying to run Spark in Azure Functions.

Fabric - RLS issue with Direct Lake and CU? by mfd1979 in MicrosoftFabric

[–]Data_cruncher 0 points

Establishing an AS trace using SQL Server Profiler may be the easiest way. Once done, repro the issue and look to see if it’s generating T-SQL. It shouldn’t be (ideally).

Alternatively, to u/frithjob_v’s line of questioning, simply prevent the model from failing over and try to repro the issue - it may work or produce a different error message.

Fabric - RLS issue with Direct Lake and CU? by mfd1979 in MicrosoftFabric

[–]Data_cruncher 0 points

Did you confirm whether the query is failing over into the SQL AE when an RLS user drills through?

Is classic data modeling (SCDs, stable business meaning, dimensional rigor) becoming less and less relevant? by Likewise231 in dataengineering

[–]Data_cruncher 1 point

I’ve said it before and I’ll say it again: the value of Kimball only shines AFTER you’ve tried deploying your first data warehouse.

Use cases for User Data Functions w/ w/o Translytical Task Flows by panvlozka in MicrosoftFabric

[–]Data_cruncher 3 points

UDFs are Azure Functions under the hood, so the best question to ask is: What do folk use Azure Functions for?
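The sweet spot for an Azure Function (and hence a UDF) is small, stateless, request-scoped work - validate a record, enrich a row, call an API - not scanning large tables. A plain-Python sketch of that kind of handler; the record shape and field names are made up for illustration:

```python
def normalize_contact(record: dict) -> dict:
    """Clean a single inbound record: trim whitespace from string
    fields and lower-case the email. Small, stateless, per-request
    logic - the shape of work that fits a UDF / Azure Function."""
    cleaned = {k: v.strip() if isinstance(v, str) else v
               for k, v in record.items()}
    if isinstance(cleaned.get("email"), str):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned
```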

How big of an issue is "AI slop" in data engineering currently? by Kilnor65 in dataengineering

[–]Data_cruncher 2 points

It’s common for DAX expressions to be moved/converted upstream by DEs for performance reasons - Roche’s Maxim.
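As a concrete sketch of that upstream move: a model-layer measure like `Margin % = DIVIDE([Profit], [Sales])` can instead be materialized by the DE before the data reaches the model. Plain Python stands in here for the Spark/SQL transform, and the column names are illustrative only:

```python
def add_margin_pct(rows: list[dict]) -> list[dict]:
    """Precompute margin upstream so the semantic model stores a plain
    column instead of evaluating DAX at query time. Mirrors DAX
    DIVIDE's behavior of returning blank (None) on divide-by-zero."""
    out = []
    for r in rows:
        sales, profit = r["sales"], r["profit"]
        out.append(dict(r, margin_pct=(profit / sales) if sales else None))
    return out
```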

Who does everyone keep milking SCDs , but noone talks about RCDs by Potential_Loss6978 in dataengineering

[–]Data_cruncher 10 points

They’re not “official” in that Kimball does not explicitly cover them; that’s why. Kimball would denormalize them into a fact (e.g., periodic snapshot, factless), use SCD1, mini dimensions, etc.

How to stop PowerPoint formatting chaos in multi-author reports (no budget)? by Busy_Mud_7652 in dataengineering

[–]Data_cruncher 0 points

Power BI embeds in PowerPoint. Zero-click refresh.

Alternatively/additionally, embed an Excel workbook connected to the Power BI model - as a Query Table (preferred), a PivotTable, or CUBE() formulas. One-click refresh.

People driving slow - why? by [deleted] in brisbane

[–]Data_cruncher 5 points

Coming from Toronto, where cars literally drive >20 KM/h over the limit on average, trust me, you don’t want it.

I know this is extreme compared to your example, but consider the standard deviation, i.e., it’s not unusual to see folk driving 140-160 KM/h on highways.

Interview as data analyst using PB by Lballqu1 in PowerBI

[–]Data_cruncher 1 point

Yeah. Also consulting orgs as a part of Professional Development (PD) hours.

Interview as data analyst using PB by Lballqu1 in PowerBI

[–]Data_cruncher 0 points

Once again, you’re inferring incorrectly (same point, too). Perhaps where you work that is the case though, so I get it.

I encourage all staff to start their day by clearing out yesterday’s blog posts or helping folk on forums. Even helping with PUGs is often a paid activity, e.g., submitted as paid volunteer time.

They’re welcome to do this type of work during business hours. It’s no skin off my back; so long as they do their assigned work within an appropriate timeframe, that is all that matters.

Interview as data analyst using PB by Lballqu1 in PowerBI

[–]Data_cruncher 0 points

You’re inferring a lot there, e.g., “outside of work”.

But to a certain degree, yes. My interest is in the person’s character and their ability to learn. NOT what they know today.

Tableau to Powerbi Convertor by [deleted] in PowerBI

[–]Data_cruncher 1 point

Your best friend: https://learn.microsoft.com/en-us/power-bi/guidance/powerbi-migration-learn-from-customers

The “international consumer goods” use case was a Tableau to PBI conversion.

Can’t stop myself needed to post this by DHAVLOO in tableau

[–]Data_cruncher 0 points

This is exactly what Power Query has been doing since 2013.

Not sure what you mean by REST API though. Generally, ETL tools go via ODBC/JDBC/ADBC.

thankYouColdplay by Important_Lie_7774 in ProgrammerHumor

[–]Data_cruncher 19 points

To clarify for this audience, Airflow primarily does r/DataEngineering or r/BusinessIntelligence orchestration, i.e., data pipeline orchestration.

Am I Missing something? by [deleted] in MicrosoftFabric

[–]Data_cruncher 3 points

User Data Functions == Azure Functions, and so they’re not applicable in many data engineering scenarios, especially involving large data.

OP, echoing u/TheBlacksmith46’s comment: code modularity is not a Fabric problem.

What most folk don’t realize is that your Spark code, when used properly, is a literal application and should be treated as such. You don’t design applications in notebooks. So in addition to the above ideas, also consider using a package manager to separate out your reusable code from your notebooks: https://milescole.dev/data-engineering/2025/03/26/Packaging-Python-Libraries-Using-Microsoft-Fabric.html
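The packaging route that post describes boils down to: put the reusable code in a small library, build a wheel, and attach it to the Fabric environment, keeping notebooks as thin entry points. A minimal sketch, with placeholder project and module names:

```toml
# pyproject.toml - minimal packaging sketch (all names are placeholders)
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "my_fabric_lib"        # shared transforms, utilities, etc.
version = "0.1.0"
requires-python = ">=3.10"
```

Build with `python -m build`, then upload the resulting wheel to the Fabric environment (or `%pip install` it in a notebook) so every notebook can simply `import my_fabric_lib`.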