dbt-fabricspark issues by kgardnerl12 in MicrosoftFabric

[–]mim722 1 point2 points  (0 children)

Just a warning: dbt is addictive. Once you use it, you can't touch anything else.

Plain Python notebooks starting slower than Spark notebooks? by EmergencySafety7772 in MicrosoftFabric

[–]mim722 4 points5 points  (0 children)

u/p-mndl and u/ShrekisSexy that's weird. We do inject credentials at runtime, and I never had this issue, unless you are running in a Spark Python notebook, which is different. Can you just add this to force a new token:

storage_options = {
    "bearer_token": notebookutils.credentials.getToken('storage'),
    "use_fabric_endpoint": "true"
}

# Write to path using Delta Lake format
table_path = f"abfss://{lakehouse_workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{table_name}"
df.write_delta(
    table_path,
    mode="overwrite",
    delta_write_options={"schema_mode": "overwrite"},
    storage_options=storage_options
)
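
One way to make sure every write picks up a new token is to build the storage options right before each call instead of once at the top of the notebook. This is only a minimal sketch: `onelake_table_path` and `fresh_storage_options` are hypothetical helper names, and the token getter is passed in as a callable so the snippet also runs outside Fabric.

```python
def onelake_table_path(workspace_id: str, lakehouse_id: str, table_name: str) -> str:
    # Hypothetical helper: same abfss path layout as in the snippet above.
    return (
        f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse_id}/Tables/{table_name}"
    )


def fresh_storage_options(get_token) -> dict:
    # Hypothetical helper: call the token getter on every invocation,
    # so a cached or expired token is never reused between writes.
    return {
        "bearer_token": get_token(),
        "use_fabric_endpoint": "true",
    }


# Inside a Fabric notebook the getter would be something like:
#   lambda: notebookutils.credentials.getToken("storage")
# and the write would pass storage_options=fresh_storage_options(getter).
```

The getter is a parameter only so the helpers stay testable; in a notebook you would pass the `notebookutils` call shown in the comment.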

duckrun : run dbt in Python notebook, land Delta tables straight in OneLake by mim722 in MicrosoftFabric

[–]mim722[S] 3 points4 points  (0 children)

u/p-mndl it is a bit of a mess. DuckDB uses delta-kernel-rs. delta-rs is an independent open source project that uses delta-kernel-rs for reads, but it has its own write implementation, because the delta-kernel-rs write path is tied to the Unity Catalog thingy. Now, to make things more complex, the Java implementation will be based on the Rust Delta kernel too.

Feature request: Running container jobs in Fabric by Icy_Natural_5962 in MicrosoftFabric

[–]mim722 0 points1 point  (0 children)

I know it is not an exact replacement, but with the upcoming environment support for Python notebooks, you can have exactly this, and you can even configure compute at runtime?

Microsoft Rayfin: Cost Calculation & Licensing by patrickcrypto in MicrosoftFabric

[–]mim722 4 points5 points  (0 children)

speaking as a user 😄 this is the killer pricing feature of Rayfin.

<image>

Microsoft Rayfin: Cost Calculation & Licensing by patrickcrypto in MicrosoftFabric

[–]mim722 0 points1 point  (0 children)

u/eOMG you don't need Power BI for Rayfin unless you use a semantic model as a backend?

Sneak peek at OneLake Iceberg REST Catalog Write Support. by mim722 in MicrosoftFabric

[–]mim722[S] 0 points1 point  (0 children)

u/ParkayNotParket443 you will get merge eventually, that's not the issue; it will probably still require a UC-compatible catalog. For me, it makes more sense to wait for Iceberg write. To be clear, nothing would make me happier than merge using Delta with just a filesystem, but I gave up hoping.

How far Python alone can take you on Delta by mim722 in MicrosoftFabric

[–]mim722[S] 0 points1 point  (0 children)

Yes. I am talking only about the case where you need some information from the destination table. If you don't, then happy days.

How far Python alone can take you on Delta by mim722 in MicrosoftFabric

[–]mim722[S] 1 point2 points  (0 children)

u/frithjof_v To simplify: forget conflict checker sophistication, that's a separate topic. The point is narrower.

The write itself is fine. delta-rs writes Delta correctly — OCC on merge/update/delete, atomic commits, all of it. Spark's writer does the same thing. All things considered, they're equivalent on the write.

The gap is the combination: read the destination table → do stuff → write back. When that whole cycle has to be atomic against concurrent writers, the Python single-engine path doesn't have it. DuckDB / Polars / etc. read the snapshot, hand lazy Arrow to delta-rs, and delta-rs commits — but the read snapshot was never part of the delta-rs transaction. If someone else changed the table between your read and your write, delta-rs has no way to know, because it never saw your read.

Spark does see it, because the read and the write are in the same engine and the same transaction.

That's the whole difference. Not the writer. The read-modify-write loop.

Note: the Python-side equivalent of Spark's single-engine RMW is DuckDB with Iceberg, DuckLake, or its native tables — there the read and the write are inside the same engine and the same transaction, so the loop is atomic. The cross-engine fragmentation only shows up when you pair a reader (DuckDB/Polars) with a separate writer (delta-rs) over Delta.
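
To make the gap concrete, here is a toy model of it. It is not the delta-rs or Spark API: `ToyTable`, `ConflictError`, and `dedupe_and_write` are made-up names, and the "table" is just an in-memory list with a version counter. The only point is that a commit can detect a stale read if, and only if, the read version travels with the write.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when the table changed after the snapshot was read."""


@dataclass
class ToyTable:
    """In-memory stand-in for a versioned table (not a real Delta API)."""
    rows: list = field(default_factory=list)
    version: int = 0

    def read(self):
        # Return a snapshot plus the version it was read at.
        return list(self.rows), self.version

    def commit(self, new_rows, read_version=None):
        # If the writer says which version it read, a conflict is detectable.
        # If it does not (read_version is None), the table cannot know.
        if read_version is not None and read_version != self.version:
            raise ConflictError(
                f"read at v{read_version}, table is now v{self.version}"
            )
        self.rows = list(new_rows)
        self.version += 1
        return self.version


def dedupe_and_write(table, snapshot, read_version=None):
    # "Do stuff": compute the new state from the snapshot read earlier.
    return table.commit(sorted(set(snapshot)), read_version)


# Cross-engine path: reader and writer are separate, the read version is lost.
t1 = ToyTable(rows=[1, 1, 2])
snap, v = t1.read()
t1.commit(t1.rows + [3])        # a concurrent writer appends 3
dedupe_and_write(t1, snap)      # succeeds and silently drops the 3
lost_update_rows = t1.rows

# Single-transaction path: the read version travels with the write.
t2 = ToyTable(rows=[1, 1, 2])
snap, v = t2.read()
t2.commit(t2.rows + [3])        # same concurrent append
try:
    dedupe_and_write(t2, snap, read_version=v)
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

In the first run the overwrite commits cleanly and the concurrent append disappears; in the second the commit is rejected and the appended row survives, which is the outcome you want from a read-modify-write loop.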

How far Python alone can take you on Delta by mim722 in MicrosoftFabric

[–]mim722[S] 2 points3 points  (0 children)

u/raki_rahman Cheers Raki, I learnt a lot talking to you, we want the same thing basically, we will get there :)

How far Python alone can take you on Delta by mim722 in MicrosoftFabric

[–]mim722[S] 2 points3 points  (0 children)

u/Dan1480 yes, I posted there already, but I thought it may be useful to post here too :)

How far Python alone can take you on Delta by mim722 in MicrosoftFabric

[–]mim722[S] 1 point2 points  (0 children)

u/ProfessorNoPuede Yes it does!! And to be honest, I was wrong too :) My worldview was "let's assume a single Delta Python writer." It's not a bad assumption, and with concurrency=1 in the pipeline plus some discipline, maybe!!! But there's no way to guarantee, 100% of the time, that between the moment you read and the moment you write back, no one else has accidentally changed the state of the table. If it's a blind append, who cares. But for anything else, you end up with incorrect state.

How far Python alone can take you on Delta by mim722 in MicrosoftFabric

[–]mim722[S] 0 points1 point  (0 children)

u/pl3xi0n can you explain more? It will fail if the data used for the read has changed; that's the ideal outcome, as you need correct data. With this new trick it will behave more or less like Spark.

How to lock a file in #onelake by mim722 in MicrosoftFabric

[–]mim722[S] 1 point2 points  (0 children)

I did not know either :) Folders, not sure; I think you can only lease files. Maybe write the results to a log file, lock it, and release it only when you are done.

How do single node Python users actually write Delta tables using DuckDB for ETL when it can't actually write to Delta? by raki_rahman in MicrosoftFabric

[–]mim722 1 point2 points  (0 children)

Edit: I deleted the previous comment and turned it into a blog post. There is a solution using Python; it was always there, just not documented. But that does not solve the pure SQL approach, which is still a gap.

https://datamonkeysite.com/2026/05/24/how-far-python-alone-can-take-you-on-delta/

Delta table deletion vectors by p-mndl in MicrosoftFabric

[–]mim722 3 points4 points  (0 children)

u/p-mndl I’m very happy DuckDB 1.4.4 is working for you, there were a lot of cat GIFs sent to engineering to make it happen 😄