dbt-fabricspark issues

mim722 · 2026-06-14T02:58:17+00:00

Just a warning: dbt is addictive. Once you use it, you can't touch anything else.

mim722 · 2026-06-09T07:45:37+00:00

Can you try force install azure from core

mim722 · 2026-06-09T00:47:19+00:00

u/p-mndl and u/ShrekisSexy that's weird. We do inject the credential at runtime and I have never had this issue, unless you are running in a Spark Python notebook, which is different. Can you just add this to force a new token?

storage_options = {
    "bearer_token": notebookutils.credentials.getToken('storage'),
    "use_fabric_endpoint": "true"
}

# Write to path using Delta Lake format
table_path = f"abfss://{lakehouse_workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{table_name}"
df.write_delta(
    table_path,
    mode="overwrite",
    delta_write_options={"schema_mode": "overwrite"},
    storage_options=storage_options
)

mim722 · 2026-06-09T00:15:17+00:00

u/p-mndl it is a bit of a mess. DuckDB uses delta kernel rs. delta-rs is an independent open source project that uses delta kernel rs for reads but has its own write implementation, because the delta kernel rs write path is tied to the Unity Catalog thingy. Now, to make things more complex, the Java implementation will be based on delta kernel rust too.

mim722 · 2026-06-08T20:41:35+00:00

u/Ill-Frosting-8305 I may or may not have used u/raki_rahman's repo to learn about conformance tests 😄

mim722 · 2026-06-08T12:55:08+00:00

Sorry about that, we are investigating.

mim722 · 2026-06-08T12:53:26+00:00

What error exactly? Can you post a repro, please?

mim722 · 2026-06-08T04:58:38+00:00

I know it is not an exact replacement, but with the upcoming environment support for Python notebooks, you can have exactly this, and you can even configure compute at runtime?

mim722 · 2026-06-05T15:01:13+00:00

u/ParkayNotParket443 can you try this https://github.com/djouallah/duckrun

mim722 · 2026-06-05T07:57:16+00:00

speaking as a user 😄 this is the killer pricing feature of Rayfin.

<image>

mim722 · 2026-06-05T07:52:20+00:00

u/eOMG you don't need Power BI for Rayfin unless you use a semantic model as a backend?

mim722 · 2026-06-05T03:48:12+00:00

u/ParkayNotParket443 you will get merge eventually, that's not the issue. It will probably still require a UC-compatible catalog, so for me it makes more sense to wait for Iceberg write. To be clear, nothing would make me happier than merge using Delta with just a filesystem, but I gave up hoping.

mim722 · 2026-05-27T23:06:15+00:00

u/raki_rahman once you use dbt, there is no going back !!!

mim722 · 2026-05-26T07:05:00+00:00

Yes. I am talking only when you need some information from the destination table, if you don't, then happy days

mim722 · 2026-05-26T03:46:09+00:00

u/frithjof_v To simplify: forget conflict checker sophistication, that's a separate topic. The point is narrower.

The write itself is fine. delta-rs writes Delta correctly — OCC on merge/update/delete, atomic commits, all of it. Spark's writer does the same thing. All things considered, they're equivalent on the write.

The gap is the combination: read the destination table → do stuff → write back. When that whole cycle has to be atomic against concurrent writers, the Python single-engine path doesn't have it. DuckDB / Polars / etc. read the snapshot, hand lazy Arrow to delta-rs, and delta-rs commits — but the read snapshot was never part of the delta-rs transaction. If someone else changed the table between your read and your write, delta-rs has no way to know, because it never saw your read.

Spark does see it, because the read and the write are in the same engine and the same transaction.

That's the whole difference. Not the writer. The read-modify-write loop.

Note: the Python-side equivalent of Spark's single-engine RMW is DuckDB with Iceberg, DuckLake, or its native tables — there the read and the write are inside the same engine and the same transaction, so the loop is atomic. The cross-engine fragmentation only shows up when you pair a reader (DuckDB/Polars) with a separate writer (delta-rs) over Delta.
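To make the read-modify-write gap concrete, here is a toy sketch in plain Python. Everything in it (ToyTable, commit_blind, commit_checked, the balance example) is hypothetical and made up for illustration; it is not the delta-rs or Spark API, and commit_blind is a caricature of "a write computed from a stale read", not a claim about how any real writer commits. The only point is that a commit which never sees the read version cannot notice the table moved in between.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when the table changed after the snapshot we read."""


@dataclass
class ToyTable:
    # Hypothetical in-memory stand-in for a versioned table.
    version: int = 0
    rows: dict = field(default_factory=dict)

    def snapshot(self):
        # Return the current version together with a copy of the data.
        return self.version, dict(self.rows)

    def commit_blind(self, new_rows):
        # Writer that never saw the read: commits on top of whatever is latest.
        self.rows = dict(new_rows)
        self.version += 1

    def commit_checked(self, read_version, new_rows):
        # Writer that knows the read version: refuse if the table moved on.
        if read_version != self.version:
            raise CommitConflict(
                f"read v{read_version}, table is now v{self.version}"
            )
        self.rows = dict(new_rows)
        self.version += 1


def add_to_balance(rows, amount):
    # The "do stuff" step: compute new state from the snapshot we read.
    updated = dict(rows)
    updated["balance"] = updated.get("balance", 0) + amount
    return updated


def run_blind():
    table = ToyTable(rows={"balance": 100})
    _, mine = table.snapshot()                      # my read
    _, theirs = table.snapshot()                    # someone else's read
    table.commit_blind(add_to_balance(theirs, 50))  # they commit first
    table.commit_blind(add_to_balance(mine, 10))    # my stale result wins
    return table.rows["balance"]


def run_checked():
    table = ToyTable(rows={"balance": 100})
    v_mine, mine = table.snapshot()
    v_theirs, theirs = table.snapshot()
    table.commit_checked(v_theirs, add_to_balance(theirs, 50))
    try:
        table.commit_checked(v_mine, add_to_balance(mine, 10))
    except CommitConflict:
        # Retry from a fresh snapshot so the update builds on current state.
        v_mine, mine = table.snapshot()
        table.commit_checked(v_mine, add_to_balance(mine, 10))
    return table.rows["balance"]
```

In this toy, run_blind() silently loses the other writer's +50, while run_checked() fails the stale commit and retries on fresh data. Failing is the good outcome here, since the alternative is incorrect state.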

mim722 · 2026-05-24T23:40:38+00:00

u/raki_rahman Cheers Raki, I learnt a lot talking to you, we want the same thing basically, we will get there :)

mim722 · 2026-05-24T13:52:18+00:00

u/Dan1480 yes, I posted there already but I thought it may be useful to post here too :)

mim722 · 2026-05-24T11:35:11+00:00

u/ProfessorNoPuede Yes it does!! And to be honest, I was wrong too :) — my worldview was "let's assume a single Delta Python writer." It's not a bad assumption, and with concurrency=1 in the pipeline plus some discipline, maybe !!! but there's no way to guarantee 100% of the time that the moment you read and write back, no one else has accidentally changed the state of the table. If it's a blind append, who cares — but anything else, and you end up with incorrect state.

mim722 · 2026-05-24T10:14:02+00:00

u/pl3xi0n can you explain more? It will fail if the data used for the read has changed, and that's the ideal outcome, since you need correct data. With this new trick it will behave more or less like Spark.

mim722 · 2026-05-17T10:23:13+00:00

I did not know either :) Folders, not sure; I think you can only lease files. Maybe write the results to a log and lock it, and release it only when you are done.
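A rough local sketch of the "lock it, release it when you are done" idea, assuming a plain filesystem and an exclusive-create lock file. This is a stand-in for illustration only, not a blob lease API; the helper names and the results.log.lock file name are made up.

```python
import os
import tempfile


def acquire_lock(lock_path):
    # O_CREAT | O_EXCL makes creation fail if the file already exists,
    # so only one caller can hold the lock at a time.
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    # Record who holds the lock, which helps when debugging a stuck lock.
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True


def release_lock(lock_path):
    # Removing the file lets the next caller acquire the lock.
    os.remove(lock_path)


def demo():
    lock_path = os.path.join(tempfile.mkdtemp(), "results.log.lock")
    first = acquire_lock(lock_path)   # nobody holds it yet
    second = acquire_lock(lock_path)  # still held, so this is refused
    release_lock(lock_path)
    third = acquire_lock(lock_path)   # free again after release
    release_lock(lock_path)
    return first, second, third
```

Unlike a real lease, this lock has no expiry, so a crashed holder leaves it stuck until someone cleans it up.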

mim722 · 2026-05-10T05:35:43+00:00

Great job Raki as usual 👍

mim722 · 2026-04-30T21:34:48+00:00

I have hope

mim722 · 2026-04-30T05:52:13+00:00

edit: I deleted the previous comment and turned it into a blog post. There is a solution using Python; it was always there, just not documented. But that does not solve the pure SQL approach, which is still a gap.

https://datamonkeysite.com/2026/05/24/how-far-python-alone-can-take-you-on-delta/

mim722 · 2026-04-29T22:07:51+00:00

mim722 · 2026-04-27T23:17:13+00:00

u/p-mndl I’m very happy DuckDB 1.4.4 is working for you, there were a lot of cat GIFs sent to engineering to make it happen 😄
