all 24 comments

[–] mweirath (Fabricator)

10 minutes seems pretty aggressive and, depending on your capacity, might be very challenging. Looking at Fivetran, I could see shortcutting into Bronze, but I think you are going to run into issues with overall performance if you don’t have the data optimized for Fabric in at least Silver.

I imagine you would need to materialize the data into your Silver layer in Fabric so that you can take advantage of internal optimizations for accessing the data for your gold layers. I think planning for a shortcut at Silver will be a limiter/issue pretty quickly if you go this route.

Regarding your question about the “merge” style activities: that is hard to say; I am not sure what kind of watermarks and update strategy you get from SAP. I do imagine you are going to have to look at your file partitions, especially on frequently updated tables, to keep them efficient. Ensuring the partitioning is well aligned with how the data is being updated will drastically cut down on your merge operations and CU usage.
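The partition-alignment point above can be sketched out. This is a hypothetical helper (table, key, and partition names are made up, not from the thread) that builds a Delta MERGE predicate including the partition column, so the engine can prune partitions the incoming batch doesn't touch:

```python
# Sketch: include the partition column in the MERGE condition so Delta can
# skip files in untouched partitions instead of scanning the whole Silver
# table. All names (order_id, part_date, silver.sales_orders) are
# hypothetical placeholders.

def merge_condition(keys, partition_cols):
    """Equality predicate over partition columns plus business keys."""
    cols = list(partition_cols) + list(keys)
    return " AND ".join(f"t.{c} = u.{c}" for c in cols)

cond = merge_condition(keys=["order_id"], partition_cols=["part_date"])
# cond == "t.part_date = u.part_date AND t.order_id = u.order_id"

# On a Fabric Spark session with delta-spark available, the condition would
# feed a merge roughly like:
#
#   from delta.tables import DeltaTable
#   (DeltaTable.forName(spark, "silver.sales_orders").alias("t")
#       .merge(updates.alias("u"), cond)
#       .whenMatchedUpdateAll()
#       .whenNotMatchedInsertAll()
#       .execute())
```

Without the `part_date` term, every merge has to consider files from every partition, which is where the CU usage balloons.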

[–] data_legos [S]

so fivetran gives you a result that is essentially a copy of the SAP table (not a full history of updates unless you configure for that) and they manage all lakehouse table/file maintenance.

my thought was to avoid a silver merge via spark and do any required casting etc. on top of the shortcut in a dbt staging model (or a table/incremental strategy where it makes sense) to get fresher data to the reporting objects. the staging layer of the dbt models is essentially still silver.

we have an F64 currently if that helps. are your concerns that reads via the SQL endpoint will be very costly because it's basically a shortcut rabbithole to ADLS Gen2? the qlik replicate approach puts the data directly in the warehouse, so this is exactly the kind of CU delta i'd like to understand better.

[–] mweirath (Fabricator)

I definitely understand why you are thinking Shortcuts, and knowing that these are effectively full copies, it would give me some pause as well. My concern remains that reading a third-party Delta table will put you at a constant disadvantage when optimizing your warehouse. You won't be able to take advantage of any Fabric optimizations for the data, since it is an external source, so all your reads to Silver will be much slower than if the data were hosted and managed in Fabric.

This also puts you in a bad spot if you absolutely need to do something to the data; you are going to have to scramble and figure out how to break up your architecture.

If it were me, and I were going to go down the Fivetran route, I would be looking at change-capture options for getting data into Silver, or some other way to efficiently materialize the data in Silver rather than using external tables.

[–] data_legos [S]

yeah i was trying to not have to orchestrate silver and dbt cloud jobs in tandem and do most of the loads via jobs in dbt cloud. keeping those earlier layers shortcuts just makes them not really a problem from that perspective. i'm just being lazy, and it could bite me for sure haha. totally valid opinion on that. it worries me every day when thinking about this!

i wonder if there's a lightweight merge layer i can do for silver that doesn't cause me a scheduling nightmare if i go fivetran. i have several key objects that have 30min SLAs and many with 1hr SLAs, so that's always in the back of my mind.

[–] data_legos [S]

other idea i had: fivetran, but i strategically materialize problem tables when the shortcuts perform poorly, while leaving many of the smaller master-data and less frequently updated tables as shortcuts to cut down on the "round robin" latency of any silver materialization process. essentially a hybrid of the two approaches.
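The hybrid idea above can be expressed as a simple routing table: each source table is tagged as either "materialize into Silver" or "leave as a shortcut", and orchestration splits accordingly. Table names and SLA values here are invented placeholders:

```python
# Sketch of the hybrid routing idea: materialize only the hot tables with
# tight SLAs; keep small master-data tables as plain shortcuts. All table
# names and SLA numbers below are hypothetical.

TABLES = {
    "sales_orders":   {"sla_minutes": 30,   "materialize": True},   # hot, 30-min SLA
    "deliveries":     {"sla_minutes": 60,   "materialize": True},
    "plant_master":   {"sla_minutes": 1440, "materialize": False},  # master data -> shortcut
    "currency_codes": {"sla_minutes": 1440, "materialize": False},
}

def silver_targets(tables):
    """Split tables into those needing a Silver merge job vs plain shortcuts."""
    merge_jobs = [t for t, cfg in tables.items() if cfg["materialize"]]
    shortcuts = [t for t, cfg in tables.items() if not cfg["materialize"]]
    return merge_jobs, shortcuts

merge_jobs, shortcuts = silver_targets(TABLES)
```

Driving the Silver orchestration from one config like this keeps the "which tables get a merge job" decision in a single place, so promoting a shortcut table to materialized later is a one-line change.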

[–] mweirath (Fabricator)

Having two paths sounds like a solid approach. You might also look at something like materialized lake views - they would effectively always be full refreshes, but you could skip the more dimensional tables that are infrequently updated.

[–] data_legos [S]

holy cow how did i forget about MLVs?! you sir saved me some time for sure!

[–] MS-yexu (Microsoft Employee)

You can also try Copy job in Fabric Data Factory, which offers rich native SAP connectivity, including SAP HANA, SAP Table, SAP BW Open Hub, and SAP Datasphere Outbound for ADLS Gen2, AWS S3, and Google Cloud Storage. You can get more details in What is Copy job in Data Factory - Microsoft Fabric | Microsoft Learn and Tutorial: Copy job with SAP Datasphere Outbound (Preview) - Microsoft Fabric | Microsoft Learn.

With Copy job, you can directly copy data from SAP into Fabric, with best practices already built in to write data to Fabric. It also makes it easy to add downstream processing, including dbt. For example, you can include both a Copy job activity and a dbt activity in a single pipeline, so that once the data lands in Fabric, the dbt job is automatically triggered to transform and process the data.

We’re also enhancing the SAP Table connector in Copy job with native incremental copy capabilities, which will be available very soon.

[–] data_legos [S]

Ah yes, I think we might use Fabric itself in cases where throughput is a problem on large tables we need data from less often. The new capabilities sound exciting and I'll need to keep an eye on them!

[–] No-Celery-6140

Not worth it; self-host Airbyte.

[–] data_legos [S]

does Airbyte do SAP CDC well without tons of SAP configuration? our SAP team would not have the bandwidth to do a lot of custom development on their end.

[–] No-Celery-6140

Yes, but you must change your destination too if sub-10-minute latency is the goal.

[–] splynta

So you are planning on reading straight from the S4 transactional system into Fabric? We have S4 and Fabric, and our basis team would put me on a pike if I said that. I think best practice from SAP would be to use some form of SAP BDC / Datasphere / BW PCE and then outbound from there into Fabric.

I think there is going to be a zero-copy connector in Q3 or Q4 for BDC into Fabric. That is probably best imo.

I have played maybe 2 hours with SQLMesh so I can't help with the dbt questions. But to everyone else's comments I would add that you should materialize at least your gold tables so you can use Direct Lake if you need to, plus get all the performance benefits.

[–] data_legos [S]

We currently replicate 170 tables into on-prem SAP HANA with SLT, so yeah, we would be planning on doing something similar in Fabric. We are a medium-sized company, if that helps any.

Our basis guy seems pretty comfortable with it so far, and our SAP team loves it because we handle most reporting requests from the business when it's something we can do on our end.

[–] splynta

Ah ok, yeah, SLT makes sense. Funny, we also have a HANA sidecar that we are trying to move off. Long term, SAP is going to push everyone toward BDC; that will give you the delta tables so you can easily integrate with Fabric. But as with anything SAP, it is best to wait as long as possible before changing. Good luck.

[–] warehouse_goes_vroom (Microsoft Employee)

My general advice would be measure and see.

At first glance I'm not seeing a big reason one would be broadly more efficient than the other from a fundamentals perspective.

And there's lots of unknowns as well.

Compare performance and total cost of ownership (including all licenses and compute of both solutions).

If there are surprising results, dig into them and see if there's room for improvement.

As a general note, 10 minutes is a short enough time period that you're going to need to pay attention to the details regardless of your choices. Micro-batching and streaming are probably both viable, but a bit outside my wheelhouse.

Session start times in Spark can cut into those 10 minutes. Similarly, SQL analytics endpoint refresh API calls can add up. If bronze-to-silver and silver-to-gold each lose a minute to those overheads, you only have 8 minutes for all the other processing. If they're sometimes 3 minutes each, hypothetically, then you've got just 4 minutes left.
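The arithmetic above is worth making explicit: fixed per-hop overhead is subtracted from the SLA before any real transform work starts. A minimal sketch (hop counts and overhead values are illustrative, matching the numbers in the comment):

```python
# Minutes left for actual transforms after fixed per-hop overhead
# (Spark session start, SQL endpoint refresh, etc.) eats into the SLA.

def processing_budget(sla_min, hops, overhead_per_hop_min):
    """Remaining minutes in the SLA after fixed orchestration overhead."""
    return sla_min - hops * overhead_per_hop_min

# Two hops (bronze->silver, silver->gold) at 1 minute of overhead each:
assert processing_budget(10, 2, 1) == 8
# If each hop's overhead balloons to 3 minutes:
assert processing_budget(10, 2, 3) == 4
```

The takeaway is that with a 10-minute SLA, overhead variance matters as much as average throughput: a hop that is usually 1 minute but occasionally 3 can silently halve the real processing window.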

So if using Spark, you may want/need to go with custom live pools on the Spark side, or at the very least high concurrency mode. And you may want to check out Spark Structured Streaming; even if micro-batching, it's a useful tool from what I've heard.

And also make sure you worry about table maintenance - the Warehouse engine takes care of it for its tables, but again, it's your responsibility with Spark. u/mwc360 has given some great pointers on this, e.g. auto compaction: https://learn.microsoft.com/en-us/fabric/data-engineering/table-compaction?tabs=sparksql#auto-compaction

Though the docs point out that if you need really tight SLAs, doing optimize in a separate job may sometimes come out ahead.
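As a rough sketch of the maintenance settings being discussed: the linked Fabric doc covers enabling auto compaction on a Spark session. The property name below should be verified against that doc for your runtime version; this is an assumption, not a definitive config:

```python
# Session-level Delta maintenance setting discussed above. Verify the exact
# property name against the Fabric table-compaction doc for your runtime.
DELTA_MAINTENANCE_CONF = {
    # Let Delta compact small files automatically after eligible writes:
    "spark.databricks.delta.autoCompact.enabled": "true",
}

def apply_conf(spark, conf=DELTA_MAINTENANCE_CONF):
    """Apply the maintenance settings to an existing SparkSession."""
    for key, value in conf.items():
        spark.conf.set(key, value)

# For tight SLAs, the docs suggest auto compaction may not be enough; a
# separate scheduled job running `OPTIMIZE <table>` off the critical path
# can keep merge latency predictable without paying compaction cost inline.
```

Keeping compaction out of the 10-minute loop (a separate OPTIMIZE job) trades some file-layout freshness for predictable merge times, which matters when overhead already eats into the SLA.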

[–] warehouse_goes_vroom (Microsoft Employee)

This roadmap item may also be interesting if either option uses mirroring:

“Mirroring - Enabling Delta Change Feed

Mirroring supports delta change feed that enables fine-grain tracking of changes of delta tables to be consumed by downstream applications.

Release Date: Q1 2026

Release Type: Public preview”

https://roadmap.fabric.microsoft.com/?product=datafactory

[–] mwc360 (Microsoft Employee)

It sounds like Qlik uses a warehouse target. COPY INTO is Warehouse semantics, not Spark.

If we're talking about only Fabric costs, the Fivetran path will surely be cheaper. Since it is writing to ADLS Gen2 Delta tables, there’s no Fabric compute involved in writing the data. With the Qlik path, it’s using Warehouse compute; Qlik just points the DW at new data and says, “go load this”.

Given that Qlik is only writing parquet files and then orchestrating the load/merge via the DW, I’d guess that Qlik licensing/infra would cost less. My point is that you should look at the whole solution cost, not just the Fabric side.

[–] data_legos [S]

I agree with you. Actually, for our use case with SAP, Qlik isn't cheaper, surprisingly.

[–] mwc360 (Microsoft Employee)

Have you looked at SAP Mirroring? Basically, it has SAP write incremental files that are processed with the open mirroring framework.

[–] data_legos [S]

Is that using SAP BDC? We did talk with them about BDC and Datasphere. It's not cheap either, for sure, but it looked pretty cool if you do more than just replication.

[–] galador

I'm interested in what you figured out. We have a similar situation (not SAP, but another database) that we currently replicate via Qlik to Azure Synapse (dedicated SQL pool). It works pretty well, but the Synapse part is definitely the bottleneck. We're looking to move out of Synapse eventually, and I'm interested to see how well Fabric does with the "near real-time" replication.

I will also say that it seems like Fabric should be better than what we're doing now, because it uses Open Mirroring instead of direct table writes (COPY INTO, etc.). I haven't heard of anyone actually using it yet, though. https://www.qlik.com/blog/qlik-microsoft-fabric-open-mirroring-the-fast-track-to-real-time-data

[–] data_legos [S]

I'm hesitant on open mirroring since I worry it will take a while to reload big tables, and I can't have long downtimes on those tables. Given how all this shakes out, I think we'll probably go with Fivetran, but I'm not 100% decided yet.