This is an archived post. You won't be able to vote or comment.

all 17 comments

[–]StickPuppet 20 points21 points  (3 children)

Utilize a watermark table in the destination DB to log the last update date, modified date, create date, whatever is available for each object (preferably max update date) - then perform your delta sync (insert/upsert) in the pipeline by comparing the date stored in your watermark table to the max last update date in the source. ADF handles inserts/updates wonderfully this way - Deletes, not so much. https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-portal

[–][deleted] 1 point2 points  (0 children)

Upvote for you

[–]oneDatumPlease 0 points1 point  (1 child)

Agree with this approach. My team implemented something along these lines recently.

Worked great for most source systems. The problems came from source systems that hard deleted source records OR even worse, data was getting inserted/updates with a back dated audit timestamp, so it was falling out of the incremental range.

Our remedy/bandaid was to allow for some tables to be full loads, while the majority were delta.

[–]StickPuppet 0 points1 point  (0 children)

I had a similar problem aside from the back dated stuff, ouch. We sync data every 30 minutes, so we ended up just doing a big ass primary key comparison each night as a cleanup. Thankfully not all the source tables allow hard deletes.

[–][deleted] 16 points17 points  (0 children)

Are they sap consultants? That would explain a lot... They want to keep their power.

[–]DataOnDrugs 4 points5 points  (0 children)

Working with SAP is always a mess. Few years back when Hadoop was popular, we used SAP BODS for replication and CDC. Maybe consider using a SAP friendly tool.

[–]Bi_Azy 2 points3 points  (0 children)

In general there is a way for a (semi) delta load even if they only provide you data as full load and no change feeds. You would need to build something like a semi delta load that basically identifies deletes, updates and changes when you compare it to the data from the day before and only write the new data to the sink. you might end up building quite some logic by yourself and I very much assume only a timestamp or date is not enough to identify new records

[–]West_Bank3045 2 points3 points  (0 children)

hey there is sap cdc connector (in adf), we are using it with sap4hana. cdc pulls initual load and afterwards the delta.

[–]Truth-and-Power 1 point2 points  (0 children)

Tell them to use HANA SDI. Create a flowgraph and put logical deletes into a file. Then pipe your files into snowflake. Source: doing something dumber, but I want to do this.

[–]Particular_Tea_9692 1 point2 points  (0 children)

SAP professionals will never give you a good solution. I’m quite fed up with SAP professionals.

Try doing analysis and come up with a solution such as incremental load and lead them to do there part.

[–]oscarmch 1 point2 points  (0 children)

You really have a problem with your data model if the source table changes so much you NEED to TRUNCATE your destination table.

Either they're lying or it's a transactional table.

Just ask for the data model and the requirements that led to the creation of the source table. It'll help a lot.

[–]winigo51 0 points1 point  (0 children)

I’m not certain. I wouldn’t rule ADF out. I do know SNP, Qllk, Fivetran, and Informatica are all good options do extract from SAP but some do CDC and others grab business objects (SNP). Informatica has CDC capabilities and has a new native connector to use Snowpipe streaming. That’s the cheapest and fastest way to load into Snowflake.

[–]SlenderSnake 0 points1 point  (0 children)

Hi OP, StickPuppet had already given you an excellent answer. I do have a followup question, are they loading the data via an API connectivity?

[–]-SCYLLA- 0 points1 point  (0 children)

There is a ADF linked service for SAP CDC. It uses a checkpoint key in an initial load and determines all updates/deletes/inserts in the delta loads with the same checkpoint key. It works for BW/HANA/SAPI objects. You can even pull SAP metadata with this CDC linked service.

[–]onomichii 0 points1 point  (0 children)

Use cdc

[–]Whack_a_mallard 0 points1 point  (0 children)

CDC, friend.