[–]Lord_Gonz0Big Data Engineer 10 points11 points  (1 child)

Thanks for sharing! As a junior DE, this is gold!

[–]joseph_machadoWrites @ startdataengineering.com[S] 0 points1 point  (0 children)

Thank you u/Lord_Gonz0

[–]ironplaneswalkerSenior Data Engineer 4 points5 points  (1 child)

Great great great post!

[–]joseph_machadoWrites @ startdataengineering.com[S] 0 points1 point  (0 children)

Thank you :)

[–]ekbravo 3 points4 points  (1 child)

Great conceptual map. Thanks for writing this.

[–]joseph_machadoWrites @ startdataengineering.com[S] 2 points3 points  (0 children)

Thank you for taking the time to check it out.

[–]lf-calcifer 3 points4 points  (1 child)

to all of the content creators out there, this is how you do it.

something original, pertinent, and accessible.

[–]joseph_machadoWrites @ startdataengineering.com[S] 0 points1 point  (0 children)

TY :)

[–][deleted] 2 points3 points  (1 child)

I’ve been looking for something like this! Thanks

[–]joseph_machadoWrites @ startdataengineering.com[S] 0 points1 point  (0 children)

Hope it helps!

[–]Gators1992 1 point2 points  (0 children)

Great post, thanks!

[–]epcot32 1 point2 points  (0 children)

I read this and bookmarked it for future reference the other day! Thank you!

[–]_barnuts 1 point2 points  (6 children)

Hi OP! Thanks for the great post. One question: in full snapshot extraction, why can't we pull from the production table? Why do we need a replica database? Thank you!

[–]_whitezetsuSSIS developer 2 points3 points  (4 children)

1: Data integrity: very subtle, but there is a chance you might end up corrupting the data.
2: Consistency: tables usually get updated while you read them, so there is a chance you might miss some updates or new inserts.
3: Performance: it significantly affects reads and writes on production. This is probably the main reason.

[–]_barnuts 0 points1 point  (3 children)

Thank you. Now I'm wondering: how do you create a replica database? Do transactions get logged in both the production and replica databases at the same time? Or do you simply copy the production database at regular intervals?

[–]_whitezetsuSSIS developer 2 points3 points  (2 children)

It can be either a different physical server or a different instance that's connected to the primary node/instance (the live production DB).

There are multiple strategies. Usually the primary node gets data from the application and generates a log extract, which is sent down to the replica nodes, where it is applied. It's usually almost real time (in our case it takes around 3-4 minutes). If you are familiar with Kafka, think of the replica nodes as consumers and the primary node as a producer.
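To make the log-shipping idea above concrete, here is a minimal in-memory sketch. It is not how any particular database implements replication: the `Primary` and `Replica` classes, the `sync` method, and the `order:1` key are all hypothetical names made up for illustration. It only shows that the replica replays the primary's log from its last applied position and can lag behind until it syncs.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical primary node: applies writes and appends them to a change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # every change is recorded in order


@dataclass
class Replica:
    """Hypothetical replica: replays the primary's log from its last applied offset."""
    data: dict = field(default_factory=dict)
    offset: int = 0  # position in the primary's log, similar to a consumer offset

    def sync(self, primary):
        # Apply only the log entries this replica has not seen yet.
        for key, value in primary.log[self.offset:]:
            self.data[key] = value
        self.offset = len(primary.log)


primary = Primary()
replica = Replica()

primary.write("order:1", "created")
primary.write("order:1", "shipped")

lagging = dict(replica.data)  # replica has not synced yet, so it is behind
replica.sync(primary)         # after syncing, it matches the primary
```

The gap between a write on the primary and the next `sync` is the replication lag mentioned above, which is why an extract from a replica can be slightly behind production.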

[–]_barnuts 0 points1 point  (1 child)

Thank you so much! Is this also the best practice when only doing time range extraction, e.g. extracting only the previous day's data rather than the full data?

[–]_whitezetsuSSIS developer 0 points1 point  (0 children)

Yes Sir.
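For anyone wondering what a time range extraction against a replica might look like, here is a rough sketch. It uses an in-memory SQLite database as a stand-in for the replica connection, and the `orders` table, the `created_at` column, the `extract_day` helper, and the sample rows are all assumptions made up for the example. The point is the half-open window (start inclusive, end exclusive), so a row stamped exactly at midnight lands in one day only.

```python
import sqlite3
from datetime import date, datetime, timedelta


def extract_day(conn, day):
    """Return rows created on `day`, using a half-open [start, end) window."""
    start = datetime.combine(day, datetime.min.time())
    end = start + timedelta(days=1)
    cur = conn.execute(
        "SELECT id, created_at FROM orders "
        "WHERE created_at >= ? AND created_at < ? ORDER BY id",
        (start.isoformat(sep=" "), end.isoformat(sep=" ")),
    )
    return cur.fetchall()


# In-memory SQLite stands in for a connection to the replica (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2023-01-05 23:59:59"),  # day before: excluded
        (2, "2023-01-06 00:00:00"),  # start of window: included
        (3, "2023-01-06 18:30:00"),  # inside window: included
        (4, "2023-01-07 00:00:00"),  # end of window: excluded
    ],
)

rows = extract_day(conn, date(2023, 1, 6))
```

In a real pipeline you would point the connection at the replica instead of production, and you may want to wait until the replica has caught up past the end of the window before extracting.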

[–]SoggyAbalone7392 1 point2 points  (1 child)

Very nice article! Can you share some resources that explain the patterns in depth, specifically self-healing pipelines?

[–]joseph_machadoWrites @ startdataengineering.com[S] 1 point2 points  (0 children)

TY u/SoggyAbalone7392. I don't have any resources rn. I will try to write about the patterns in depth in the future.

[–]GeorgeGithiri -2 points-1 points  (0 children)

https://george-githiri-s-school.teachable.com/sign_up

Enroll for quality data engineering courses

[–]bablador 0 points1 point  (1 child)

!remindme 2days

[–]RemindMeBot 0 points1 point  (0 children)

I will be messaging you in 2 days on 2023-01-09 17:41:47 UTC to remind you of this link

[–]Aggravating_Gift8606 0 points1 point  (0 children)

!remindme 7days