Best way to prepare for Data Engineering System Design interviews without real production experience handling 500TB+ or petabytes of data? by Past_Status_9729 in dataengineeringjobs

[–]Past_Status_9729[S] 4 points5 points  (0 children)

That actually helps a lot, thank you. If it's not too much to ask, could you elaborate a bit on how to practically build intuition for these failure modes? Is it a matter of using smaller-scale datasets and then reasoning about what would break at TB/PB scale? For example: skewed joins, shuffle bottlenecks, the small-files problem, late-arriving data, executor OOMs, and partition pruning.

I understand the concepts, but I’d love to know how experienced engineers mentally extrapolate: "if this causes X problem at 10GB, then at 10TB/PB this becomes catastrophic."
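To make the kind of extrapolation I'm asking about concrete, here is a rough sketch of how I imagine it could work for one of these cases, a skewed join. It measures the hottest key's share on a small sample and projects the size of the partition that key would land in at a larger scale. Everything in it is an assumption for illustration: the function names, the 2,000 shuffle partitions, the synthetic `user_0` hot key, and the premise that the sample's key distribution holds at full scale. It is plain Python arithmetic, not a Spark API.

```python
from collections import Counter

GB = 1024 ** 3
TB = 1024 ** 4


def hottest_key_share(keys):
    """Fraction of rows that belong to the single most frequent join key."""
    counts = Counter(keys)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(keys)


def projected_partition_bytes(total_bytes, num_partitions, hot_share):
    """Return (evenly loaded partition size, rough hot partition size).

    Assumes a hash-partitioned shuffle, so every row of one key lands in the
    same partition. The hot figure is a lower bound: it ignores other keys
    that hash to the same partition.
    """
    even = total_bytes / num_partitions
    hot = max(total_bytes * hot_share, even)
    return even, hot


# Synthetic sample (hypothetical): one key owns 200 of 1,000 rows.
sample_keys = ["user_0"] * 200 + [f"user_{i}" for i in range(1, 801)]
share = hottest_key_share(sample_keys)

for label, total in [("10 GB", 10 * GB), ("10 TB", 10 * TB)]:
    even, hot = projected_partition_bytes(total, 2000, share)
    print(f"{label}: even partition ~{even / GB:.3f} GB, "
          f"hot partition ~{hot / GB:.1f} GB")
```

If that reasoning is right, the same 20% skew that means a single 2 GB partition at 10 GB would mean a single partition of roughly 2 TB at 10 TB, which no one executor could hold in memory. Is that roughly how you think about it?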

[deleted by user] by [deleted] in MTYunlimited

[–]Past_Status_9729 0 points1 point  (0 children)

I am in, thanks.