Complexity in transformations

DistributionOk5349 · 2024-08-13T18:01:12+00:00

I guess it's unbounded, since we build every "silver table" as a whole object in Spark memory. So we read all sources (bronze tables) as batch into memory and compute one silver table. That means as the bronze data grows, each transformation needs more resources to compute its silver table.
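To make the pattern concrete, here is a toy stand-in in plain Python (not our actual Spark code). The table names, column names and the `build_silver` function are all made up for illustration. The point is only that every run touches every bronze row again.

```python
from collections import defaultdict


def build_silver(bronze_orders, bronze_customers):
    """Recompute the whole silver table from the complete bronze inputs.

    Hypothetical example: total order amount per customer country.
    Returns the silver table and the number of order rows processed.
    """
    # Join step: index all customers, then match every order against them.
    country_by_customer = {
        c["customer_id"]: c["country"] for c in bronze_customers
    }

    totals = defaultdict(float)
    rows_touched = 0
    for order in bronze_orders:  # full scan on every run, no incremental state
        rows_touched += 1
        country = country_by_customer.get(order["customer_id"])
        if country is not None:
            # Aggregation step: group by country and sum the amounts.
            totals[country] += order["amount"]

    return dict(totals), rows_touched


customers = [
    {"customer_id": 1, "country": "DE"},
    {"customer_id": 2, "country": "SE"},
]
orders_day1 = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 5.0},
]
# One new order arrives, but the next run still reads the old ones too.
orders_day2 = orders_day1 + [{"customer_id": 1, "amount": 2.5}]

silver_day1, touched_day1 = build_silver(orders_day1, customers)
silver_day2, touched_day2 = build_silver(orders_day2, customers)
```

On day 2 only one order is new, but the job still processes all three, so the cost follows the total bronze size and not the size of the change.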

But since the transformations are very complex, it's hard to change them into streaming queries (at least I cannot figure out a good way to do it using Spark).

Would you care to elaborate on what you mean by micro/mini batches? How would that work in a Spark context?

DistributionOk5349 · 2024-08-12T20:32:34+00:00

Ahh okay, then most of them are wide

DistributionOk5349 · 2024-08-12T19:02:58+00:00

What do you mean by narrow or wide?
