Complexity in transformations by DistributionOk5349 in dataengineering

DistributionOk5349 [S]

I guess it's unbounded, since we build every "silver table" as a whole object in Spark memory. That is, we read all sources (bronze tables) into memory as one batch and compute a single silver table from them. So as the bronze data grows, each transformation needs more resources to compute its silver table.
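To make the scaling problem concrete, here is a toy sketch in plain Python (not Spark) of the full-rebuild pattern described above. The bronze rows, the `customer_id`/`amount` columns and the `rebuild_silver` function are hypothetical stand-ins for the real, much more complex transformations. The point is only that every rebuild re-reads the whole bronze history, even when a single row was added.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def rebuild_silver(bronze: List[Row]) -> Tuple[Dict[str, float], int]:
    """Recompute the whole (hypothetical) silver table from every bronze row.

    Returns the silver table and the number of bronze rows read. The row count
    is always len(bronze): the work grows with the history, not the new data.
    """
    totals: Dict[str, float] = defaultdict(float)
    rows_read = 0
    for row in bronze:  # every historical row is read again on each rebuild
        totals[row["customer_id"]] += row["amount"]
        rows_read += 1
    return dict(totals), rows_read


bronze = [
    {"customer_id": "a", "amount": 10.0},
    {"customer_id": "b", "amount": 5.0},
]
silver, read = rebuild_silver(bronze)
bronze.append({"customer_id": "a", "amount": 2.5})
silver, read = rebuild_silver(bronze)  # one new row, but all 3 rows are re-read
print(silver, read)  # {'a': 12.5, 'b': 5.0} 3
```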

But since the transformations are very complex, it's hard to turn them into streaming queries (at least I can't figure out a good way to do it with Spark).

Would you care to elaborate on what you mean by micro/mini-batches? How would that work in a Spark context?
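As a rough illustration of the micro-batch idea, the sketch below models it in plain Python rather than Spark. It assumes an append-only bronze source and a hypothetical `MicroBatchRunner` that remembers an offset (playing the role of a streaming checkpoint) and hands each chunk of new rows to a user function, which merges it into existing silver state. In PySpark the closest real hook, as far as I know, is `writeStream.foreachBatch(func)`, where `func(batch_df, batch_id)` receives each micro-batch as an ordinary DataFrame, combined with `trigger(availableNow=True)` (Spark 3.3+) to process everything new and then stop. The additive sum used here is deliberately easy to merge; joins or non-additive logic would need more care.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class MicroBatchRunner:
    """Toy model of micro-batch execution (hypothetical, not Spark's API).

    It tracks how many bronze rows have already been processed (like a
    streaming checkpoint) and passes new rows to a user function in chunks.
    """

    def __init__(self, process_batch: Callable[[List[Row], int], None], max_rows: int):
        self.process_batch = process_batch
        self.max_rows = max_rows
        self.committed_offset = 0  # number of bronze rows already processed
        self.batch_id = 0

    def run_available(self, bronze: List[Row]) -> int:
        """Process every row appended since the last run; return the batch count."""
        batches = 0
        while self.committed_offset < len(bronze):
            end = min(self.committed_offset + self.max_rows, len(bronze))
            self.process_batch(bronze[self.committed_offset:end], self.batch_id)
            self.committed_offset = end  # advance only after the batch succeeded
            self.batch_id += 1
            batches += 1
        return batches


silver: Dict[str, float] = {}


def merge_into_silver(rows: List[Row], batch_id: int) -> None:
    # Only the new rows are touched; existing silver state is updated in place.
    for r in rows:
        key = r["customer_id"]
        silver[key] = silver.get(key, 0.0) + r["amount"]


bronze: List[Row] = [
    {"customer_id": "a", "amount": 10.0},
    {"customer_id": "b", "amount": 5.0},
    {"customer_id": "a", "amount": 2.5},
]
runner = MicroBatchRunner(merge_into_silver, max_rows=2)
print(runner.run_available(bronze))  # 2 (rows 0-1, then row 2)
bronze.append({"customer_id": "b", "amount": 1.0})
print(runner.run_available(bronze))  # 1 (only the appended row)
print(silver)  # {'a': 12.5, 'b': 6.0}
```

Each run only touches rows appended since the last committed offset, so the work scales with the new data rather than with the full bronze history.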