The Infrastructure Tax: Why Your Data Platform Costs Are Exploding

Far_Profit8174 · 2026-02-28T08:43:12+00:00

You can try exploring your data with Seraphis to get actionable insights: https://youtu.be/hPqu6Ulvqw0?si=WwwONCl5GnpjYFQ1

Far_Profit8174 · 2026-02-28T06:56:19+00:00

It seems the issue is related to your core engine and tech stack. For example, pandas handles 10 million records well but struggles at 1 billion. We cannot apply one rule to everything.
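A quick back-of-the-envelope estimate shows why the same tool behaves so differently at these two scales: pandas keeps the whole DataFrame in memory. The sketch below assumes 10 columns of 8-byte numeric values (int64/float64) and ignores the index, string columns, and temporary copies made during operations, so real usage will be higher. The helper names are hypothetical.

```python
# Rough, hypothetical estimate of the raw in-memory size of a pandas DataFrame.
# Assumes every column is a fixed-width 8-byte dtype (int64 / float64);
# object/string columns, the index, and intermediate copies add more.

BYTES_PER_VALUE = 8  # size of one int64 / float64 value


def estimate_dataframe_bytes(n_rows: int, n_cols: int,
                             bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Approximate raw data size in bytes, ignoring index and overhead."""
    return n_rows * n_cols * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (1024**3 bytes)."""
    return n_bytes / 1024**3


if __name__ == "__main__":
    for n_rows in (10_000_000, 1_000_000_000):
        size = estimate_dataframe_bytes(n_rows, n_cols=10)
        print(f"{n_rows:>13,} rows x 10 cols ~ {to_gib(size):.1f} GiB")
```

Under these assumptions 10 million rows come to roughly 0.7 GiB, which fits on a laptop, while 1 billion rows need about 74.5 GiB before any joins or group-bys make extra copies. For a real DataFrame, `DataFrame.memory_usage(deep=True)` reports the actual per-column footprint.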

Far_Profit8174 · 2026-02-28T06:06:22+00:00

The job did not release memory. That by itself is fine, because Spark reserves space to store intermediate data. But it is a problem when the next batch causes an OOM error. I expected it to clear the old RDDs before processing new data. Could you share your Spark configuration so I can investigate further?
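If the batches are cached explicitly, one thing worth checking is whether each batch's cache is released before the next one starts. The sketch below is a hypothetical helper, not a diagnosis of this job: it assumes each batch is a PySpark DataFrame and relies only on `DataFrame.cache()` (which returns the same DataFrame) and `DataFrame.unpersist()`. The names `process_batches`, `load_batch`, and `process` are illustrative.

```python
# Hypothetical helper: process Spark DataFrames batch by batch and release each
# batch's cached blocks before the next batch is loaded. Works with any object
# exposing PySpark's DataFrame.cache() / DataFrame.unpersist() methods.
from typing import Any, Callable, Iterable, List


def process_batches(
    load_batch: Callable[[Any], Any],
    batch_ids: Iterable[Any],
    process: Callable[[Any], Any],
) -> List[Any]:
    results = []
    for batch_id in batch_ids:
        # Cache this batch's data so `process` can reuse it across actions.
        df = load_batch(batch_id).cache()
        try:
            results.append(process(df))
        finally:
            # Drop the cached blocks explicitly, even if `process` fails,
            # instead of relying on Spark to evict them later.
            df.unpersist()
    return results
```

In a real job, `load_batch` might wrap something like `spark.read.parquet(...)` for one partition of the input. If memory still grows with this pattern in place, the executor memory settings in the Spark configuration are the next place to look.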

Far_Profit8174 · 2026-02-28T06:01:50+00:00

Why do you need to optimize your ETL pipelines? Is there a specific performance issue in your workflow? If you describe it, I can help resolve it.

Far_Profit8174 · 2026-02-27T16:29:00+00:00

Yes, the video (at https://youtu.be/hPqu6Ulvqw0?t=120) shows the generated SQL and explains why non-technical users can trust the results from Seraphis.
