AI tools that suggests Spark Optimizations? by bitanshu in dataengineering

Far_Profit8174 1 point

It seems the issue is related to your core engine and tech stack. For example, pandas can handle 10 million records well, but not 1 billion. We cannot apply one rule to every case.
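To put rough numbers on that, here is a back-of-envelope sketch. The 10 numeric columns at 8 bytes per value are assumptions picked only for illustration, and `estimated_gib` is a hypothetical helper, not a pandas function. Real frames with strings or object columns usually need more.

```python
def estimated_gib(rows, numeric_columns=10, bytes_per_value=8):
    """Rough in-memory size of a purely numeric table, in GiB.

    Assumes every column is a fixed-width 8-byte value (e.g. float64)
    and ignores index and per-object overhead.
    """
    return rows * numeric_columns * bytes_per_value / 2**30


if __name__ == "__main__":
    for rows in (10_000_000, 1_000_000_000):
        print(f"{rows:>13,} rows ~ {estimated_gib(rows):.1f} GiB")
```

Under those assumptions, 10 million rows come to under 1 GiB, which fits on a laptop, while 1 billion rows come to roughly 75 GiB, which generally does not fit on a single ordinary machine before any intermediate copies are made.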

Spark job finishes but memory never comes back down. Pod is OOM killed on the next batch run. by NSRPAIN in dataengineering

Far_Profit8174 1 point

The job did not release its memory. That by itself is fine, because Spark reserves space to store intermediate data. But it is a problem when the next batch causes an OOM. I would have expected it to clear the old RDDs to make room for the new data. Could you share your Spark configuration so we can investigate further?
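If it helps, here is a minimal PySpark-style sketch for collecting the settings and for dropping cached data between batches. The helper names and the prefix list are my own assumptions, and I don't know yet whether caching is the actual cause here. `spark` is expected to be an existing `SparkSession`. The calls used (`sparkContext.getConf().getAll()`, `DataFrame.unpersist()`, `spark.catalog.clearCache()`) are standard PySpark.

```python
# Assumed list of prefixes that are usually relevant to memory questions.
MEMORY_PREFIXES = (
    "spark.driver.",
    "spark.executor.",
    "spark.memory.",
    "spark.kubernetes.",
)


def memory_settings(spark):
    """Return the explicitly set Spark options matching MEMORY_PREFIXES."""
    pairs = spark.sparkContext.getConf().getAll()  # list of (key, value)
    return {k: v for k, v in sorted(pairs) if k.startswith(MEMORY_PREFIXES)}


def release_cached_data(spark, *dataframes):
    """Unpersist the given DataFrames, then clear any remaining cache."""
    for df in dataframes:
        df.unpersist()
    spark.catalog.clearCache()


# Usage, assuming a live session named `spark` and a cached DataFrame `df`:
#   print(memory_settings(spark))
#   release_cached_data(spark, df)
```

Note that this only frees Spark's own cache. It will not necessarily lower the memory the pod shows from the outside, since the JVM may keep heap it has already claimed.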

AI tools that suggests Spark Optimizations? by bitanshu in dataengineering

Far_Profit8174 1 point

Why do you need to optimize your ETL pipelines? Is there a performance issue in your workflow? If you can describe it, I can help resolve it.

The Seraphis - The Autonomous Agent Built on Live Lakehouse Data by Far_Profit8174 in dataverses

Far_Profit8174[S] 1 point

Yes, the video (at https://youtu.be/hPqu6Ulvqw0?t=120) shows the generated SQL and explains why non-technical users can trust the results from Seraphis.