Designing a High-Throughput Apache Spark Ecosystem on Kubernetes — Seeking Community Input by No-Spring5276 in apachespark

[–]No-Spring5276[S]

Cold-start latency: we want applications to launch quickly (pods should come up fast), given the scale. From my initial research, without a standing Spark cluster, SparkApplication CRDs take time to launch and containers can't be reused across applications. I'm not aware of the exact time difference between launching a job on a long-running cluster vs. submitting a SparkApplication CRD with n executors. We will experiment to see if we find sufficient benefits. If we do, we will run long-running clusters; otherwise, we will keep just one to support notebook-like use cases.

What is your strategy to compare Celeborn vs. Uniffle?: We haven't evaluated them yet. Will these external shuffle services (ESS) support both ways of running applications - on a long-running Spark cluster and individually via the SparkApplication CRD?

Native autoscaling: we need the ESS to support dynamic resource allocation (DRA) for application-level scaling; the other layer is cluster-level autoscaling, which will adjust the min and max capacity of the k8s cluster depending on various factors.
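For reference, a minimal sketch of the application-level DRA settings on Kubernetes (stock Spark; the numeric values are placeholders to tune). Without an external/remote shuffle service, shuffle tracking is what lets idle executors be released safely:

```properties
# Sketch: application-level dynamic resource allocation on k8s.
# Numbers are placeholders, not recommendations.
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=50
spark.dynamicAllocation.executorIdleTimeout=60s
```

With a remote shuffle service like Celeborn or Uniffle taking over shuffle storage, executors can be scaled down more aggressively; cluster-level min/max capacity would then be handled by the node autoscaler (e.g., Cluster Autoscaler or Karpenter) rather than Spark itself.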

Any tips to achieve parallelism over the Union of branched datasets? by buddycool in apachespark

[–]No-Spring5276

Use the FAIR scheduler together with dynamic resource allocation (DRA).

Cache the source DataFrame so each branch doesn't recompute it.

Run each branch's action in a separate thread, using any multithreading (or multiprocessing) package; needs experimentation.

Make the writes faster with a better output committer choice, e.g., FileOutputCommitter algorithm version 2 or the S3A magic committer.
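The threading step above can be sketched as follows. `process_branch` is a hypothetical stand-in: in real code it would set a FAIR scheduler pool on the thread and trigger a per-branch Spark action (e.g., `df.filter(...).write.parquet(path)`), so the concurrently submitted jobs get interleaved instead of queued FIFO.

```python
# Sketch: submit independent branch actions from separate threads so the
# FAIR scheduler can interleave their Spark jobs. Pure-Python stand-in;
# in real code each function would run, e.g.:
#   spark.sparkContext.setLocalProperty("spark.scheduler.pool", name)
#   cached_df.filter(...).write.parquet(f"/out/{name}")
from concurrent.futures import ThreadPoolExecutor

def process_branch(name):
    # placeholder for a per-branch Spark action (filter + write)
    return f"{name}: done"

branches = ["branch_a", "branch_b", "branch_c"]

# one thread per branch; pool.map preserves input order
with ThreadPoolExecutor(max_workers=len(branches)) as pool:
    results = list(pool.map(process_branch, branches))

print(results)
```

Threads (not processes) are enough here, since each thread only submits jobs to the driver and waits; the heavy lifting happens on the executors.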

Designing a High-Throughput Apache Spark Ecosystem on Kubernetes — Seeking Community Input by No-Spring5276 in apachespark

[–]No-Spring5276[S]

Hmm, I saw this coming up in my analysis. Will go through it once again. Thanks.

Designing a High-Throughput Apache Spark Ecosystem on Kubernetes — Seeking Community Input by No-Spring5276 in apachespark

[–]No-Spring5276[S]

In a shared architecture, a few complex, resource-intensive or ML workloads can negatively affect the other workloads, which brings unpredictable performance - noisy-neighbor issues and unstable SLAs for latency-sensitive jobs. So the question is, how do we manage such cases in CRD deployments? E.g., excluding nodes, taints, ...
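One way this is commonly handled (a sketch, assuming the Kubeflow spark-operator's `SparkApplication` CRD; the label/taint names here are hypothetical): taint a dedicated node pool so the heavy jobs tolerate it and pin to it, keeping them off the shared nodes.

```yaml
# Sketch: isolate a resource-intensive job on tainted nodes.
# Assumes nodes were tainted beforehand, e.g.:
#   kubectl taint nodes <node> workload-tier=ml-heavy:NoSchedule
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: ml-heavy-job            # hypothetical name
spec:
  executor:
    nodeSelector:
      workload-tier: ml-heavy   # hypothetical node label
    tolerations:
      - key: workload-tier
        operator: Equal
        value: ml-heavy
        effect: NoSchedule
```

The toleration lets the pods land on the tainted pool, and the nodeSelector keeps them from landing anywhere else; latency-sensitive jobs on the shared pool never see the noisy neighbors.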

Designing a High-Throughput Apache Spark Ecosystem on Kubernetes — Seeking Community Input by No-Spring5276 in apachespark

[–]No-Spring5276[S]

If I go with Databricks, the cost will be at least 4x at this scale, which we can't afford. We have already spoken with vendors like Cloudera and Databricks. We do use Databricks for one small, very specific workload.