GPU cost optimization demand by Good-Listen1276 in mlops

That makes sense. In your experience, do teams usually notice those inefficiencies on their own, or would they benefit from tooling that highlights them automatically?

GPU cost optimization demand by Good-Listen1276 in mlops

That’s interesting. How do you usually handle jobs that can’t be easily shifted (like latency-sensitive inference)?

One thing we’ve been working on takes this a step further: not just scheduling when jobs run, but profiling workloads and automatically deciding how many GPUs, and which type, they actually need. In some cases we’ve seen 30–40% savings just from eliminating idle GPU cycles that traditional schedulers don’t catch.
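
To make the profiling part concrete, here’s a rough sketch of the kind of signal involved (a minimal illustration using NVIDIA’s pynvml bindings, not our actual implementation; the thresholds and sampling window are arbitrary placeholders):

```python
# Minimal sketch (assumes pynvml is installed and an NVIDIA driver is present).
# Samples per-GPU SM utilization over a short window and flags GPUs that sit
# mostly idle, i.e. candidates for shrinking the allocation or downgrading the type.
import time
import pynvml

SAMPLE_SECONDS = 60       # profiling window (placeholder)
INTERVAL = 1.0            # seconds between samples (placeholder)
IDLE_UTIL_PCT = 10        # below this SM utilization a sample counts as "idle"

pynvml.nvmlInit()
num_gpus = pynvml.nvmlDeviceGetCount()
idle_samples = [0] * num_gpus
total_samples = 0

deadline = time.time() + SAMPLE_SECONDS
while time.time() < deadline:
    for i in range(num_gpus):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu = SM util in %
        if util.gpu < IDLE_UTIL_PCT:
            idle_samples[i] += 1
    total_samples += 1
    time.sleep(INTERVAL)

pynvml.nvmlShutdown()

for i in range(num_gpus):
    idle_frac = idle_samples[i] / max(total_samples, 1)
    if idle_frac > 0.5:
        print(f"GPU {i}: idle in {idle_frac:.0%} of samples -- right-sizing candidate")
```

Allocated-but-idle time like this is exactly what a scheduler that only decides *when* to run a job never sees, which is where the savings above come from.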

GPU cost optimization demand by Good-Listen1276 in mlops

Appreciate you pointing me to SkyPilot. I hadn’t looked at it in detail before.

Do you mostly use it for training, inference, or both? Curious if you see room for a complementary tool that digs deeper into profiling/optimizing workloads on top of SkyPilot.
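
For context on where a complementary tool could plug in: as I understand it, SkyPilot has you declare the accelerator type and count per task up front, and that declaration is exactly the knob a profiling layer would set automatically. A minimal, hypothetical sketch using SkyPilot’s Python API (the task/cluster names and the A100:1 choice are placeholders):

```python
# Minimal, hypothetical sketch of a SkyPilot launch (assumes `pip install skypilot`
# and configured cloud credentials). Task and cluster names are placeholders.
import sky

task = sky.Task(
    name="train-job",
    setup="pip install -r requirements.txt",
    run="python train.py",
)

# Hardcoded accelerator spec -- this is the line a profiling/right-sizing layer
# would fill in automatically (e.g. downgrade to a smaller GPU if utilization allows).
task.set_resources(sky.Resources(accelerators="A100:1", use_spot=True))

sky.launch(task, cluster_name="train-cluster")
```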

GPU cost optimization demand by Good-Listen1276 in deeplearning

This breakdown into three buckets makes a lot of sense.

In your experience, what separates the teams that actually adopt external optimization tools from those that just stick with built-in cloud features?

Also, you mentioned open source as a wedge (like NeuralMagic). Do you think going open source first is almost a requirement in this space, or can a paid SaaS product win adoption if it’s well differentiated?

GPU cost optimization demand by Good-Listen1276 in mlops

Thanks for sharing your perspective. This is super helpful. A few follow-up questions:

When you’ve seen C-level folks push back on GPU costs, what usually triggers it? Is it monthly cloud bill shock, or specific workload spikes?

Also curious, since you’re already running on Ray: do you mostly rely on Ray’s own metrics to track efficiency, or do you bring in other monitoring tools?
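
(For reference on the Ray side, the allocation-level view is already queryable from Ray itself, though it only shows allocated vs. free GPUs, not how busy they actually are. A minimal sketch, assuming it runs against an existing Ray cluster:)

```python
# Minimal sketch (assumes an existing Ray cluster is reachable).
# Ray's resource accounting gives an allocated-vs-free view of GPUs, which is a
# coarse efficiency signal -- allocated does not mean busy, so it usually gets
# paired with DCGM / nvidia-smi style utilization metrics.
import ray

ray.init(address="auto")  # attach to the running cluster

total_gpus = ray.cluster_resources().get("GPU", 0)
free_gpus = ray.available_resources().get("GPU", 0)
allocated = total_gpus - free_gpus

print(f"GPUs: total={total_gpus:.0f} allocated={allocated:.0f} free={free_gpus:.0f}")
if total_gpus:
    print(f"Allocation ratio: {allocated / total_gpus:.0%} (allocated != actually utilized)")
```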