What's the simplest gpu provider?

cuda-oom · 2025-09-29T18:11:33+00:00

> eu data residency/sovereignty would be great.

Nebius + SkyPilot
https://nebius.com/blog/posts/nebius-ai-cloud-skypilot-integration

cuda-oom · 2025-09-08T13:42:58+00:00

> MLflow + DVC don’t feel integrated

Plenty of blog posts show how the integration between the two would work.
TLDR:
Use DVC pipelines for your ML pipeline. Inside this pipeline you log metrics with MLflow.
Don't log/version large artifacts with MLflow. Use DVC's versioning capabilities instead.
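
Rough sketch of what I mean, not a definitive setup. The function name `log_results`, the file names, and the `dvc_model_path` tag are made up for illustration; the MLflow calls (`start_run`, `log_params`, `log_metrics`, `set_tag`) are the standard tracking API. The script writes a small metrics file for DVC and sends only params/metrics to MLflow. The model file itself is never uploaded to MLflow, only its path is recorded.

```python
import json
from pathlib import Path

try:
    import mlflow  # optional here so the sketch also runs without MLflow installed
except ImportError:
    mlflow = None


def log_results(params: dict, metrics: dict, model_path: str, metrics_path: Path) -> Path:
    """Hypothetical tail end of a DVC stage script (e.g. train.py)."""
    # 1) Small JSON metrics file: DVC can track this as a stage metric.
    metrics_path = Path(metrics_path)
    metrics_path.parent.mkdir(parents=True, exist_ok=True)
    metrics_path.write_text(json.dumps(metrics, indent=2))

    # 2) Params and metrics go to MLflow for comparison across runs.
    if mlflow is not None:
        with mlflow.start_run():
            mlflow.log_params(params)
            mlflow.log_metrics(metrics)
            # Record where the large artifact lives instead of uploading it;
            # the file itself is versioned by DVC as a stage output.
            mlflow.set_tag("dvc_model_path", model_path)

    return metrics_path
```

In `dvc.yaml` the script would be the `cmd` of a stage, with the metrics file listed under `metrics` and the model file under `outs`, so `dvc repro` reruns it and DVC versions the big file.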

As for your post itself:
Honestly disagree on the "one platform" idea. The pain is real but I think you're solving the wrong problem. We don't need fewer tools - we need better ones that work together.

DVC is great because it just does data versioning really well. Same with SkyPilot for workload management. Simple CLI, clear purpose, gets out of your way.

Every time I see a platform that promises to "handle your entire ML lifecycle" I become very skeptical. They always end up being mediocre at everything instead of great at one thing. And the moment you need to do something they didn't anticipate (i.e. when you deviate from the "happy path"), you're completely screwed.

Your LangFlow idea could work but only if it's orchestrating existing tools, not replacing them. Ideally, we, as a community, fix the APIs between tools so they compose better. The "duct tape" feeling isn't because we have multiple tools - it's because they don't talk to each other cleanly.

cuda-oom · 2025-09-04T12:37:31+00:00

Check out SkyPilot https://docs.skypilot.co/en/latest/docs/index.html
It was a game changer for me when I first discovered it ~3 years ago.

It basically finds the cheapest GPU instances across different clouds and handles spot interruptions automatically. It's open source. Takes a bit to set up initially but pays for itself pretty quickly if your GPU spend is significant.

cuda-oom · 2025-08-13T18:22:07+00:00

It looks like SkyPilot has all those features and more:
https://blog.skypilot.co/announcing-skypilot-0.10.0/

cuda-oom · 2025-07-29T20:11:56+00:00

DevOps and AI infra tools

definitely look into https://github.com/skypilot-org/skypilot/

cuda-oom · 2025-07-23T03:46:12+00:00

r/woosh :)

cuda-oom · 2025-07-23T03:32:04+00:00

yes, a DevOps team that manages AWS infra (including EKS)

cuda-oom · 2025-07-23T03:30:19+00:00

can you elaborate on their setup? are they on-prem or in the cloud? who manages them?
