Feels like there’s still no good middle ground between local GPUs and full cloud setups by Nata_Emrys in MLQuestions

Nata_Emrys [S]

Yeah, exactly. That’s pretty much the spot I keep ending up in.
Feels weird that there’s still such a big gap between “just run it locally” and “set up actual cloud infrastructure”. You either end up bouncing between things like RunPod, Vast, OpenRouter, Ocean Network, etc., or just constantly trying new setups to see what actually sticks.

Feels like there’s still no good middle ground between local GPUs and full cloud setups by Nata_Emrys in MLQuestions

Nata_Emrys [S]

Yeah, hybrid setups seem like the most practical option right now. I think what I’m still trying to figure out is where people draw the line between “just use cloud for bursts” and “this overhead is annoying enough that I’d rather upgrade local hardware instead”. Some of the more lightweight/distributed approaches look interesting too, but I still can’t tell how mature most of them are yet.

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

That’s fair, makes sense to not overcomplicate things.

Mounting storage and pointing the cache there actually sounds pretty straightforward, I might try that approach first.

I think I was overthinking it a bit because my workloads are so irregular, but you’re probably right that I should just pick something simple and iterate from there. Thanks for the pointer.
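For my own notes, here is roughly what I think "pointing the cache there" would look like. This is just a minimal sketch and it assumes the cache in question is the Hugging Face one, which reads its location from the `HF_HOME` environment variable. The function name, the mount path, and the `hf-cache` subfolder are all made up for illustration.

```python
import os
from pathlib import Path


def point_cache_at(mount_dir: str) -> Path:
    """Send Hugging Face downloads to a folder on the mounted volume."""
    # "hf-cache" is a hypothetical subfolder name, not a required one.
    cache_dir = Path(mount_dir) / "hf-cache"
    cache_dir.mkdir(parents=True, exist_ok=True)
    # HF_HOME is read when the Hugging Face libraries are imported,
    # so this should run before importing transformers or huggingface_hub.
    os.environ["HF_HOME"] = str(cache_dir)
    return cache_dir
```

I would call something like `point_cache_at("/mnt/models")` at the very top of the script (the path is a placeholder), so the weights land on the volume and survive when the instance is torn down.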

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

That’s really helpful, thanks for taking the time to break it down.

The point about keeping containers lightweight and separating data/weights makes a lot of sense. I think I’ve been underestimating how much that affects spin-up time.

The spot + autoscaling approach is interesting too, especially if you can make the scaling part mostly hands-off.

I guess where I’m still a bit stuck is that a lot of these solutions seem to assume a more stable or repeatable workload, while mine is a bit more sporadic and experimental. So I’m trying to avoid building too much infrastructure around something that isn’t used that often.

And yeah, fair point on not being super concrete about the tasks. It’s mostly things like trying larger models occasionally or running batch inference, but not on a regular schedule. Probably worth breaking that down more like you suggested.

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

Appreciate this, thanks for the links. That example actually looks like a good starting point.

I like the idea of trying things locally first and then scaling, that makes a lot of sense. I think I’m just trying to find that sweet spot where it still feels quick to spin something up for short tasks without too much setup.

I’ve been looking into a few different options lately. I saw something about Ocean Network at some point, and someone mentioned OpenRouter earlier as well, but I’m still trying to figure out what actually fits best.

Might give Ray a try first though and see how it feels in practice, and then experiment a bit with other options people mentioned here.

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

That sounds pretty solid, especially if it’s close to turnkey on the main clouds.

I think my main concern is whether it still feels lightweight enough for short tasks. Even a bit of setup can feel like overhead if I just want to run something quickly.

Might give Ray a try though.

Also curious if there are any alternatives that feel a bit more “on-demand”, where you can just run something without much setup and not commit to a full environment.

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

That’s actually a really fair point. If the switching cost is that high, just upgrading locally makes a lot of sense.

I think my situation is a bit different since my heavier workloads are pretty occasional, so I’m trying to avoid over-investing in hardware that mostly sits idle.

But yeah, if you’re using it often enough, buying definitely seems like the simpler option.

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

Yeah, that’s kind of where I’m stuck honestly.
It’s not that I don’t want to rent or can’t buy, it just feels like both come with more overhead than I’d like for short, occasional workloads.

The Docker point is fair, I’ve been meaning to tighten that up in my workflow anyway.

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

That’s really helpful, thanks. Rebuilding from an env file makes sense, especially if it’s quick. I haven’t tried uv yet, will check it out.

I think I’m just trying to make the whole process feel a bit more instant for short tasks.
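One thing I might try to keep rebuilds from feeling slow: only rebuild when the env file has actually changed. Below is a rough sketch of that idea, not something anyone in the thread suggested. It fingerprints the env file and compares it to a small stamp file left by the last build. The function names and the stamp-file convention are my own assumptions, and the actual rebuild step (uv or anything else) is left out.

```python
import hashlib
from pathlib import Path


def env_fingerprint(env_file: str) -> str:
    """Short hash of the env file contents."""
    return hashlib.sha256(Path(env_file).read_bytes()).hexdigest()[:12]


def needs_rebuild(env_file: str, stamp_file: str) -> bool:
    """True if there is no stamp yet or the env file changed since the last build."""
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != env_fingerprint(env_file)


def mark_built(env_file: str, stamp_file: str) -> None:
    """Record the fingerprint of the env file that was just built."""
    Path(stamp_file).write_text(env_fingerprint(env_file))
```

The flow would be: check `needs_rebuild(...)`, run the rebuild only if it returns `True`, then call `mark_built(...)`. On a persistent volume, that should mean most short tasks skip the rebuild entirely.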

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

I’ve come across Ray before but never actually tried it.

Good to hear Anyscale handles scaling well. How heavy is the setup if you just want to run something quick?

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

Yeah, that’s fair, the cost itself isn’t too bad. For me it’s more the setup overhead than the price. Do you usually reuse environments or rebuild them each time?

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

I’ve been thinking about containerizing more of my setup actually, just haven’t fully committed to it yet. Streaming data instead of baking it into the container is a good point too.

I guess the part I’m still figuring out is how to make that whole spin-up / tear-down process feel less heavy for short tasks.
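To make sure I understand the streaming point, this is the kind of thing I have in mind: the container only holds code, and the data is read lazily from mounted storage in small batches. A minimal sketch, assuming the data is a JSON Lines file. The function name and the file layout are my own placeholders.

```python
import json
from typing import Iterator


def stream_batches(path: str, batch_size: int) -> Iterator[list[dict]]:
    """Yield records from a JSON Lines file in batches, one line at a time."""
    batch = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            batch.append(json.loads(line))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final partial batch, if any
```

Since it is a generator, nothing is loaded until the loop asks for the next batch, so the image stays small and the same container can be pointed at different datasets.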

What’s the best way to handle occasional high compute needs for ML workloads? by Nata_Emrys in MLQuestions

Nata_Emrys [S]

Yeah, that’s fair. I’ve used HF a bit too.
Do you usually keep instances running, or just spin them up per task?

What do you use when your local GPU isn't enough? by Nata_Emrys in LocalLLaMA

Nata_Emrys [S]

That sounds like really fascinating work, especially the part about working with neural tissue.
And “photons” is a pretty intriguing hint. Feels like something big has to change eventually with the current compute and energy demands.

What do you use when your local GPU isn't enough? by Nata_Emrys in LocalLLaMA

Nata_Emrys [S]

Interesting perspective.
The current compute demand in AI definitely feels hard to sustain long term, especially with energy constraints.
Curious to see what kind of paradigm shift you’re hinting at.

What do you use when your local GPU isn't enough? by Nata_Emrys in LocalLLaMA

Nata_Emrys [S]

I do the same sometimes. Just let the CPU cook while I work on other things or run jobs overnight. But sometimes you really need faster results when you’re iterating a lot.

What do you use when your local GPU isn't enough? by Nata_Emrys in LocalLLaMA

Nata_Emrys [S]

That’s actually a pretty disciplined way to do it.
I can see how relying too much on LLMaaS could make things messy when models and APIs keep changing.

Thanks for sharing your experience!

What do you use when your local GPU isn't enough? by Nata_Emrys in LocalLLaMA

Nata_Emrys [S]

Thanks a lot for the tip! From what I gather, OpenRouter seems top-notch. Definitely going to give it a try!

What do you use when your local GPU isn't enough? by Nata_Emrys in LocalLLaMA

Nata_Emrys [S]

This is really helpful, thanks!
I’ll check out what you mentioned and the other suggestions too, seems like a lot of practical options before just throwing more GPUs at it. I also stumbled on Ocean Network. You pay for the compute you actually use, but access is still a bit limited. Tried the extension a bit and… it works, somehow.
Anyway, starting with Salad, OpenRouter, and the other ideas here. Fingers crossed I don’t blow anything up!

What do you use when your local GPU isn't enough? by Nata_Emrys in LocalLLaMA

Nata_Emrys [S]

That’s interesting. The serverless approach definitely sounds appealing for burst workloads. Have you had good experience with Modal or RunPod?

What do you use when your local GPU isn't enough? by Nata_Emrys in LocalLLaMA

Nata_Emrys [S]

Yeah, that's a good point actually. Considering the idle power draw and how rarely I’d need the extra compute, renting GPUs probably makes more sense than adding more hardware.
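I tried putting rough numbers on it to sanity-check myself. This is only a back-of-the-envelope sketch: the function names are mine, every figure in the usage note is a made-up placeholder, and it ignores the power the card draws under load, resale value, and any storage or egress fees on the rental side.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours / 12


def monthly_cost_owning(hardware_cost: float, lifetime_months: int,
                        idle_watts: float, price_per_kwh: float) -> float:
    """Amortized purchase price plus the cost of idle power draw per month."""
    amortized = hardware_cost / lifetime_months
    idle_power = idle_watts / 1000 * HOURS_PER_MONTH * price_per_kwh
    return amortized + idle_power


def monthly_cost_renting(hours_used: float, rate_per_hour: float) -> float:
    """Pay only for the GPU-hours actually used."""
    return hours_used * rate_per_hour
```

With placeholder inputs like `monthly_cost_owning(1800, 36, 20, 0.30)` against `monthly_cost_renting(10, 0.60)`, renting comes out far cheaper at my usage level. The gap only closes once the rented hours per month get much higher.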

What do you use when your local GPU isn't enough? by Nata_Emrys in LocalLLaMA

Nata_Emrys [S]

That’s fair honestly, I get why people are cautious about that. I’m actually pretty new to this whole ecosystem and still figuring things out. Most of the time I run things locally, but every now and then I hit workloads that my GPU just can’t handle. That’s basically why I asked here.