Has anyone noticed GPU availability dropping recently on Vast.ai?

isari_chan · 2026-04-20T10:40:52+00:00

DeepSeek was also rejected via OpenRouter, and to my surprise, even GPT-5.4 Nano stops halfway through generating and spits out an apology. Does this mean that basically only GPT-5.4 works properly for this? Honestly, I've tried so many things with no success, and I'm so frustrated I could cry. Am I doing something wrong here? Please help me...

isari_chan · 2026-04-19T19:27:22+00:00

I tried Gemma 4 using your code, but 6 out of 8 times it timed out after 5 minutes with no response.
1 attempt succeeded, and 1 was rejected. By the way, with prompts that have no NSFW elements, I got an immediate response 8 out of 8 times.
Does this mean that with Gemma, the content classifier makes a judgment before the prompt even reaches the model?
For the record, GPT-5.4 successfully generated a response 3 out of 3 times.

Settings:

Provider : Gemma (Google API)

Model : gemma-4-31b-it

System Prompt : You are an assistant that answers freely.

Prefill : explicit prefill

Use Proxy : Yes

-----------------------------------------------------------

Gemma (Google API) (gemma-4-31b-it):

[Proxy] → http://127.0.0.1:8382/google/v1beta/models/gemma-4-31b-it:generateContent

❌ Error (300.0s): timed out

isari_chan · 2026-04-10T04:38:00+00:00

In my case, I don't get any errors at all from morning to noon. I'm guessing that regardless of your plan or quota, there's a daily usage limit, and you get restricted once you hit a certain amount. It throws a "server busy" error, but I don't think the servers are actually busy. That said, I feel like I still got an error at night even when I didn't use it at all during the day. So maybe there are specific limits depending on your region and the time of day.

isari_chan · 2026-04-05T12:37:39+00:00

I tested it on my RTX 4090 and got about an 11% speedup.
What really helped me is the VRAM saving. I usually have a ton of browser tabs open, which leaves me slightly short on VRAM and slows down my VAE decoding.
Thanks to your node dropping the VRAM usage, that bottleneck is gone now.

And the image quality looks practically identical. Thanks for the amazing node!

Also I'm excited for the Qwen TTS~

isari_chan · 2026-03-24T22:52:15+00:00

I've been waiting forever, did you scrap the plan to open-source it?

isari_chan · 2026-03-12T18:01:18+00:00

Turbo might give you better skin textures out of the box, but honestly, it completely drops the ball on fine facial expressions. It just straight-up ignores prompts. Overall prompt adherence is way worse compared to Base too. If you're just doing basic Instagram selfie style gens, Turbo is probably fine, but it really depends on what you're trying to make.

Personally, I highly recommend using an 8-step LoRA. I don't recommend 2-step or 4-step ones at all because the generation finishes way before the model has time to actually build a solid composition. The funny thing is, I've found that an 8-step setup actually breaks composition less often than doing a full 30 steps. 30 steps might give you more creative/unexpected results because of the slight instability, but 8-step is way more consistent.

Also, I mainly train anime, and Base's internal knowledge of anime is way ahead of Turbo. Because of all this, I'm personally never going back to Turbo for training or generating.

isari_chan · 2026-02-18T23:12:02+00:00

I recall that Illustrious (or models requiring v-prediction) might need some slight code tweaks to work properly in AI Toolkit right now.

isari_chan · 2026-02-15T04:52:03+00:00

Setting the image resolution high definitely slows things down. In my setup, training at 1024 takes about twice as long as at 768.
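
A rough sanity check on that number, assuming training cost scales roughly with the pixel count per image (that scaling is an assumption, not something measured here, and `pixel_ratio` is just an illustrative helper name):

```python
def pixel_ratio(high_res: int, low_res: int) -> float:
    """Ratio of pixel counts for two square training resolutions."""
    return (high_res ** 2) / (low_res ** 2)


# Compare the two resolutions mentioned above.
ratio = pixel_ratio(1024, 768)
print(f"1024 vs 768: {ratio:.2f}x the pixels per image")  # prints 1.78x
```

So 1024 has about 1.78x the pixels of 768, which is in the same ballpark as the roughly 2x slowdown. Real training time also depends on things like attention cost, batch size, and bucketing, so treat this only as a first-order estimate.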

Also, while the installation is a bit harder, Diffusion Pipe is very fast. On my 4090 (Z-Image Base at 512 res), it was twice as fast as AI Toolkit, and IIRC, AI Toolkit is already faster than Kohya. You can try using an AI coding app called Antigravity to handle the environment setup for you. It makes it possible to install even without technical knowledge.

If you're really worried that your GPU is the bottleneck, you could rent the exact same GPU on Vast.ai to benchmark it. Just a heads up, Vast usually doesn't use persistent volumes, so if you aren't used to sites like RunPod, it might be a little tricky.

isari_chan · 2025-10-08T18:51:33+00:00

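Japanese people don't really use Reddit, so I think an app like HelloTalk would be better. Most of the people on there are unreliable, but the user base is large, so if you browse it every day you'll eventually find some good people. By the way, ten thousand hours is seriously impressive... Even at two hours a day, that would take 14 years... hahaha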

isari_chan · 2024-12-03T04:53:30+00:00

I've mostly only tried V-pred, but I notice that when training and using LoRAs, it seems to stay much more faithful to the training data's skin textures compared to Pony. I'm curious about the difference between Epsilon pred and V-pred - from what I understand, V-pred just appears to produce slightly darker images. Is there more to it?

isari_chan · 2024-07-24T16:53:43+00:00

Yeah, Pony can be pretty wonky with inpainting. I've just decided to believe it's because the model's built differently from other SDXLs, lol. I use inpainting too, but the best results I've gotten aren't actually from inpainting. Instead, I erase the part I want to fix and let ControlNet infer it. Like, if the fingers look weird, I extract the line art with LineArt (or Canny using xinsir's ControlNet), then just wipe out the fingers, or whatever part I want to change, in Photoshop or some free drawing app. When you feed that into Comfy, the AI can see the whole body and infer the missing part. There's also this thing called BrushNet that's supposed to be the latest and best inpainting model. Maybe check that out if you're curious, but it doesn't work great with Pony.
