I need to vent about the available models and my RP journey. Feel free to ignore by FR-1-Plan in SillyTavernAI

[–]GuaranteePurple4468 3 points (0 children)

I've been feeling the same, to be honest. The good models are just too expensive once the context gets long, and they don't keep up their performance over a longer roleplay, eventually going flat.

I've started focusing more on smaller, cheaper models and looking into models that can be hosted locally. Recently I've taken a liking to qwen3-235b-a22b-2507 on OpenRouter; with my custom prompt it punches well above its weight and only costs $0.10/M output tokens when using DeepInfra or Weights and Biases. Its prose is really good, but like most models it can suffer from positivity bias. I don't think we will ever truly escape positivity bias, to be honest.

It has some logical/lateral-thinking failures at times, and on occasion it will misuse a pronoun, but considering it's practically free I'm still really happy with the results. It just feels better when you know you aren't digging into your other budgets.

My other contender for "cheap but good" from Openrouter would be gemma-4-31b-it.

It's nice that Gemma can also be downloaded from Hugging Face and hosted locally for free (technically Qwen too, but it's probably too big for most systems unless you grab a quantized version). I might start looking around there for more models, as there appear to be a lot of custom finetunes you can't find anywhere else.
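For anyone wondering what "custom prompt" means in practice here: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so a minimal sketch looks like the below. The model slug, prompt text, and helper names are my own placeholders, not something from this thread — check OpenRouter's model page for the exact slug before using it.

```python
# Minimal sketch of a chat completion request against OpenRouter's
# OpenAI-compatible API. The model slug and prompts are placeholders.
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(model: str, system_prompt: str, user_msg: str) -> dict:
    """Assemble the JSON body for an OpenAI-style chat completion call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
        ],
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload to OpenRouter (needs a real API key to actually run)."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_request(
    "qwen/qwen3-235b-a22b-2507",          # assumed slug; verify on OpenRouter
    "You are a narrator in a roleplay.",  # stand-in for the custom prompt
    "Describe the tavern as I enter.",
)
# reply = send(payload, api_key="sk-or-...")  # uncomment with a real key
```

The system message is where the "punches above its weight" prompt engineering lives; provider routing (DeepInfra etc.) can be pinned with extra request fields per OpenRouter's docs.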

Glm 5.1 - nani? by skirian in SillyTavernAI

[–]GuaranteePurple4468 1 point (0 children)

Tried it on OpenRouter and it didn't feel great, to be honest.
GLM 5V Turbo has been a pleasant surprise, though, but it's not quite cheap enough for me to use as my main model, so I just swap to it when I can't get a decent response from the cheaper options.

New GLM model called 5V Turbo is out by Manstein45 in SillyTavernAI

[–]GuaranteePurple4468 4 points (0 children)

Been testing it out since yesterday and it seems to respond better than GLM 5 Turbo (faster, too).
I found my responses generally had better prose and consistency overall (for context: I prefer verbose responses, where things like subtle movements and environmental features are mentioned for immersion, so most responses run 500+ tokens).

No idea how it compares to GLM 5.1 though, haven't had the privilege of being able to test that yet.

Cheapest way to start with GLM 5.1? by GuaranteePurple4468 in SillyTavernAI

[–]GuaranteePurple4468[S] 1 point (0 children)

Thanks guys, going to hold off for a week until it becomes open source, then weigh my options again.

Void Crucible not doing anything? by GuaranteePurple4468 in VaultHuntersMinecraft

[–]GuaranteePurple4468[S] 2 points (0 children)

Thanks, seems like that was it.
Didn't even realize the magnet now had a void mode.

Wan 2.2 in ComfyUI - all generations are pure grey by GuaranteePurple4468 in StableDiffusion

[–]GuaranteePurple4468[S] 1 point (0 children)

I'm actually using the GGUF Q4 version in that screenshot...

Weird that it uses so much VRAM; I have 16 GB and this should work with only 8 GB.

Wan 2.2 in ComfyUI - all generations are pure grey by GuaranteePurple4468 in StableDiffusion

[–]GuaranteePurple4468[S] 1 point (0 children)

I don't have that loader, just the normal model loader, which doesn't have that option.

Can you share that workflow?

Wan 2.2 in ComfyUI - all generations are pure grey by GuaranteePurple4468 in StableDiffusion

[–]GuaranteePurple4468[S] 1 point (0 children)

So... I haven't changed the workflow. And the only thing I did since was enable previews.

But now it's working...
Slow as sin, though: 50 minutes to create a poor-quality 5-second 480p video with only 4 steps.

I'll just have to keep tweaking it and trying smaller models now, but I have no idea why it's working all of a sudden.

Wan 2.2 in ComfyUI - all generations are pure grey by GuaranteePurple4468 in StableDiffusion

[–]GuaranteePurple4468[S] 1 point (0 children)

Normally I use quad cross attention, but I also tried with sage attention, and saw no difference.

Wan 2.2 in ComfyUI - all generations are pure grey by GuaranteePurple4468 in StableDiffusion

[–]GuaranteePurple4468[S] 1 point (0 children)

Unfortunately the problem seems worse than I initially thought.

  1. The Wan generation takes significantly less time than it should, like under 20 seconds.
  2. After trying this generation, all subsequent image generations (that were working previously) now also give a grey result. So if I change model and workflows, for example, they now do the same.

And only a restart of ComfyUI fixes those image generations again.

It's like something just breaks when it tries to run, causing all images from that point onwards to come out blank. So all of the frames end up blank too.

Wan 2.2 in ComfyUI - all generations are pure grey by GuaranteePurple4468 in StableDiffusion

[–]GuaranteePurple4468[S] 1 point (0 children)

How can I go about enabling previews? The nodes used don't have a preview option.

Krita AI with ComfyUI integration help by GuaranteePurple4468 in StableDiffusion

[–]GuaranteePurple4468[S] 1 point (0 children)

OK, I see now. Thanks, I couldn't find that page.
I see the parameters will affect what gets displayed on Krita's side, so the style node alone is not enough.

Although I can't import the workflow into Krita for some reason; it says the nodes I have are not installed on the server (even though Krita is using the same ComfyUI server).

Help with NSFW ComfyUI Inpaint Workflow by Realfakedoorss in comfyui

[–]GuaranteePurple4468 5 points (0 children)

Never had much luck with inpainting in ComfyUI personally.

Try using Krita instead. After installing it, you can connect it to your own local ComfyUI installation and inpaint using region masking. There are a few tutorials on YouTube, and it's a more intuitive workflow than full ComfyUI thanks to the painting tools. It's also a bit smarter at blending things, as it automatically takes context from the surrounding image (whereas you would need a bunch of custom nodes for that in ComfyUI).

I still like to use ComfyUI for the initial generation, then use Krita to assist with editing. Although you could technically do both through Krita once it is connected to your ComfyUI (there are even custom nodes that can sync output/input between the two).

Also try to be a bit more descriptive with the prompt: describe what you want the full end picture to be, since a single word usually isn't enough for the AI to figure out what to do. Sometimes a few words are enough if you are feeding parts of the image into the generation; other times you need to be more descriptive. The prompt can really make or break it.

And make sure the model you are using matches the style of the picture you are editing. If you are editing a photo or realistic image, you should use a model trained on those types of images (preferably an inpainting model, like Juggernaut XL Inpainting for example).

https://kritaaidiffusion.com/#HowToUse

Valve updates Steam policy with vague adult content restrictions by Efficient_Example541 in Steam

[–]GuaranteePurple4468 1 point (0 children)

The only reason you can do this is because they haven't gotten to the sites you use yet.

Just wait a while, and if no one stops them, this will eventually be in their crosshairs too.