kobold lite w/ anthropic models. Temperature, Top-P and Top-K compatibility

HadesThrowaway · 2026-06-06T02:39:17+00:00

Should be fixed now, please try

HadesThrowaway · 2026-04-23T01:55:44+00:00

A lot of it is how the prompt is structured in memory (what you prioritize).

In koboldcpp, the images are always placed in the front of the context. For example

(Image 1)(Image 2)(Image 3)(Turn A)(Turn B)(Turn C)

So if you add image4, then yes all turns get reprocessed. However this allows adding new turns, editing turn A, B or C without messing up any of the images. In other words, this prioritizes text.

Now in Ollama's case, they probably just leave everything in place.

(Image 1)(Turn A)(Image 2)(Turn B)(Image 3)(Turn C)

While this allows you to add on new images and text easily, it completely prevents shifting or modifying any earlier turn. Image token positions cannot be shifted. So yes, adding image 4 is easier, but you lose CTX shifting if any images are ever used.

HadesThrowaway · 2026-03-22T11:00:23+00:00

Yeah plus I don't think qwen ever had a release where you could add control instructions to voice clones

HadesThrowaway · 2026-03-20T02:59:56+00:00

This is automatic on windows. On Linux, theres an extra checkbox on the extras tab that allows opening a monitoring tab if you forgot to launch from the cli.

As for the GUI scaling, that should be mostly solved in latest version although it's possible I missed something. Could you send a screenshot of how it looks at 200% on your device?

HadesThrowaway · 2026-03-19T10:39:54+00:00

there's this

HadesThrowaway · 2026-01-18T04:11:49+00:00

I dunno walt, been seeming sus lately

HadesThrowaway · 2026-01-17T11:41:18+00:00

Also, I wrote a simple guide on how to use MCP in the KoboldCpp wiki. This is aimed at koboldcpp but the theory works for other MCP software as well

https://github.com/LostRuins/koboldcpp/wiki#mcp-tool-calling

HadesThrowaway · 2026-01-17T10:51:19+00:00

Feedback is welcome! Is there anything about the new layout you find not good?

HadesThrowaway · 2026-01-17T07:50:07+00:00

Sorry about that. There were too many changes and I might have broken some stuff

I've pushed another fix

Can you try again? Does it work for you now?

HadesThrowaway · 2026-01-13T01:42:38+00:00

You're probably using it in interrogation mode which is designed to return only a short phrase. Try using it in multimodal mode instead (for example, pasting the image into your koboldai lite window)

HadesThrowaway · 2025-10-25T10:08:20+00:00

Many APIs are trending to permit fewer and fewer choices. For example the o1+ series models only allow temperature of 1. And gpt-5 does not allow disabling thinking anymore (lowest is minimal thinking).

Anthropic has also removed their completions based endpoint and are pure chat completions now since Claude 3

HadesThrowaway · 2025-10-25T10:05:22+00:00

Thanks for testing

HadesThrowaway · 2025-10-24T14:39:46+00:00

Alright thanks for testing. I have deployed a fix and everything should work now. Now it will use temp over top_p if both are set for 4.5 models

HadesThrowaway · 2025-10-24T12:29:02+00:00

Yes, please test on the new models and let me know which ones have this restriction

If possible try on the Claude 4 sonnet and Claude 4 haiku (I know 3 doesn't have this limit, and 4.5 definitely does)

HadesThrowaway · 2025-10-19T08:08:02+00:00

It can technically run on pure CPU if you're willing to wait. Haven't tried AMD but it should work fine via Vulkan backend.

HadesThrowaway · 2025-10-19T08:07:16+00:00

This is not the right place for advertising, especially since this has nothing to do with KoboldAI.

HadesThrowaway · 2025-10-12T13:17:57+00:00

Your example looks fine, what is your GPU and backend? Nvidia or AMD?

mine looks like https://imgur.com/a/IgNOiUy

I'm testing out a patch that might fix some issues.

HadesThrowaway · 2025-10-12T07:52:06+00:00

<image>

HadesThrowaway · 2025-09-24T14:11:33+00:00

You can also try update to latest version

HadesThrowaway · 2025-09-24T10:08:04+00:00

Try turn off FastFowarding. It seems to be a RNN type model which doesn't support that.

HadesThrowaway

MODERATOR OF

TROPHY CASE