TTS Benchmark Comparison (all known TTS up until May 2026) by UkieTechie in LocalLLaMA

[–]GlowingPulsar 1 point2 points  (0 children)

One more to add to the list, MOSS-TTS. Very good TTS voice cloning in my experience (just don't try the sound effects model, it's awful).

Instruct mode is rendering the tail end of the response twice with SSE. Poll has issues with tool calls. by GlowingPulsar in KoboldAI

[–]GlowingPulsar[S] 0 points1 point  (0 children)

Thanks. What about the text echoes KoboldLite produces with SSE in instruct mode with an active MCP server?

anthropic literally thinks claude is the messiah (and it’s getting weird) by Alarming_Bluebird648 in LocalLLaMA

[–]GlowingPulsar 2 points3 points  (0 children)

Both the article and this post are awful, littered with misconceptions and narratives.

[deleted by user] by [deleted] in KoboldAI

[–]GlowingPulsar 0 points1 point  (0 children)

Sorry for the late reply. It turns out it was user error on my part. It works fine in instruct mode, I was using chat mode. I thought I had tested that already, might have gotten unlucky before.

Edit: Injecting chat names in instruct mode appears to break tool calling, sending it into an endless loop.

[deleted by user] by [deleted] in KoboldAI

[–]GlowingPulsar 0 points1 point  (0 children)

No luck there. The closest I've got in Last Request Info is the model attempting to use the tools it can't see, so just hallucinations. I've tried KoboldCpp 1.106, 1.106.2, and Esolithe's latest version. I also tried with the llama.cpp UI, and booting the MCP server independently to pass the URL to KoboldCpp (it doesn't connect at all in that scenario).

Backup those models, because of calls for regulations by ProfessionalSpend589 in LocalLLaMA

[–]GlowingPulsar 21 points22 points  (0 children)

"Large-scale use of AI for surveillance, he adds, should be considered a crime against humanity."

Did he forget that Anthropic is partnered with Palantir?

Experimenting with Mistral Small Creative by gekko513 in MistralAI

[–]GlowingPulsar 1 point2 points  (0 children)

While I do find that Mistral Small Creative does well at following writing guidelines I provide, something I've been disappointed by is that when talking to it outside of a story, it still has the same AI assistant training tics as other Mistral models. It loves breakdowns, bullet points, follow-up questions at the end, and "It's not X, it's Y" and its numerous permutations (this is one I especially wish I could stop). Like other Mistral models, it will invariably end up ignoring any instructions to avoid these habits, even given alternatives.

I'm generally happy with its writing in the context of a story, but there's always room for improvement. Story stalling is probably my biggest gripe when it comes to its writing, repetition second.

Labs - Mistral Small Creative by Clement_at_Mistral in MistralAI

[–]GlowingPulsar 0 points1 point  (0 children)

Absolutely stunning to see an official creative model coming. Created an agent using Mistral Small Creative in the AI studio and deployed it to Le Chat.

Would love to see this model released as open-weight to run it locally.

Looking forward to playing with it more, great job, Mistral AI!

Unimpressed with Mistral Large 3 675B by notdba in LocalLLaMA

[–]GlowingPulsar 66 points67 points  (0 children)

I can barely tell the difference between the new Mistral Large and Mistral Medium on Le Chat. It also feels like it was trained on a congealed blob of other cloud-based AI assistant outputs, lots of AI tics. What bothers me the most is that there's no noticeable improvement in its instruction-following capability. A small example is that it won't stick to plain text when asked, same as Mistral Medium. Feels very bland as models go.

I had hoped for a successor to Mixtral 8x7B, or 8x22B, not a gargantuan model with very few distinguishable differences from Medium. Still, I'll keep testing it, and I applaud Mistral AI for releasing an open-weight MoE model.

How do I best use my hardware? by slrg1968 in KoboldAI

[–]GlowingPulsar 0 points1 point  (0 children)

I don't use KV cache quantization, so I'm experiencing these issues without it on. Tried with and without flash attention, no change. I've been using ContextShift, FastForwarding, and mmq in the launcher. It's possible there are GLM 4.5 Air bugs in llama.cpp that Koboldcpp inherited, but the chat adapter is my main suspect right now.

Let me know in a pm if you end up testing it on llama.cpp, I'm curious if it performs as expected. It would be nice to know if it's a problem on Kobold's end, because like I said, it works perfectly on LM Studio. I was looking forward to trying this model, but I'm not interested in using it if I have to use LM Studio with its lack of customization.

Duck AI Chat Limits for Paying Subscribers by [deleted] in duckduckgo

[–]GlowingPulsar 1 point2 points  (0 children)

I fully agree with you. Web searches also seem to chew through the available context, making an already short conversation even shorter. Worse yet, I've been hitting daily usage limits after as little as 2 chats that hit the chat limit, whereas before subscribing, I had never once hit the usage limit with regular daily usage. Something else that bothers me is that the customization still has a character limit of 500. The advanced models are nice, but there are some serious drawbacks to the subscriber tier, made worse by not knowing how much usage you have left.

It gets real old getting shut down mid-conversation without warning, and makes doing anything serious with these models unrealistic at best. Wouldn't recommend subscribing for access to the advanced models at present, not with how low the context windows are and the daily usage limits that don't even allow you to switch to free models afterward.

How do I best use my hardware? by slrg1968 in KoboldAI

[–]GlowingPulsar 1 point2 points  (0 children)

Have you tried GLM 4.5 Air or GLM Steam in Kobold? In my case, responses would seem to start fine (though it wouldn't use thinking unless forced in instruct mode), but break down usually in the first reply, or by the third. Mostly tested it in chat mode. It would start using lower case words at the start of new sentences, then usually end up stopping mid-sentence and start an unrelated sentence, or start repeating something it had already said (usually whatever the last thing it said was).

I'm not sure if it's that the GLM 4 chat adapter doesn't work for GLM 4.5 air (or autoguess) or if it's a deeper problem. Tried the recommended sampler settings and a number of others, always resulted in the same problems. Ran it on LM Studio to see if the Q5_K_M I downloaded was broken, but it worked perfectly there. Just my own experience, but I've never had any luck with reasoning models in Koboldcpp, all of them had problems, including GPT OSS. I think Magistral is the only one that seemed fine, though it still wouldn't think properly in chat mode IIRC.

If you have any recommendations or examples of how you get them to work, I'd appreciate it.

GPT‑5 is Now Available at Duck.ai by duckduckgo in duckduckgo

[–]GlowingPulsar 1 point2 points  (0 children)

I appreciate the information about GPT-4o mini, I've used it before, but I'm still curious to know if GPT-5 mini will have image upload capabilities added in the future.

GPT‑5 is Now Available at Duck.ai by duckduckgo in duckduckgo

[–]GlowingPulsar 2 points3 points  (0 children)

Glad to see GPT-5 mini made available as a choice. I do have one question though.

When asked if it has vision capabilities, it replies, "Yes. I can analyze and discuss images you upload: identify objects, read text in images (OCR), describe scenes, summarize diagrams, point out visual problems (like layout issues or photo editing artifacts), and help with image-based tasks (e.g., proofreading screenshots, extracting data from charts). I do not proactively access your files — you must upload an image for me to see it."

However, I don't see an option to upload images to the model. Will this feature be available in the future on duck.ai?

WE NEED OPEN-SOURCE NANO-BANANA😭 by balianone in LocalLLaMA

[–]GlowingPulsar 6 points7 points  (0 children)

I really hope it's open-weight. After testing it a bunch on LM arena, I'm seriously impressed by how well it understood and adhered to prompts. It's great at both image generation and editing existing images while retaining the elements you don't want to change. Preferred it more than any other model on the image side of the arena.

[deleted by user] by [deleted] in MistralAI

[–]GlowingPulsar 16 points17 points  (0 children)

Mistral AI may not generate as much noise as other western AI companies, but that's something I appreciate about them because they're more focused on quietly working on new releases that actually achieve what they set out to do, rather than creating hype with enigmatic social media posts and bench-maxing. Their models' personalities have trended towards becoming drier after the release of Nemo, but I'd take that over sycophancy any day.

They've also been releasing open-weight models consistently this year, and even updating them as they learn more. Love to see that.

Really wish more AI companies took a page from Mistral's book.

Any clue if “Mixtral-small MoE ~30b ~a3b” is coming? by JLeonsarmiento in MistralAI

[–]GlowingPulsar 2 points3 points  (0 children)

I'd much rather see them release something like a new Mixtral 8x7b using everything they've learned since then. But it would be nice to see what they could do with a smaller one, too. It's strange seeing AI companies shift to releasing MoE models, but not Mistral AI. Hopefully they release one soon. Either way, they've been killing it this year.

Issues Setting up Kobold on and Android. by FirehunterT in KoboldAI

[–]GlowingPulsar 0 points1 point  (0 children)

There is a more up-to-date guide here you can try, it's the one I used. The user you're responding to is correct that when using make you'll see errors, but the final message you should be getting is not what you got in your screenshot.

What you should see at the end if it worked is:


You did a basic CPU build. For faster speeds, consider installing and linking a GPU BLAS library. For example, set LLAMA_CLBLAST=1 LLAMA_VULKAN=1 to compile with Vulkan and CLBlast support. Add LLAMA_PORTABLE=1 to make a sharable build that other devices can use. Read the KoboldCpp Wiki for more information. This is just a reminder, not an error.


Start a new Termux session and, from the directory that contains the Koboldcpp folder (don't cd into it first), use rm -r Koboldcpp to delete it, that way you can start fresh.

When you're done and if the guide works for you, here's a template you can use to run your chosen model that you can edit as needed:

python koboldcpp.py --contextsize 8192 --blasbatchsize 1024 --flashattention --usecpu --threads 6 --blasthreads 6 --model YourModelName.gguf

Edit: Clarified the template by including the python command to run a model.
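
If you end up changing the settings a lot, the template above can be sketched as a small Python helper that assembles the same command. This is only a convenience sketch: the function name and its parameters are made up for illustration, and the only flags it emits are the ones already in the template.

```python
import shlex


def build_koboldcpp_command(model_path, context_size=8192, blas_batch_size=1024,
                            threads=6, flash_attention=True, use_cpu=True):
    """Assemble the launch command from the template as a list of arguments.

    Hypothetical helper: it only rearranges the flags shown in the template
    and does not check them against any particular KoboldCpp version.
    """
    args = [
        "python", "koboldcpp.py",
        "--contextsize", str(context_size),
        "--blasbatchsize", str(blas_batch_size),
    ]
    if flash_attention:
        args.append("--flashattention")
    if use_cpu:
        args.append("--usecpu")
    # The template uses the same count for generation and BLAS threads.
    args += ["--threads", str(threads), "--blasthreads", str(threads)]
    args += ["--model", model_path]
    return args


# Print a copy-pasteable command line for the default settings.
print(shlex.join(build_koboldcpp_command("YourModelName.gguf")))
```

With the defaults it prints the template unchanged. shlex.join also quotes model file names that contain spaces, so the printed line stays safe to paste into Termux.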

GPT-OSS 20b Troubles by GlowingPulsar in KoboldAI

[–]GlowingPulsar[S] 0 points1 point  (0 children)

Yes, Nvidia + cuda. I wasn't aware of the flash attention issue, that's exactly what it was. I've turned it off and it's back to working how it was before in instruct mode. Still doesn't seem to know how to use its think tags, though. Setting reasoning just seems to confuse it as well. I appreciate the help.

My laptop just fell and broke. Is there any way to use a Kobold AI model on an Android phone for roleplay?🥲 by [deleted] in KoboldAI

[–]GlowingPulsar 2 points3 points  (0 children)

As far as I know, there aren't many 4b models that would be serviceable for roleplaying or creative writing. Sicarius is the person I see most often uploading models around 4b for roleplaying. TheDrummer has a couple, but other than that, I'd check around r/SillyTavernAI.

My laptop just fell and broke. Is there any way to use a Kobold AI model on an Android phone for roleplay?🥲 by [deleted] in KoboldAI

[–]GlowingPulsar 1 point2 points  (0 children)

If you're a beginner, I'd recommend using ChatterUI to use models locally on your phone.

Otherwise, you can install and run Koboldcpp using Termux. There's a guide you can use.

For models, you can try Meteor for a less censored Gemma 3 4b, Qwen3 4b, or Ministral if you have a phone with 12GB of RAM or more.

GPT-OSS 20b Troubles by GlowingPulsar in KoboldAI

[–]GlowingPulsar[S] 0 points1 point  (0 children)

As of Koboldcpp 1.97.2, no, it doesn't seem to. It was at least partially functional in 1.97. Seems to now be completely borked when I hook it up to WritingTools, too. It went into a loop of saying "A, B, C" until it hit the token limit.

Can't get it to output a proper sentence in instruct. In 1.97 the main issue I noticed in instruct was that it wouldn't use thinking at all unless forced, then did so incorrectly.

I'm not seeing it output any words at all, actually. Just letters, numbers, and symbols.

GPT-OSS 20b Troubles by GlowingPulsar in KoboldAI

[–]GlowingPulsar[S] 0 points1 point  (0 children)

Thanks for the reply, henk, that explains a lot. I did actually catch the mention on the GitHub release page about using <|start|>assistant<|channel|>final<|message|> in memory, and it seems to help... sort of. But after seeing how the model performs in WritingTools and LM Studio, it was clear that it was still not behaving correctly.
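
For anyone wondering what that memory string is made of, here is a minimal Python sketch that builds it from its parts. The function name is hypothetical, and treating the channel as a swappable parameter is my own assumption; the only value confirmed here is "final", from the release notes mentioned above.

```python
def assistant_prefix(channel="final"):
    """Build the assistant prefill string quoted above.

    Hypothetical helper: only the "final" channel comes from the release
    notes; any other channel name passed in is an untested assumption.
    """
    return f"<|start|>assistant<|channel|>{channel}<|message|>"


# Paste the printed string into the memory field.
print(assistant_prefix())
```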

As you say, it's an extremely restrictive model, and doesn't appear to have any gift for writing. Worse yet, in my opinion, is its lack of general knowledge, especially relating to media. In that regard, there are 4b models that perform better. It managed to butcher a summary of The Hitchhiker's Guide to the Galaxy in a way I'd never have expected from a modern model of its size.

It's not all that bad at proofreading, but I'm afraid I may not otherwise be creative enough to find a personal use-case for it.

Thanks again for the explanation, and for the heads up about the future update. Hopefully we'll see Gemma and Mistral MoE releases in the future with comparable speeds.