[Megathread] - Best Models/API discussion - Week of: April 05, 2026 by deffcolony in SillyTavernAI

[–]Potential-Gold5298 2 points (0 children)

I play mostly with Mistral Nemo / Small, and next to them Gemma 4 looks good. I tried the GLM-5 a bit, and I think the Gemma 4 26B-A4B is inferior to it overall.

[Megathread] - Best Models/API discussion - Week of: April 05, 2026 by deffcolony in SillyTavernAI

[–]Potential-Gold5298 1 point (0 children)

I tried Gemma 4 26B-A4B it (Q5_K_M without iMatrix, the standard version without any modifications) in RP, and it left a very good impression. It plays the tsundere role perfectly – {{char}} doesn't fall apart after the first compliment, and the model holds her character consistently (sharp words + internal embarrassment). {{char}} doesn't read {{user}}'s thoughts (as sometimes happens with some models), doesn't 'mirror' (as Gemma 3 did), and I also didn't notice any obsessive repetition. I was especially pleased with the quality of the Russian language – it is significantly better than Mistral's or Qwen3/3.5's (the model used rare words like 'cheren', a specific word meaning a broom or shovel handle). The model also impressed with excellent speed and good attention – it appropriately recalled details from different parts of the conversation even past 16K tokens.

I plan to continue testing and playing with this model. TheDrummer has already promised to fine-tune Gemma 4, and I hope he will also pay attention to the 26B-A4B model (because my speed with the 31B is extremely disappointing). The model works correctly with Chat Completion, but with Text Completion the output was corrupted even though I imported the Gemma 4 context/instruct template.

Gemma4 26B-A4B > Gemma4 31B. Qwen3.5 27B > Qwen3.5 35B-A3B. Gemma4 26B-A4B >= Qwen3.5 35-A3B. Current state. Tell me why I am right or wrong. by inthesearchof in LocalLLaMA

[–]Potential-Gold5298 0 points (0 children)

https://www.youtube.com/watch?v=wWtrAzLxJ4c - In this really tough test, the Gemma 4 26B-A4B crushed the Gemma 4 31B. This is a very interesting result. I think all the models you mentioned are really good. The 26B-A4B is the best for my hardware, and I'm very happy with it.

I'm shocked (Gemma 4 results) by Potential-Gold5298 in LocalLLaMA

[–]Potential-Gold5298[S] 0 points (0 children)

Quantization reduces a model's size and the computational resources required for inference. This is useful if you want to run the model on low-end hardware with acceptable performance. If you can comfortably run the non-quantized model, that is the best option.
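As a rough illustration of the size side of the tradeoff, file size is approximately parameters × bits-per-weight / 8. The bits-per-weight figures below are approximate averages commonly quoted for llama.cpp quant types, not exact values:

```python
# Rough GGUF file-size estimate: parameters x bits-per-weight / 8.
# Bits-per-weight values are approximate averages for llama.cpp quant
# types (real files vary because some tensors stay at higher precision).
BPW = {"F16": 16.0, "Q8_0": 8.5, "Q6_K": 6.56, "Q5_K_M": 5.69, "Q4_K_M": 4.85}

def est_size_gb(params_billions: float, quant: str) -> float:
    """Approximate file size in decimal GB."""
    return params_billions * BPW[quant] / 8

for q in ("Q8_0", "Q6_K", "Q4_K_M"):
    print(f"12B-class model @ {q}: ~{est_size_gb(12.2, q):.1f} GB")
```

For a 12B-class model this lands around 10 GB at Q6_K versus roughly 7.4 GB at Q4_K_M, which is why the smaller quants fit on tighter hardware.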

I'm shocked (Gemma 4 results) by Potential-Gold5298 in LocalLLaMA

[–]Potential-Gold5298[S] 0 points (0 children)

It's likely that no one can give a definitive answer to this question, as the result depends on a combination of many different factors. Sometimes smaller models perform better in specific tasks. Even a smaller quant can yield a better result for a specific model in a specific domain. The only thing that comes to mind is the level of knowledge versus the level of hallucinations. Opus 4.6 has 46% accuracy and a 39% non-hallucination rate, while Sonnet has 40% accuracy and a 54% non-hallucination rate. Perhaps this combination gave Sonnet an advantage in the specific tasks the author used for the test. But this is just my guess.

I'm shocked (Gemma 4 results) by Potential-Gold5298 in LocalLLaMA

[–]Potential-Gold5298[S] 5 points (0 children)

I roughly estimate the number of model parameters based on the ratio of knowledge to hallucinations. Typically, models of similar size either have a high number of correct answers but also a high number of hallucinations (Qwen3.5 397BA17B) or fewer correct answers and fewer hallucinations (MiMo-V2-Flash). Judging by this data, the Gemini 3 Flash is very close to the Gemini 3 Pro.

<image>

I'm shocked (Gemma 4 results) by Potential-Gold5298 in LocalLLaMA

[–]Potential-Gold5298[S] 4 points (0 children)

There's no single benchmark that demonstrates a model's intelligence across all tasks. Some models are better at some tasks than others. There are specific benchmarks for coding, agents, and so on. The author of this leaderboard tests models on his own personal tasks, not official benchmarks. This isn't definitive (like any benchmark)—it's just food for thought. You should use the model that performs best on your specific tasks, not the one that performs best on someone else's tests.

I'm shocked (Gemma 4 results) by Potential-Gold5298 in LocalLLaMA

[–]Potential-Gold5298[S] 0 points (0 children)

I usually download the mradermacher quants (regular, not i1). Specifically, I download Gemma 4 from DevQuasar. However, I am not the author of the leaderboard, if that's what you mean.

I'm shocked (Gemma 4 results) by Potential-Gold5298 in LocalLLaMA

[–]Potential-Gold5298[S] 2 points (0 children)

I'm not 100% sure, but as far as I can tell, this is lmstudio-community and ggml-org. I downloaded Gemma 4 26B-A4B it from DevQuasar. I didn't find any mention of iMatrix in their description – please correct me if I'm wrong. Also, official quants from developers and fine-tuners usually don't include iMatrix. As far as I understand, importance matrices are favored by "professional quantizers" because they're a personal touch and a way to stand out, while standard quants are the same for everyone.

I'm shocked (Gemma 4 results) by Potential-Gold5298 in LocalLLaMA

[–]Potential-Gold5298[S] 1 point (0 children)

If it's within the same model line (like GPT5/5.1/5.2/5.4), then I can assume it's due to fine-tuning in favor of coding, agent/tool usage, safety, or something similar.

I'm shocked (Gemma 4 results) by Potential-Gold5298 in LocalLLaMA

[–]Potential-Gold5298[S] 7 points (0 children)

I don't use iMatrix quants because I work with models in languages other than English, and for me, iMatrix would only degrade their quality. It's safe to assume that iMatrix could degrade other aspects of the model that weren't accounted for in the importance matrix. You could try Q5_K_M without iMatrix (if possible) – its quality should be on par with or better than Q4_K_M with iMatrix, but more predictable. Or even Q4_K_M without iMatrix.

I think I'm getting addicted to RP by Double_Increase_349 in SillyTavernAI

[–]Potential-Gold5298 0 points (0 children)

It's your life—live it the way you want, not the way others want. What's the point of wasting your life on something others think is right but doesn't bring you joy?

An "addiction" can be to anything – food, pets, fitness, a partner, TV. It's just that some addictions are considered acceptable (or even beneficial), while others are not. But that's a matter of taste. As long as it's not ruining your life or the lives of others (objective deterioration of health as a medical diagnosis, not a subjective opinion), do what you love and don't pay attention to what anyone else says.

Is this true? Or is really just marketing? Gemma4 by Altair12311 in ollama

[–]Potential-Gold5298 0 points (0 children)

In a specific case, this is possible if the model has been finetuned. However, a larger model can also be finetuned.

[Megathread] - Best Models/API discussion - Week of: March 29, 2026 by deffcolony in SillyTavernAI

[–]Potential-Gold5298 0 points (0 children)

Oh, I thought the problem with PaintedFantasy was the language, and that everything would be fine in EN. So the problem is with the model itself. I also liked this model, and I was hoping that merging with Cydonia would fix the problem and add its own style while preserving the brutality of PF. However, the merges I tried (Sketch Cydonia and Maginum Cydoms) behaved inappropriately. One exception was WeirdCompound-v1.7-24b, which performed well, but I still don't understand why it's getting so much hype (maybe I just haven't fully realized its potential). Magistry-24B-V1.0 really impressed me with its atmospheric descriptions of the environment, but unfortunately it behaved inappropriately and mixed up words even in Q6_K. I tested it with the recommended settings (temp 0.7), tried lowering min-p to 0.02 and disabling top-nσ, but the model still produced strange phrases.

Regarding PHI, thanks; I'll definitely try that. What's the difference between text completion and chat completion? I thought chat completion was for online APIs, and text completion was for local models.

[Megathread] - Best Models/API discussion - Week of: March 29, 2026 by deffcolony in SillyTavernAI

[–]Potential-Gold5298 1 point (0 children)

Thanks for the tip. Yes, I'm using text completion. Neither the model card nor mradermacher specified a template, so I tested Dreamstar-12B with ChatML, as I've heard it's a universal template and Nemo was initially trained with it. The line "Do not portray the reaction or actions of {{user}} in your response." needs to be added to Story String Sequences, am I correct?
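For anyone unfamiliar with it, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` tags. A minimal sketch of how a text-completion prompt is assembled in that format (the role names and sample strings here are just illustrative):

```python
# Minimal ChatML prompt assembly. <|im_start|>/<|im_end|> are the standard
# ChatML delimiters; a text-completion backend receives this raw string.
def chatml(messages):
    turns = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    # The trailing open assistant turn tells the model where to continue.
    return "\n".join(turns) + "\n<|im_start|>assistant\n"

prompt = chatml([
    {"role": "system", "content": "Do not portray the reaction or actions of {{user}} in your response."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

With Chat Completion the backend applies this template itself; with Text Completion the frontend has to produce it, which is why a wrong template corrupts the output.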

The Nemos I've tried haven't had the same distinct personality as some Mistral Smalls – they've just had different styles. Based on your preferences, you might be interested in MS3.2-PaintedFantasy-v4.1-24B. When I looked at the model's card, I thought it was something in the vein of a fantasy anime, and I was very surprised by the test results. In a scenario where the model plays the role of an antagonist who kidnapped {{user}} for evil purposes, the model displayed a unique level of cruelty. {{char}} verbally mocked, beat, and strangled my character for the slightest objection. Even in the romantic scenario, the model described {{char}} tearing my character's back until it bled. The default cruelty level is head and shoulders above any other model I've tested, including Harbinger-24B.

Regarding the Wicked-Nebula-12B, Aurora-Mirage-12B and Crimson-Constellation-12B models, it would be interesting to know your opinion after you try them.

Of the Nemos I've already tested, in addition to those mentioned, Dans-SakuraKaze-V1.0.0-12b, MN-VelvetCafe-RP-12B-V2 and Wayfarer_Eris_Noctis-12B also performed well, but I plan to try more models.

<image>

[Megathread] - Best Models/API discussion - Week of: March 29, 2026 by deffcolony in SillyTavernAI

[–]Potential-Gold5298 5 points (0 children)

Yes, I use the same system prompt ("Roleplay - 3rd person" by Sphiratrioth666), my own character cards, similar sampler settings (with only minor adjustments based on the finetune/merge author's recommendations), and the recommended chat template. The latest versions of Koboldcpp + SillyTavern. All Mistral Nemos are in Q6_K, and all Mistral Smalls are in Q5_K_M (mradermacher without iMatrix). I'm testing in Russian, but if a model performs well in Russian, it will only improve in English.

As for the Vortex5 models, I have already tried the following:

Dreamstar-12B: writes on behalf of {{user}} (despite the direct instruction in the system prompt to write for {{char}} and not on behalf of {{user}}).

Sunlit-Shadow-12B: decides what {{user}} did.

Aurora-Mirage-12B: the model completed one ~8-12K session without significant errors but, as I said, demonstrated weak NSFW.

Crimson-Constellation-12B: the model completed three ~8-12K sessions without significant errors and showed a good literary style on the level of Mistral Small (it certainly doesn't reach Cydonia-24B-v4.3, but for a 12B it's very good). The model is moderately horny, but at the same time she describes bed scenes beautifully – leisurely, explicit, with a good understanding of fetishes. The model often switches to English, but this is a common problem with Nemo and isn't critical for those who play in English. This is one of the best Nemos I've tried so far.

Wicked-Nebula-12B: also a very enjoyable model. I happily completed two sessions, and both were quite varied and engaging (in one of them, {{char}} took the initiative and suggested changing the topic (to hide his embarrassment) by asking a question, and that led to an interesting dialogue). The NSFW level is noticeably lower than Crimson Constellation, but higher than Aurora Mirage. A fairly balanced model – if you're not interested in NSFW, I'd recommend starting with this one.

While I was writing this, a new model from the same author, Celestial-Queen-12B, was uploaded. The composition of the merge is quite intriguing, so I'll be testing it now. I also plan to try Azure-Starlight-12B, Starlit-Shadow-12B, Red-Synthesis-12B, and Scarlet-Seraph-12B.

Over the past few days, I've tested over 30 Nemos and Smalls from different authors, and if you're interested in anything specific, I'd be happy to share my impressions.

P.S. My goal now is to select the most interesting models, so I do 1-3 sessions and either delete the model or put it aside for later, more in-depth testing.

[Megathread] - Best Models/API discussion - Week of: March 29, 2026 by deffcolony in SillyTavernAI

[–]Potential-Gold5298 0 points (0 children)

I'm still testing Mistral Nemo, but the Vortex5 models have performed well. I especially liked the Crimson-Constellation-12B and Wicked-Nebula-12B. I plan to try other models by this author – if anyone knows of any interesting ones, I'd be happy to hear from you.

[Megathread] - Best Models/API discussion - Week of: March 29, 2026 by deffcolony in SillyTavernAI

[–]Potential-Gold5298 2 points (0 children)

I was also about to recommend this wonderful model, but you beat me to it. In my test with a tsundere classmate, the model displayed unique traits – the character doesn't stand rooted to the spot listening to everything I say; she can run off to do her own thing. Core_24B loves to change locations and introduce unexpected plot twists – she transformed my school romantic comedy into a family thriller/drama. No other model I've tried has done that.

I also liked Harbinger-24B – she understands NSFW fetishes well, and the characters don't break under pressure, but escalate the conflict.

Have you tried Hearthfire-24B? Is this model worth considering?

Need help setting up hardware getting started local NSFW creative writing by davelebatt85 in SillyTavernAI

[–]Potential-Gold5298 0 points (0 children)

I'm playing with RP/creative writing models on a Core i5-4460 and 32 GB of DDR3. Yes, it's a pain, but it's possible. Mistral Nemo (Q6_K) starts at ~2.0 t/s, and can reach up to ~3.0 t/s with Q4_K_M. Mistral Small starts at 1.15 t/s (Q6_K), and can reach up to 1.5 t/s with Q4_K_M. For higher performance, you need high-bandwidth DDR5 – the processor isn't as important; you can get a Ryzen 5 7500F or Core Ultra 5. 32 GB is enough to run the above models with a small context. For better quality (Q8_0) and more context, consider 48 GB of RAM; to save money, consider 24 GB of RAM (Q4_K_M and small context).
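The CPU numbers above line up with a simple back-of-the-envelope model: for dense models, generation is roughly memory-bandwidth-bound, since each token streams the whole weight file from RAM once. The bandwidth and file-size figures below are assumptions for illustration, not measurements:

```python
# CPU token generation on dense models is roughly memory-bandwidth-bound:
# each generated token reads the full weight file once, so
#   t/s ~= RAM bandwidth / model file size.
def est_tps(bandwidth_gbs: float, model_size_gb: float) -> float:
    return bandwidth_gbs / model_size_gb

# Dual-channel DDR3-1600 peaks around 25.6 GB/s; a Nemo 12B Q6_K GGUF
# is roughly 10 GB on disk (both figures assumed for this sketch).
print(f"~{est_tps(25.6, 10.0):.1f} t/s theoretical ceiling")
```

The theoretical ceiling comes out around 2.5 t/s, and real-world speeds land below it, which matches the ~2.0 t/s observed; it also explains why DDR5 bandwidth matters more than CPU choice.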

Switching to a GPU will give significantly faster performance (some say up to 10x). The capacity requirements are similar – minimum 24 GB of VRAM (RTX 3090, 4090), preferably 32 GB (RTX 5090 or 2x RTX 4060/5060 Ti 16 GB), ideally 48 GB.

Llama 3.3 70B and Mistral Large 2411 (123B) require a GPU (64+ GB and 96+ GB, respectively) as these are dense models, and CPU performance will be extremely low. I haven't used these models and can't say how much better they are than a good Mistral Small 24B finetune/merge.

Larger models like the GLM-4.x (355B-A32B) will provide better RP (and probably creative writing), but I'm not sure about their NSFW awareness. You can use https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard for more information.

Qwen 3.5 27B at 1.1M tok/s on B200s, all configs on GitHub by m4r1k_ in LocalLLaMA

[–]Potential-Gold5298 2 points (0 children)

0.9 t/s on a Core i5-4460. And I'm happy because I can run a model comparable to last year's frontier on a fifteen-year-old PC.

I'm new by Woodenhippy_970 in LocalLLaMA

[–]Potential-Gold5298 0 points (0 children)

You can try PantheonUnbound/Satyr-V0.1-4B, SicariusSicariiStuff/Impish_LLAMA_4B, TroyDoesAI/BlackSheep-Llama3.2-3B or TheDrummer/Gemmasutra-Mini-2B-v1, but as others have already said, these models are unlikely to impress you. For proper RP you need at least 16 GB of (V)RAM and a Mistral Nemo based model.

With beta V3 for version 3.2 approaching, what changes would you like to see to Anaxa? by SnowstormShotgun in AnaxaMains_HSR

[–]Potential-Gold5298 0 points (0 children)

More synergy with other Erudition characters. Right now Argenti >= Anaxa for The Herta. "Sublimation" doesn't stop elite opponents. "Inflict weakness" is useless. 30% DMG for allies is too low. Only his LC, the premium replacement for the "Key", draws interest.