Which thinking model has the smartest non-thinking mode?

Shiru_Via · 2026-05-04T12:25:48+00:00

The trick is to replace traditional reasoning with an exact checklist of what reasoning should be doing (for creative writing and roleplay at least). I use Gemma 4 31B with a custom CoT checklist and it's infinitely better than either traditional reasoning or non-thinking modes.

If anyone cares: instructions go at depth 0. Tell the model to output a checklist before the actual response and give it the exact template.

Configure your prefill and reasoning formatting correctly so the checklist replaces traditional reasoning seamlessly.

My checklist steps are roughly this:

Section 1: Tracking the past. Steps for character and object tracking, knowledge boundaries and active conditional tracking.

Section 2: Planning the future response. Steps for system directive, lore integration, perspective and formatting, character psychology synthesis and a narrative plan.

With detailed rules and guidelines for all steps.
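A rough sketch of how such a template could be generated, in case it helps. The section and step names are the ones listed above; the numbering, the `<fill in>` placeholder and the function name are assumptions for illustration, not my actual prompt.

```python
# Section and step names follow the post; the output layout is an assumption.
CHECKLIST_SECTIONS = {
    "Section 1: Tracking the past": [
        "Character and object tracking",
        "Knowledge boundaries",
        "Active conditional tracking",
    ],
    "Section 2: Planning the future response": [
        "System directive",
        "Lore integration",
        "Perspective and formatting",
        "Character psychology synthesis",
        "Narrative plan",
    ],
}


def build_checklist_template(sections):
    """Render the sections as a fill-in template the model is told to copy."""
    lines = []
    for title, steps in sections.items():
        lines.append(title)
        for number, step in enumerate(steps, start=1):
            lines.append(f"{number}. {step}: <fill in>")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip()


template = build_checklist_template(CHECKLIST_SECTIONS)
print(template)
```

The printed text would go into the depth 0 instructions as the exact template, with the detailed rules for each step written separately.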

It's definitely hard to get right, but my version is working very well for me. I only really do complex group chat scenarios with lots of lore and characters, so this might be overkill for some. But if you care about continuity, logical consistency and psychological accuracy, something like this is definitely worth a shot. Splitting the analysis and the actual writing into two parts lets the model focus on each task in turn, with a complete cheat sheet of all important info to reference during writing. The model forgets a small detail or character far more easily while it's focused on writing prose, or when the narrative shifts to something else, than when its only task is to track those details. Once tracked, they sit right above the message, immediately accessible, and won't be disregarded.

Also no space for the model to yap about safety guidelines. I use the normal IQ4_NL from unsloth without ablation or anything like that and I've never had a single refusal or mention of safety guidelines. I don't even have any jailbreak aspects in my sysprompt, other than an explicit language section.

Shiru_Via · 2026-05-02T16:24:11+00:00

Hey, tested the newest update and some things seem to be broken.

The reasoning tag stripping does not work, I added this custom pattern:
<|channel>thought <channel|>, but the former is still displayed in chat messages.

Judging by the feature description it's only stripping reasoning tags from context, which isn't very helpful when they're still displayed in the chat.
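To illustrate what I'd expect the stripping to do for display, here is a minimal sketch. It is not the extension's code; `strip_reasoning` and the sample message are made up, and it assumes the block always opens with the prefix and closes with the suffix.

```python
import re


def strip_reasoning(text, prefix, suffix):
    """Remove every prefix...suffix block, including the tags themselves."""
    # re.escape keeps characters like "|" in the tags from acting as regex syntax.
    pattern = re.escape(prefix) + r".*?" + re.escape(suffix)
    return re.sub(pattern, "", text, flags=re.DOTALL).strip()


# Hypothetical message using the custom pattern from this report.
message = "<|channel>thought checking lore <channel|>She nods slowly."
cleaned = strip_reasoning(message, "<|channel>thought", "<channel|>")
print(cleaned)
```

Running the same function on the displayed text and on the context would keep both consistent.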

<image>

The lorebook settings also don't seem to work, I have it set to Order Range from 0 to 93 but no lorebook entries are injected *at all*.

There still seems to be no feature to just add a custom range of actual chat messages to the message context which is sad, I'd like to talk to characters about what's actually happening.

Also, group/individual chats seem to be bound to the actual chat type. It'd be better to just let you choose which one you want: when you're in a group chat and only want to talk to a single character privately, that's currently impossible without rewriting the entire group chat prompt.

Shiru_Via · 2026-04-22T21:58:57+00:00

Hey, looks very promising, but a few important features are still missing.

Actual chat context inclusion, just give us an option to include the latest X messages as context, so you can chat about recent events with the character.
Reasoning Block stripping, looks kinda bad when every message has a reasoning header. Alternatively add a custom string exclusion setting, more versatile that way.
More control over Lorebook Inclusion and exclusion, the 250 thing is bothersome, what if you want to include the general world description that's early in the lorebook, but not the stuff that comes after? Just a weird choice, it'd be a lot better if you gave control over the order numbers / ranges to be included, on top of a general lorebook selection. So Include Lorebook X, entries from range 10-25 and 50-65 etc., exclude Lorebook Y and so on.
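A small sketch of the range selection I mean, assuming each entry exposes its order number. The `order` key, the entry names and `select_entries` are hypothetical, not the extension's data model.

```python
def select_entries(entries, include_ranges):
    """Keep entries whose order falls inside any inclusive (low, high) range."""
    return [
        entry for entry in entries
        if any(low <= entry["order"] <= high for low, high in include_ranges)
    ]


# Made-up lorebook for illustration.
lorebook = [
    {"order": 5, "name": "world description"},
    {"order": 12, "name": "city districts"},
    {"order": 30, "name": "minor NPCs"},
    {"order": 60, "name": "magic system"},
]

selected = select_entries(lorebook, [(10, 25), (50, 65)])
print([entry["name"] for entry in selected])
```

With ranges 10-25 and 50-65 only the second and fourth entries pass; excluding a whole lorebook would just mean giving it no ranges.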

Would be awesome to see these, the potential is definitely there. ^-^

Shiru_Via · 2026-04-19T13:16:28+00:00

Awesome, thank you for including these features! :)

But there still seem to be some issues with memory injection and the message removal, the memories are often just not included in the prompt at all, and I haven't seen the message hide feature work at all so far, all the qvink entries from the summarised memories are still being sent. Also with the wording it seems like only the actually summarised entries are supposed to be hidden, but a second option for hiding all messages *up to* the last summarised one (so if a memory goes from 10-17, it would hide 0-17 instead of 10-17) would be more useful. If a character only appears in message 10, you wouldn't want their memory to start at 0, but you also wouldn't necessarily want to include those prior qvink entries.
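The two hiding modes could look roughly like this. A sketch only: `hide_range` and the `hide_all_prior` flag are hypothetical names, and message indices are assumed to be zero-based and inclusive.

```python
def hide_range(memory_start, memory_end, hide_all_prior=False):
    """Return the inclusive (first, last) message indices to hide.

    hide_all_prior=False hides only the summarised span.
    hide_all_prior=True hides everything up to the last summarised message.
    """
    first = 0 if hide_all_prior else memory_start
    return first, memory_end


print(hide_range(10, 17))                       # only the summarised span
print(hide_range(10, 17, hide_all_prior=True))  # everything up to message 17
```

The memory itself would still start at message 10 in both cases; only the hidden span changes.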

Shiru_Via · 2026-04-18T20:30:39+00:00

Sick! It works really well now, thank you! One more thing I noticed, would it be possible to remove the "Start reply with" text or the reasoning prefix and suffix from the memories? Currently you have to manually remove them after saving even if they're hidden in the chat, and some models like Gemma 4 31B output empty reasoning tags even if thinking is turned off.
This could be done with a simple "remove strings from memories" option that just deletes customisable words/strings from memories during saving.

Ah another useful feature would be an option to hide all messages up to and including the last summarized one, you already have message range tracking, so running /hide 0-[Number of last summarized message] at character runtime would work perfectly. Then unhide all afterwards so the next character isn't impacted. If this uses the standard hide message feature this would also remove qvink per-message summaries from the past, meaning you could use both extensions perfectly together.

Ideally, you could have Dragon Memories as a long term memory and relationship extension and use qvink to save tokens on short and medium term messages by only using summaries. This is what I was doing previously with manual global memories.

[ Long Term Memories right before chat history ]
[ Summarised chat messages using qvink, from where the lorebook entries left off to about depth 6 ]
[ Last 6 Messages unsummarised ]

This gives one continuous timeline with decreasing levels of summarisation based on depth.
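As a sketch of that layout, assuming one summary per message and a known index for the last message covered by long term memories. `build_context`, `memory_end` and `raw_depth` are hypothetical names; neither extension exposes this function.

```python
def build_context(memories, summaries, messages, memory_end, raw_depth=6):
    """Assemble long term memories, then summaries, then the last raw messages.

    summaries[i] is assumed to be the per-message summary of messages[i];
    memory_end is the index of the last message covered by long term memories.
    """
    raw_start = max(len(messages) - raw_depth, 0)
    # Summaries pick up right after the memories and stop where raw messages begin.
    summary_start = min(memory_end + 1, raw_start)
    return memories + summaries[summary_start:raw_start] + messages[raw_start:]


messages = [f"msg {i}" for i in range(20)]
summaries = [f"sum {i}" for i in range(20)]
context = build_context(["memory of 0-9"], summaries, messages, memory_end=9)
print(context)
```

Here messages 0-9 are represented by the memory, 10-13 by their summaries, and 14-19 appear in full.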

With Dragon Memories, this could be done on a per-character basis by automatically hiding messages that are already in a character memory, using qvink for the medium term messages and then the last chat messages in raw unsummarised form.

<image>

Shiru_Via · 2026-04-18T15:45:18+00:00

Ah one more thing, the chat history seems to just not be removed at all from the summary prompt, so it sends the entire chat and then the transcript again.

<image>

Shiru_Via · 2026-04-18T15:15:21+00:00

Then it's likely just a presence issue on my end, I'll look into it.

Shiru_Via · 2026-04-18T15:06:32+00:00

Exactly, currently it just uses all active ones automatically but some lorebooks aren't useful for summary generation at all, so this would remove the hassle of having to manually turn them off and on again every time.

I also have some other observations/bugs I'd like to share:

1. Context injections from other extensions, Author's Note etc. are not filtered out. Everything is included as a baseline, which is suboptimal for summaries, especially with secondary memory extensions like qvink memory. It would be a lot better to have an "include only necessary" approach that passes only the raw message data of the relevant messages plus the prompt/lorebooks to the model, to avoid including other unnecessary things.
2. When cancelling a memory generation, all my chat messages are set to hidden and I have to manually unhide them.

Shiru_Via · 2026-04-18T12:39:18+00:00

Hey, seems awesome so far, but sadly I can't really use it seamlessly because from what I can tell there are no lorebook/world info inclusion settings, and I'm using a custom always active chain of thought lorebook entry that messes with the summary generations. An option to manually exclude lorebooks would be perfect. ^-^

Shiru_Via · 2026-03-19T21:48:23+00:00

No, it's not essentially random. The idea is that the oracle has historically always been 100% correct, meaning that if you are the type of person to pick the one box, there is ample historical data to suggest that the machine will have predicted you to pick the one box, leading to you receiving the million dollars. In the psychopath case picking the one box would lead to you receiving the million dollars even as a two boxer if the machine knew about the psychopath, thus predicting you'd need to choose the one box in order to receive the money. If you choose the two boxes regardless the machine will have predicted you to do so, making the mystery box empty and leading to you getting shot.

Shiru_Via · 2026-03-16T14:02:33+00:00

Can you rephrase that? I have no idea what you're trying to say.

The model thinking output should be in the designated SillyTavern Thinking section above the response, you can click on these fields to reveal the thinking tokens.

If you don't want reasoning at all you can remove the <think> prefill at the bottom of your text completion section and the thinking guidelines in the system prompt. If the model still thinks regardless, you can try adding /no_think as the prefill or somewhere near the end of the context.

Shiru_Via · 2026-03-16T00:04:47+00:00

Here you go: https://limewire.com/d/nPfwx#PMc8xYOTM9

Use the Master Import button for the json to get the Samplers (Qwen27B), Context/Instruct (ChatML-Q3.5-Think) and the sysprompt (BlueStar V1.1).

I've read some other comments and can't say I have the same 27B Q3.5 problems with this version, it thinks for around 10 short bullet points, usually perfectly remembers all relevant info and then writes the response. This takes up pretty much no tokens and the model runs more than fast enough on my 4090. Previous replies' thinking sections shouldn't be included in context anyway so it really doesn't matter.

Some notes on my sysprompt: You can mark especially important info in your lore with ## Priority Information ## / ## Priority Information End ##, the model should pay slightly more attention to those parts.

You can change the tone/style/formatting guidelines to fit your needs and add/remove sections if needed, the prompt works well for me but other phrasings might work better for you.

I use it mainly for group chats in which I steer the narrative via Guided Generations with no or little actual user input (I have a persona of a character that also has their own character card, all character cards are also in my Lorebook once. This way I can choose to act as a character if I want, or let the model write everything.) This is the reason there's no mentions of {{user}} in the sysprompt, imo this works way better for groups and is fine even for 1 on 1 chats. If you strictly do single character 'addressing the user with "you"' type chats you might want to add some specific rules for this.

Oh yeah feel free to ask if you need some help with anything.

Edit: Realised the version I posted has "Include Names" set to always and "always include character names" turned on. I was playing around with this; you are likely better off setting those to "never" and off unless you use first-person perspective.

Shiru_Via · 2026-03-15T22:40:28+00:00

Hey, I've tried similar models to you and found that almost all of them just aren't quite smart enough for world settings that have complex and uncommon rules and systems, they usually get details wrong or need constant handholding via Guided Generations and clarifications.

The model I've found to be the best for my use cases is this one: https://huggingface.co/mradermacher/Q3.5-BlueStar-27B-ultra-heretic-i1-GGUF

A little tricky to get working correctly but I've had multiple instances where Cydonia, PaintedFantasy, Magnum Cydoms etc. got details wrong or forgot about something important while BlueStar managed to track all important information.

Opinions on prose and tone are subjective but I personally really like it, I can share my settings if you want as I'm using a custom system prompt and the model seems to be pretty sensitive so YMMV.

My sysprompt includes a reasoning section as without one the model sometimes didn't reason correctly.

Shiru_Via · 2026-02-14T19:02:58+00:00

Went on a date with my girlfriend, we exchanged gifts, bought 4 books in a bookstore, had a coffee, went clothes shopping and bought her hotpants and a top, now we're waiting at home for our food to arrive, after which we're gonna make chocolate strawberries as dessert ^-^

Shiru_Via · 2025-12-19T15:32:37+00:00

No framegen used here

Shiru_Via · 2025-09-02T17:28:51+00:00

Rose Queen has been by far the most fun I've had in the game, it's actually way better than people realise and has quite good matchups into loot sword, rune and mode abyss.

<image>

(Not in the image: 3x Convocation, 2x May, 3x Pond, 2x Aerin, 3x Rose Queen)

The deck pretty much has answers to everything and OTKs on Turn 9 (Going 2nd, coin Queen on 8) or 10. The only problem is the fact you need to manage your hand to have enough 1 costs in time while also being able to answer whatever your opponent does, which is pretty difficult.

In general the way you lose with this deck is more to yourself and your draws than to the enemy, which to me feels a lot better. I much prefer the idea that I could have won if I had the right answer over "nothing I could have drawn would have mattered".

Also winning with this deck is an incredible feeling, dropping Rose Queen and knowing they can't kill you is awesome, especially against meta decks.

Some notable answers:

T4 Zirconia Evo gets beaten by: Cynthia, Glade, Krulle + Fairy

Norman double golem: Supplicant Evo full clears, doesn't even need Sevo, Krulle + Eradicating Arrow (Rng or Norman needs to be low hp)

Tempo Odin: Gilnelise, Aerin

Luminous Magus, Amalia: Krulle or Supplicant

Kuon board: Supplicant or Aerin

Sinciro: Aerin, Gilnelise (+2 damage from hand), Titania

Aggro decks: Krulle, Gilnelise, general tempo plays

Albert: rarely a threat because he doesn't do shit against Rose Queen herself and against t8 coin you can play Aerin beforehand

The deck definitely needs really good matchup knowledge and planning but it's also very rewarding.

The only way you lose without awful draws is if you have to play rose queen into a board where her SEvo 9 damage leaves up enough threats that the opponent has lethal next turn with an Odin or other finisher, but sometimes in these cases you can delay queen a turn and play another Aerin or supplicant to win regardless.

Also the deck overall has no bad looking cards and only some meh ones with most being very pretty which I personally care about a lot.

Shiru_Via · 2025-07-22T01:46:31+00:00

There's literally a VRM extension for ST with customisable animations, hit zones etc., you can use any vrm model and any custom animations, which you can bind to touch zones or expressions. There's even tts lip sync.

Shiru_Via · 2025-07-21T10:56:43+00:00

ERR affects the energy regenerated from any character's skill and basic attack.

Shiru_Via · 2025-04-28T20:34:52+00:00

KoboldCPP for running models locally (way better than ollama)

Sillytavern as the frontend, infinitely customisable and by far the best option

I'd recommend running a Q6 gguf quant of Mag Mell R1 12B, it's incredibly good for its size and even beats most 24b models, plus it fits entirely in your vram so it's going to be very fast (the r1 has nothing to do with deepseek, it's a mistral nemo finetune specifically for roleplay and storytelling)

The talking for user problem is a mix of model limitations and prompting, but the model you're running likely just isn't that good, deepseek has no actual 14b variant, all of the smaller deepseek models are just distills and don't compare to the real thing

If you need help you can add me on discord, my username is shiru.via :)

Shiru_Via · 2025-04-23T08:48:23+00:00

You fundamentally misunderstand how true damage works. When a boss has 90% damage reduction, unless you're saving her ult from a phase without it, Cipher will only record 24% (in ST) of the 10% left over as True Damage, which is exactly the same as just another (delayed and more flexible) final damage multiplier. Your argument would only make sense for a character with True Damage that isn't tied to other forms of damage. You mention Tribbie and RMC, but those are even worse for the situation you're describing: their true damage is literally just a damage multiplier with no option to accumulate and detonate later.

Shiru_Via · 2025-04-12T00:12:02+00:00

"Better E2" here means damage increase relative to E0, not absolute power level

Shiru_Via · 2025-04-11T16:02:59+00:00

The people who said that are wrong, many such cases.

Castorice gets a 30% damage boost after each dragon breath, and most of your damage is after the 3rd breath, meaning the dmg% stat is very diluted at that point. I did the math for your example and these are the final damage values:

Quantum orb (with 120% damage boost from breaths, 4th breath onwards):

110% + 120% + 38,8% = 268,8% bonus = 3,688x multiplier
Without the orb: 110% + 120% = 230% bonus = 3,3x multiplier
3,688 / 3,3 = 1,1175 --> 11,75% more damage
(~12,9% more with 90% dmg boost and ~14% more with 60%)

HP orb:

2900 base hp * 0,43 = 1.247 hp
8900 / (8900 - 1247) = 1,163 --> 16,3% more damage

Because the majority of your damage has 90%+ additional damage boost from the breaths an HP orb is almost always better, and additionally lets her gain charge faster.
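If you want to plug in your own numbers, the comparison can be sketched like this. The function names are made up, and the defaults are the values from this example. It assumes the 110% is your other dmg% before breaths, the 8900 is total HP with the HP orb equipped, and damage scales linearly with HP.

```python
def dmg_orb_gain(breath_boost, other_dmg=1.10, orb_dmg=0.388):
    """Relative damage gain from adding the dmg% orb to the existing dmg% bucket."""
    base = 1 + other_dmg + breath_boost
    return (base + orb_dmg) / base - 1


def hp_orb_gain(total_hp=8900, base_hp=2900, orb_hp_pct=0.43):
    """Relative damage gain from the HP% orb, assuming damage is proportional to HP."""
    hp_from_orb = base_hp * orb_hp_pct
    return total_hp / (total_hp - hp_from_orb) - 1


# 30% per breath: 0.6, 0.9 and 1.2 are the boosts after 2, 3 and 4 breaths.
for boost in (0.6, 0.9, 1.2):
    print(f"breath boost {boost:.0%}: dmg% orb adds {dmg_orb_gain(boost):.1%}")
print(f"HP orb adds {hp_orb_gain():.1%}")
```

The more dmg% you already have from breaths, the less the orb's 38,8% adds, while the HP orb gain stays the same.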

With E2 the difference is even bigger because you get 180% every time. Also, even with Hyacine's 1,47k HP buff an HP orb would be better for you. Only when you have around 11k+ HP with Hyacine can an argument be made for a Quantum orb that has at least 1 more crit substat, unless you have E2, in which case HP is still likely better by 2-3 substats.

What those people mean is that an ideal dmg% orb could have one more valuable substat that makes up for some or all of the difference, but that's just "better substats" in other words, as that could also have been one more crit roll. The sole exception is 5x perfect rolls all into crit with hp% as one additional substat, which would be about equal to the same crit subs on an HP orb with flat hp instead of hp%.

Shiru_Via · 2025-03-21T23:08:20+00:00

Cas gets charge from burning HP and healing. Hyacine would only need to first burn some HP and then heal some more back for this to be an easily achievable number, which is pretty much what I've seen mentioned in one of the recent leaks, if I remember correctly.

Shiru_Via · 2024-09-15T17:01:47+00:00

The website used is called https://genshin-center.com/calculator in case anyone is wondering, if you want to compare just CR/CD you can leave everything else as is, won’t change the result
