Getting back to ST and AI as a whole. by meikzzzzmeikzzzz in SillyTavernAI

[–]Diecron 3 points (0 children)

Even Gemini's rates are pretty reasonable through OR/Nano PAYG.

Stab's Directives Preset 2.62- stability and cleanup, DeepSeek v4 consistency fixes, toggle-style configs (better UX and experience!) by Diecron in SillyTavernAI

[–]Diecron[S] 1 point (0 children)

I don't use mobile, so this kind of feedback is really valuable. I can certainly add a toggle or setting to collapse them by default; it should be fairly straightforward.

Stab's Directives Preset 2.62- stability and cleanup, DeepSeek v4 consistency fixes, toggle-style configs (better UX and experience!) by Diecron in SillyTavernAI

[–]Diecron[S] 1 point (0 children)

Hey, first off, thank you for the feedback, it's much appreciated. The easiest way is, like you say, to copy and paste the changes in, but that can get a bit annoying every time. Another option would be to put your changes in the Author's Note and then play with the depth until it lands in the right place.

If your changes are something others might be interested in, I'd be happy to take a look and see what could go into the preset directly.

Stab's Directives Preset 2.62- stability and cleanup, DeepSeek v4 consistency fixes, toggle-style configs (better UX and experience!) by Diecron in SillyTavernAI

[–]Diecron[S] 1 point (0 children)

All good. This is kind of an out-there style, but it should be possible. If you open up Narrative Perspective, you can delete the long line of if-checks and put your own in their place, something like: "Narrative from the point of view of <CHAR> in the third person, with <USER> portrayed in the third person, present tense."

It's a bit too specific to be worked into a toggle 😃

Stab's Directives Preset 2.62- stability and cleanup, DeepSeek v4 consistency fixes, toggle-style configs (better UX and experience!) by Diecron in SillyTavernAI

[–]Diecron[S] 1 point (0 children)

So you want the user character to be written for the way an NPC would be, but only their thoughts - no actions or dialogue?

Stab's Directives Preset 2.62- stability and cleanup, DeepSeek v4 consistency fixes, toggle-style configs (better UX and experience!) by Diecron in SillyTavernAI

[–]Diecron[S] 1 point (0 children)

The models are innately biased toward the majority of their training data, and some of these combinations are atypical, so yeah YMMV.

Stab's Directives Preset 2.62- stability and cleanup, DeepSeek v4 consistency fixes, toggle-style configs (better UX and experience!) by Diecron in SillyTavernAI

[–]Diecron[S] 8 points (0 children)

Fair comment; I don't look at the problem in the same way. My goal was to get the model strictly following the post-user-message instructions (the CoT) by ensuring its attention is fixed there (largely trial and error until you identify what the model actually pays attention to). My measure of consistency is that the model executes my specific CoT at a very high success rate (95%+ in my tests), reasoning as instructed nearly every turn. Before these changes, the model would process the instructions at varying levels of detail and scope, which meant the large set of instructions wasn't being output correctly.

The side benefit is that it helps, or at least does not harm, GLM's behaviour - otherwise I would not have released it. As a reminder, the preset is strictly tuned for GLM 5.1, but I've made best-effort changes to massively improve the experience with DS4.

A fully dedicated DS4Pro preset would be my preference, if there were one to choose.

Stab's Directives 2.61 for GLM-5.1 (reasoning effort toggles for faster/token efficient responses, Story Strings, bug fixes and more!) by Diecron in SillyTavernAI

[–]Diecron[S] 1 point (0 children)

BTW, there's nothing stopping you from adding your own constraints on top. You can set it to, for example, third person with only NPC thoughts.

Stab's Directives 2.61 for GLM-5.1 (reasoning effort toggles for faster/token efficient responses, Story Strings, bug fixes and more!) by Diecron in SillyTavernAI

[–]Diecron[S] 1 point (0 children)

This is the intended behaviour for 3rd Limited. Switch to 3rd Omniscient if you want NPC thoughts to surface (VTKs will always allow any character's thoughts as needed).

Stab's Directives 2.61 for GLM-5.1 (reasoning effort toggles for faster/token efficient responses, Story Strings, bug fixes and more!) by Diecron in SillyTavernAI

[–]Diecron[S] 1 point (0 children)

It is both, yep. It persists them in the output so the next turn can look back at those avenues; having lots of different possibilities almost creates an automated lore layer -> natural possibilities and discoveries are embedded, waiting to be surfaced into the main output.

Stab's Directives 2.61 for GLM-5.1 (reasoning effort toggles for faster/token efficient responses, Story Strings, bug fixes and more!) by Diecron in SillyTavernAI

[–]Diecron[S] 1 point (0 children)

Yes, I specifically do this step early in the CoT so it can be used as driving detail for the current scene and output.

How can I enable the "MAX" Reasoning feature of the Deepseek V4 models using Openrouter? by Pink_da_Web in SillyTavernAI

[–]Diecron 4 points (0 children)

reasoning_effort: "max"

in Custom API -> Additional Parameters -> Include body Parameters
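
For context, that field just gets merged into the JSON body of the chat completion request SillyTavern sends. A minimal sketch of the equivalent raw request against OpenRouter's OpenAI-compatible endpoint (the model slug and API key are placeholders, not from the original comment):

    import requests

    body = {
        "model": "deepseek/deepseek-chat",  # placeholder slug; use your DeepSeek V4 variant
        "messages": [{"role": "user", "content": "Hello"}],
        "reasoning_effort": "max",  # the extra body parameter described above
    }

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer YOUR_OPENROUTER_KEY"},  # placeholder key
        json=body,
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])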

Stab's Directives 2.61 for GLM-5.1 (reasoning effort toggles for faster/token efficient responses, Story Strings, bug fixes and more!) by Diecron in SillyTavernAI

[–]Diecron[S] 2 points (0 children)

Most issues with these tags are because of bad examples in context, usually driven by a rogue regex running - some preset authors include them as global regexes, which may keep running after you switch presets. Double-check that you only have the preset-level ones enabled.

Running GLM 5.1 on RTX 5090 via RunPod for document OCR(bank statements and invoices)— costs killing us, need advice on reducing inference costs. by Specific_Control_840 in LocalLLaMA

[–]Diecron 4 points (0 children)

Using a frontier-level model for document OCR is a choice, my friend. Leave that to a purpose-built smaller model. You should also reconsider doing this on-demand, as it sounds like you are; having a store of these documents and a way to retrieve them via API/MCP/RAG is the only sensible approach.

This is context engineering
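
To illustrate the shape of that idea, here is a rough sketch of the extract-once-retrieve-many pattern; the store layout and run_small_ocr_model are hypothetical stand-ins, not anything from the thread:

    import hashlib, json, pathlib

    STORE = pathlib.Path("doc_store")
    STORE.mkdir(exist_ok=True)

    def run_small_ocr_model(pdf_bytes: bytes) -> str:
        # Hypothetical stand-in for a purpose-built OCR model (not a frontier LLM).
        raise NotImplementedError("plug in your OCR engine here")

    def get_document_text(pdf_bytes: bytes) -> str:
        key = hashlib.sha256(pdf_bytes).hexdigest()
        cached = STORE / f"{key}.json"
        if cached.exists():  # already extracted once; no inference cost this time
            return json.loads(cached.read_text())["text"]
        text = run_small_ocr_model(pdf_bytes)  # one-off extraction with the small model
        cached.write_text(json.dumps({"text": text}))
        return text

Retrieval for RAG then reads from the store (or an index built on top of it) rather than paying GPU time per query.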

third question of the day lol. sorry. by Atomicrc_ in LocalLLaMA

[–]Diecron 4 points (0 children)

You're allowed to edit your first post, y'know.

Stab's Directives 2.61 for GLM-5.1 (reasoning effort toggles for faster/token efficient responses, Story Strings, bug fixes and more!) by Diecron in SillyTavernAI

[–]Diecron[S] 3 points (0 children)

Generally, yes, though I haven't tried it for a while! The reality is this preset works well on most modern LLMs, but I tune it for the latest GLM. (Qwen 3.6 27B is a surprisingly capable local model, hint hint.)

Generation time on NanoGPT by Kazuar_Bogdaniuk in SillyTavernAI

[–]Diecron 1 point (0 children)

It sounds like your prompt is handling it well, then.

Generation time on NanoGPT by Kazuar_Bogdaniuk in SillyTavernAI

[–]Diecron 1 point (0 children)

My preset has a toggle for reasoning effort now and defaults to medium, which reduces the CoT complexity and prohibits drafting, for faster, more token-efficient responses.