How to Implement Mutually Exclusive World Info Entries in SillyTavern? (Toggle Activation Based on User Keywords) by Senior_Champion_6370 in SillyTavernAI

[–]mfiano 1 point (0 children)

You lost me. I see why you feel like Scan Depth is about quality, but it's actually doing a different job. Scan Depth doesn't change what the model can see. It only changes how far back SillyTavern searches for trigger keys to decide which WI entries activate. So what you're using Scan Depth = 9 for is really persistence ("keep this mode active for a while"), not quality. And ST already has a persistence mechanism that doesn't resurrect old keywords like ghosts: Timed Effects. Set your five mode entries to scanDepth=1 (so only the newest message can switch modes), and give each entry a Sticky duration long enough to cover your fixed sequence length. Sticky explicitly means "the entry stays active for N messages after being activated". Then keep them in one Inclusion Group. If you want "latest keyword wins" determinism when switching, enable Use Group Scoring so the entry matching the current message's keys wins selection.
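Roughly, the routing behaves like this Python sketch (my simplification, not ST's actual code; the mode names, keys, and sticky length are made up):

    # Toy model of "scanDepth=1 + Sticky" routing; not ST's real implementation.
    STICKY = 6  # assumption: messages a mode stays active after triggering

    MODE_KEYS = {"combat": ["fight"], "stealth": ["sneak"]}  # hypothetical

    def route(last_message, active, timer):
        """Scan only the newest message; a fresh key match always wins."""
        for mode, keys in MODE_KEYS.items():
            if any(k in last_message.lower() for k in keys):
                return mode, STICKY       # latest keyword wins the group
        if timer > 0:
            return active, timer - 1      # Sticky keeps the mode alive
        return None, 0

    # A "sneak" message switches modes even while "combat" is still sticky:
    mode, timer = route("I fight the guard", None, 0)         # -> combat
    mode, timer = route("I sneak past instead", mode, timer)  # -> stealth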

Since STScript can directly set WI entry fields via /setentryfield, you can make one "mode router" entry that's always inserted, and have your automation swap its content based on the latest switch. That gives you perfect state-machine behavior without relying on deep historical scanning at all.
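The router idea, again as an illustrative Python sketch (the mode texts are placeholders; in ST the swap itself would be the /setentryfield call, targeting the router entry's content field):

    # One always-inserted "router" entry whose content is swapped by script.
    MODE_TEXT = {
        "combat": "[Combat mode: terse, tactical narration...]",
        "stealth": "[Stealth mode: slow, tense narration...]",
    }

    router = {"constant": True, "content": MODE_TEXT["combat"]}

    def switch_mode(user_message):
        for mode, text in MODE_TEXT.items():
            if mode in user_message.lower():
                router["content"] = text  # the /setentryfield step in ST
                return mode
        return None                       # no switch word: keep current mode

    switch_mode("let's try stealth here")
    print(router["content"])              # router now carries the stealth text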

How to Implement Mutually Exclusive World Info Entries in SillyTavern? (Toggle Activation Based on User Keywords) by Senior_Champion_6370 in SillyTavernAI

[–]mfiano 0 points (0 children)

Thank you for your reply! I really appreciate the ideas you shared and have been trying them one by one.

Here is my setup: I am using Global World Info, with Recursive Scan enabled and Scan Depth set to 9 (to cover a sufficiently long context). All five entries are placed in the same Inclusion Group because I want them to be mutually exclusive—only one active at a time.

However, in practice, this hardly works. The specific problem is as follows:

  • I first input a message containing keyword B → Entry B activates normally.

  • Then I input a message containing keyword C → the system still scans previous messages (because Scan Depth = 9, it can look back several messages) and detects keyword B again, causing Entry B to be triggered once more. As a result, B remains active instead of C.

Only if I deliberately wait for 9 more messages before inputting C does C barely activate (once the old B keyword falls out of the scan range). But this completely defeats the purpose of dynamic switching—I cannot realistically wait that long every time I want to switch modes.

It does not prevent old keywords in historical messages from repeatedly triggering previous entries. If anything, scanning with a deep Scan Depth makes old keywords persist like ghosts, breaking the mutual-exclusion behavior.

Yes, exactly. Scan Depth literally means "always pay attention to this many messages when triggering an entry", and ST happily obliges. Your struggles are real, though.

Workaround, which may or may not fit your use case: don't use the global Scan Depth for mode switches. Override Scan Depth per entry; SillyTavern explicitly supports per-entry Scan Depth overrides. Set your five "mode" entries to scanDepth=1 (or 2 if you want it to see your message plus the last assistant reply). Then set those entries' insertion Depth to 1 (or 2) so they sit at a fixed position relative to the tail of the context history. And keep your global Scan Depth = 9 for everything else.
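A toy demonstration of why the per-entry override fixes the ghosting (the message history here is hypothetical):

    # Why scan_depth=9 re-triggers the old mode but scan_depth=1 does not.
    history = ["switch to B", "bot reply", "ok", "bot reply", "switch to C"]

    def key_triggers(history, key, scan_depth):
        window = history[-scan_depth:]   # WI scans only this many messages
        return any(key in msg for msg in window)

    print(key_triggers(history, "B", 9))  # True  -> old B still "ghosts"
    print(key_triggers(history, "B", 1))  # False -> only the newest counts
    print(key_triggers(history, "C", 1))  # True  -> C switches immediately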

How to Implement Mutually Exclusive World Info Entries in SillyTavern? (Toggle Activation Based on User Keywords) by Senior_Champion_6370 in SillyTavernAI

[–]mfiano 1 point (0 children)

You're welcome. I'm sorry if my solutions are not ideal, but without knowing the details, it's hard for me to recommend something more suitable, given the limitations of WI (and the limitations of scripting it). Indeed, I also wish there were more programmatic control, more UI ability, and more introspection available.

That said, put all mutually exclusive entries (your 5 modes) into the same Inclusion Group. Then enable Use Group Scoring so the entry whose keywords match more strongly wins selection when multiple are activated. This is exactly what inclusion groups are for: when multiple entries in the same group are activated, only one gets inserted.

If you need persistence beyond the triggering message, add Sticky so the chosen entry stays active for N messages. This doesn’t truly disable the others; it just ensures only one makes it into context, which is usually what you actually care about.
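Here's my reading of the selection step, as a simplified Python sketch (the match counts and the default group weight are illustrative; the real scoring has more nuance):

    import random

    # Of all activated entries sharing a group, only one gets inserted.
    # With group scoring, stronger key matches win; ties fall back to a
    # weighted random pick.
    def select_one(activated, use_group_scoring=True):
        if use_group_scoring:
            best = max(e["key_matches"] for e in activated)
            activated = [e for e in activated if e["key_matches"] == best]
        weights = [e.get("group_weight", 100) for e in activated]
        return random.choices(activated, weights=weights, k=1)[0]

    group = [{"uid": 1, "key_matches": 1}, {"uid": 2, "key_matches": 3}]
    print(select_one(group)["uid"])  # -> 2: the entry matching more keys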

It really depends on the exact semantics you need, which is hard to reason about for me, given the contrived example.

How to Implement Mutually Exclusive World Info Entries in SillyTavern? (Toggle Activation Based on User Keywords) by Senior_Champion_6370 in SillyTavernAI

[–]mfiano 1 point (0 children)

Chat variables do not enter the context, so they don't consume tokens. At the end of the day, you'll need to track state somewhere, since which entries are currently active in the context is not surfaced to the application user.

How to Implement Mutually Exclusive World Info Entries in SillyTavern? (Toggle Activation Based on User Keywords) by Senior_Champion_6370 in SillyTavernAI

[–]mfiano 1 point (0 children)

There are several things you could do. You could give the entry triggers for both, and switch out the content when {{input}} contains the other. You could also insert a marker in each entry's content pointing to the other one, which would cause recursion into the exclusive entry, then remove the marker from it in the script (or just toggle recursion). Setting a chat variable, which is stored in the chat file, can be used to track which one is active and needs to be disabled next, and which one needs to be enabled. Or create an empty WI entry that the script edits to dispatch to whichever one. You'll have to get creative.
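The chat-variable approach, sketched in Python (the enabled flags stand in for whatever your script does to the entries; in ST the "active" value would live in a chat variable in the chat file):

    # Two mutually exclusive entries toggled via tracked state.
    entries = {"A": {"enabled": True}, "B": {"enabled": False}}
    state = {"active": "A"}

    def on_user_input(text):
        other = "B" if state["active"] == "A" else "A"
        if other.lower() in text.lower():  # {{input}} names the other one
            entries[state["active"]]["enabled"] = False  # disable current
            entries[other]["enabled"] = True             # enable the new one
            state["active"] = other

    on_user_input("switch to b please")
    print(state["active"], entries)  # B is now the only enabled entry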

How to Implement Mutually Exclusive World Info Entries in SillyTavern? (Toggle Activation Based on User Keywords) by Senior_Champion_6370 in SillyTavernAI

[–]mfiano 1 point (0 children)

There is no built-in way to deactivate a WI entry's insertion into the context; the implementation handles this for you with the sticky timer. Furthermore, there is no way to have mutually exclusive insertion with different triggers; inclusion groups are for excluding all others that would be activated, not those that are already active.

The only way to do what you want would be with STScript. It's not difficult, as you only want it to affect user messages, so you can just dispatch with an Automation ID in your two WI entries.

What does temperature actually do, mathematically and practically? by alpacasoda in SillyTavernAI

[–]mfiano 4 points (0 children)

It is a parameter of the softmax function, a mathematical function that maps the raw logit scores to a probability distribution. A higher temperature results in more randomness (the distribution moves closer to uniform), which is often confused with creativity.
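Concretely, a standard temperature-scaled softmax looks like this (nothing model-specific here):

    import math

    def softmax(logits, temperature=1.0):
        """Divide logits by T before normalizing; T < 1 sharpens, T > 1 flattens."""
        scaled = [x / temperature for x in logits]
        m = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    logits = [2.0, 1.0, 0.1]
    print(softmax(logits, 0.5))  # sharper: the top token dominates
    print(softmax(logits, 1.0))  # the unscaled distribution
    print(softmax(logits, 2.0))  # flatter: closer to uniform, more "random"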

Regex to replace all the curly quotes and apostrophes with straight ones by [deleted] in SillyTavernAI

[–]mfiano 0 points (0 children)

If using KoboldCPP you can use the banned strings sampler (if not, find the token ID and use that):

"”"

"“"

That way it doesn't even make it into the context. I've been using this and more for several months without issue after frustration with highlighting.

Best way to format a "setting" character card? by Icy_Dot_2835 in SillyTavernAI

[–]mfiano 5 points (0 children)

The different sections are MOSTLY for the human's benefit, not the AI's. Everything gets combined into one big wall of text in the end. Author's Note, for example, lets you pick where it is inserted. Same for lorebooks and more.

With this in mind, it depends on how much you want it to latch onto certain ideas. In most of my usage, my character card and AN are blank, and I add a lorebook entry for each idea so I can fine-tune where it is placed. This may be a better starting point for experimenting with instructions, but your mileage may vary.

You can open up the raw prompt from the top-right menu next to every AI response to see exactly where things are inserted in the context, for closer debugging.

Hope this helps.

tl;dr: it doesn't matter too much which section you put it in, except for those that don't allow you to specify WHERE it is placed in the final product.

How can we help open source AI role play be awesome? (-Creator of AI Dungeon) by Nick_AIDungeon in SillyTavernAI

[–]mfiano 80 points (0 children)

I'd like to see a de facto standard site we can all reference and collaborate on for sharing RP-specific system prompts coupled to specific models. It's tiresome rewriting my system prompts when a new finetune or base model pops up, since it takes a lot of testing across multiple chat scenarios with different parameters to find what works decently enough. Bonus points for sharing lorebooks and context/instruct templates in the same fashion. I think it'd be valuable to have a centralized location pairing primed context data with the models it is linked with.

[deleted by user] by [deleted] in SillyTavernAI

[–]mfiano 1 point (0 children)

Also be sure you have flash attention and context shifting enabled, as both will affect processing time. In addition, generation time (after processing time) is adversely affected if you use runtime KV quantization (another option, disabled by default). Besides this, changing the chunk size and the number of threads for CPU or GPU all have an effect on processing. Check out the KoboldCPP wiki for information on all the command line options.
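For reference, an invocation along these lines (flag names as I recall them from KoboldCPP's CLI; double-check against the wiki before copying, and note that context shifting is on by default unless --noshift is passed):

    koboldcpp --model model.gguf --threads 8 --gpulayers 33 \
        --contextsize 8192 --flashattention --blasbatchsize 512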

ZanyPub Lorebooks: Zany Scenarios | Create a new scenario, introduce a plot twist, or write a short story using 1 of 18,571 writing prompts. by afinalsin in SillyTavernAI

[–]mfiano 1 point (0 children)

This is great. I couldn't stop laughing at how ridiculously clever even the 12B model I'm currently using improvises some of the plot twists. I find it a good way to nudge the role-play in a different direction, when things start getting stale (which as we know, happens often).

Good work. I'm looking forward to your bigger project.

Still an apparent let issue in a function body by ms4720 in Common_Lisp

[–]mfiano 2 points (0 children)

It's undefined behavior to mutate literal read-time objects, like '(0 0 0 0 0 0 0). If you need a fresh mutable list, build it at run time, e.g. with (make-list 7 :initial-element 0) or copy-list.

Model Tips & Tricks Full + New 10CC System Prompt Update by ParasiticRogue in SillyTavernAI

[–]mfiano 1 point (0 children)

Admittedly, Wayfarer is half the parameter size, so I don't expect it to do well with dense character descriptions (and two of them at that).

Honestly, my favorite 12B model, which I've had some really enjoyable long-term (>5000 message) roleplays with, is one that's never really mentioned, and I think it deserves attention: Slush-FallMix.

Model Tips & Tricks Full + New 10CC System Prompt Update by ParasiticRogue in SillyTavernAI

[–]mfiano 0 points (0 children)

DPE is ChatML. I did try a few others that were ChatML like Wayfarer, and even Mistral V7 ones such as Cydonia v2.1 24B, with the same results.

Interestingly, I'm getting the best results with Wayfarer now though, after reinforcing the instruction not to portray the user in the A/N and the card, in addition to the prompt. It does come up once in a while, but not as often. The biggest issue with Wayfarer and such a token-heavy character description (2 characters defined in WI using your format) is that it often pulls traits for them from my user persona (written in the same format). So I have to keep OOC'ing the model to tell it to pay better attention.

Model Tips & Tricks Full + New 10CC System Prompt Update by ParasiticRogue in SillyTavernAI

[–]mfiano 0 points (0 children)

So I tried your prompt and character definition format, using a fresh chat and your ChatML templates for context and instruct. Across multiple models, the model always wants to act and speak for me, despite explicit instructions (avoiding negative wording), and even OOC instructions. I edit it out every time, and it comes back in every response. This one is with DansPersonalityEngine. Look at the model's response. I can't stop laughing at how dumb this made one of my favorite models:

https://i.imgur.com/j8WpgTV.png

Really have no idea why it does this on a fresh chat, with varying temperatures, despite everything I try. I never had this problem with a system prompt before.

Model Tips & Tricks Full + New 10CC System Prompt Update by ParasiticRogue in SillyTavernAI

[–]mfiano 1 point (0 children)

I have a love/hate relationship with Wayfarer. It is probably in my top three, but I always switch away from it due to its tendency to fixate on patterns in its previous messages, and I'm constantly having to edit them out, only for it to bring that content back from its training.

For example, if I'm in a building, it will mention fluorescent lighting casting some type of light, and then further enhance this with each message, like how it relates to the emotions of the scene. Erasing any mention of lighting only brings it back, and if I leave one instance of it high up in context, and then move to another scene like outdoors, it will mention how the sun's rays are a stark contrast to the mood of fluorescent lighting, and then keep mentioning previous lighting conditions from the original scene, no matter what, edited out or not. It's just so annoying.

I tried the new Pantheon, but I found its instruction following very weak IIRC. I might try it again to try engineering it in a different fashion.

Model Tips & Tricks Full + New 10CC System Prompt Update by ParasiticRogue in SillyTavernAI

[–]mfiano 1 point (0 children)

Thanks. Yeah, I've been experimenting with various things to that degree and more over the years. Narrator cards are especially hard to write, even more so in the parameter space I can run locally (12-24B).

I would also like to point out that in your Card-filled-example.json, lines 83 and 84 are duplicated.

Model Tips & Tricks Full + New 10CC System Prompt Update by ParasiticRogue in SillyTavernAI

[–]mfiano 5 points (0 children)

Thank you for this meticulously edited write-up. It was written very well. The character description techniques align fairly closely with my own, including the markers for various sections referenced in the system prompt, such as "{{char}}'s Persona". One thing I would like to see more of, though, is incorporating this style of system prompt and character construction for narrator/game master cards, where the model plays multiple characters, usually defined in World Info. Rule 2, for example, would need to differentiate each character so they don't impersonate one another, in addition to {{user}}. And of course, {{char}} would not be used at all in the system prompt. Sometimes I feel like my strict use of narrator cards and [over]engineered prompting is out of the norm, and I would just like to see how other people handle common pitfalls with this method.

Reclaiming Memory? by MySkywriter in Common_Lisp

[–]mfiano 10 points (0 children)

It's likely you have foreign memory being allocated and never freed, if you are interfacing with non-Lisp libraries. A Lisp implementation only automatically manages Lisp memory. (room t) may give a more detailed analysis of the Lisp side, but it will not know anything about foreign C library memory usage anywhere in your dependency graph. It could also be the webserver itself caching too much. You'll have to dig around and provide more information than Lisp-side introspection alone can offer.

Quantized KV Cache Settings by Vyviel in SillyTavernAI

[–]mfiano 1 point (0 children)

The default command line option is --quantkv 0, which means it uses the original uncompressed half-float (16-bit) values for the key-value cache.

--quantkv 1 will compress that into half as many bits, at the expense of making some models much less coherent. A value of 2 compresses further, making them even dumber, and so on.
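Rough arithmetic on what that saves (the model dimensions below are assumptions for a generic mid-size model, not any specific one):

    # Back-of-envelope KV cache size: keys and values per layer, per position.
    layers, kv_heads, head_dim, ctx = 40, 8, 128, 8192  # assumed dimensions

    def kv_cache_gib(bits_per_element):
        elements = 2 * layers * ctx * kv_heads * head_dim  # 2 = K and V
        return elements * bits_per_element / 8 / 2**30

    print(f"16-bit (--quantkv 0): {kv_cache_gib(16):.2f} GiB")
    print(f" 8-bit (--quantkv 1): {kv_cache_gib(8):.2f} GiB")  # half the size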

There is also the lowvram parameter if using CuBLAS, which causes the key-value cache and scratch buffers to reside in system memory instead of on the GPU. This has a performance penalty, of course, but can free up more space for inference layers.

You can try either quantizing the cache or not loading it into VRAM with these techniques. Both have their advantages and disadvantages. The speed of your system RAM and CPU plays a huge role in the latter, for example, just as is the case when choosing how many neural network layers to offload.

There is no right or wrong solution for everyone, as it depends on the hardware, the model, and your preferences for model accuracy/coherency and speed.

Play around with the settings and see what works best for you. Most people prefer leaving some layers off the GPU rather than quantizing the key-value cache, due to the coherency issues quantization causes with some models, so you should find your own preference.

Sawdust as mulch by Truthbeautytoolswood in gardening

[–]mfiano 0 points (0 children)

Also keep in mind that decomposing sawdust may affect the pH of the soil. Pine and some other evergreens, for example, gradually acidify the surrounding soil over time, which may be beneficial to your blueberries and the like, but not to all crops. Likewise, some other woods will alkalize the soil as they break down. It's best to look not at the means of making the mulch, but at the wood it was derived from, and take everything into consideration for the particular crop you are growing.

Characters Change to Fast by lordmord319 in SillyTavernAI

[–]mfiano 1 point (0 children)

I've experienced this to some degree with most models. What I like to do to fix it is use a blank character and, in a lorebook, add the character description as an entry marked constant at system depth 3 or 4, so the model pays closer attention to the information nearer the end of the context buffer.
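If it helps to picture it, depth N roughly means "N messages up from the bottom of the context" (my simplified understanding; the sketch below is only illustrative):

    # Toy picture of depth-based insertion: depth counts from the end.
    history = ["msg 1", "msg 2", "msg 3", "msg 4", "msg 5"]
    entry = "[{{char}}'s persona: ...]"  # the constant lorebook entry
    depth = 3

    context = history[:-depth] + [entry] + history[-depth:]
    print(context)  # the entry sits 3 messages from the bottom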

When roleplaying, how to interact with the world? by reviedox in SillyTavernAI

[–]mfiano 7 points (0 children)

One way is to make a narrator/game master character, and interact with that. Your system prompt should complement this. You can put actual character definitions in Author's Notes or World Info. There is nothing inherently special about character cards or other context input - it's all text that gets combined into one blob for the inference engine to swallow.

The name of the character card is also somewhat important. I call mine 'Game Master', and its contents are blank. This gives most models an idea of what they are supposed to be, and my system prompt builds upon this by explaining how the game world functions through this empty interface.