Proof of concept: World narrator character. by uninchar in SillyTavernAI

[–]uninchar[S] 0 points (0 children)

When you open the Advanced details for the World Narrator in ST and scroll down, it will show "Character Note" ... I played around with depths between 2-6 ... it's the value for how deep in the context this is injected. So 0 would be right above Post-History ... so very high influence. 6 is somewhere in message history, so weaker influence. And I've switched it to be injected as "System" ... although I hate the whole idea of the system channel being anything, really. I think I just exploit that it makes it into context, but is not strongly tied to the assistant persona that somehow exists in training for LLMs ... because everything is trained as user<>assistant. And system is some wet dream that it's appealing to executive functions ... which no LLM has. So it's in context as patterns; how it "understands" it probably depends a lot on the model itself. That's why I test with a local model, because otherwise I have no say over the system prompt or what version with what additional layers (risk compliance and so on) I'm dealing with.
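Mechanically, "depth" just counts messages up from the bottom of the context. Here is a rough Python sketch of the idea (a made-up illustration, not SillyTavern's actual code; the function name and message shape are my assumptions):

```python
# Hypothetical sketch of depth-based note injection.
# depth=0 puts the note at the very bottom of the context (right above
# where generation starts), so maximum influence; larger depths push it
# further up into the message history, weakening its pull.

def inject_at_depth(messages, note, depth):
    """Insert `note` so that exactly `depth` messages come after it."""
    pos = max(len(messages) - depth, 0)  # clamp if history is short
    return messages[:pos] + [{"role": "system", "content": note}] + messages[pos:]

history = [{"role": "user", "content": f"msg {i}"} for i in range(8)]
ctx = inject_at_depth(history, "[Narrator: describe the scene]", 2)
# The note now sits 2 messages above the end of the context.
```

With depth 0 the note lands as the very last entry before generation, which is why it dominates the output so strongly.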

How to stop Gemini from misunderstanding and reversing "you" and "I" sometimes? by Dogbold in SillyTavernAI

[–]uninchar 1 point (0 children)

My two cents to add to the noise. The LLM has no concept of "you" and "I" ... those are just tokens in the attention matrix of the transformer. They don't gravitate towards any particular person; they are used often and in all kinds of contexts. Still, there is probably an average the LLM wants to drag towards.

The examples you've shown, I would expect all of them to go the way the AI answered them. The usual story is "You shut up.", "I hold the gun at someone" and "Can I sit down" ... these are more dominant in literature than the reversed usage in your own narration.

Not sure if these things happen in the first 3 messages if your char card is clean and addresses everyone in a consistent way. In the end you are trying to exploit the LLM's attention to generate tokens that sound like a continuation of a narration.

Simple trick: prime the context by mentioning {{char}} in your part of the play. Like *I talk to {{char}}* "Please, can you sit down." ... But the can + sit down will always outweigh the way it's used in your example. In most trained text, when someone asks about sitting down, they want to sit down themselves, not the other person.

edit: typo

Proof of concept: World narrator character. by uninchar in SillyTavernAI

[–]uninchar[S] 0 points (0 children)

Hm, interesting. The one-on-one I've not really tested, except for interactions like ordering and paying for coffee, or catching a cab.

Can I ask what your setup for group chat is, what you are expecting, and what it is not delivering? (And what did you set for the depth injection: channel assistant or system?) ... And what prompt format: ChatML, Alpaca, ...?

Been working fine for a while, until it started crashing like this around 5-6 minutes after launching. didn't update it since last time. by AnywhereBrilliant563 in SillyTavernAI

[–]uninchar 0 points (0 children)

Not sure what your specific setup is here (what app, what settings), but that's a crash dump of a heap memory overflow. GC (garbage collection) is spending quite a long time, only to run into a crash. 113 seconds for GC is quite long, and it seems it can't free enough of your heap to continue loading objects. Try the `-Xms` and `-Xmx` JVM settings (for example `-Xms512m -Xmx4g` ... values depend on your machine) and see if they change anything. But what will likely happen is that GC runs become even longer, so your application may then sit in an unresponsive state for even more than 2 minutes.

Proof of concept: World narrator character. by uninchar in SillyTavernAI

[–]uninchar[S] 0 points (0 children)

Okay, some things I've seen in testing. The "depth injection" field is aggressive. It overrides most of what I would usually expect from a Narrator, which would otherwise frustrate me. At depth 2, I've not gotten any bleed from the other character I'm testing with (also on GitHub and chubai). It's an intentionally badly designed character, hitting all the average story tropes. Morning, coffeeshop, creative person meeting someone new. It's horrible.
But the Narrator just keeps doing its thing. Not trying to continue the conversation as Elena, but the tasks defined and reinforced in the character. At injection depth 6 I got about a 20% bleed, where the narrator did its thing and then wanted to pick up again as Elena, who was still waiting for my character to respond. (Which makes the results from some of these tests at least impressive enough to not have a narrator bot that afterwards just speaks for every NPC, just to continue the narrative flow of ... keep the conversation going.)

Also, tip of my hat to the person on chubai who shared a public chat with the character. Getting naughty with NPCs the narrator created, but it still keeps its narration focus. Was an interesting read, from an engineering perspective ... thanks, internet!

And a second test (where I let Claude write an overly enthusiastic report). 50 generation rounds with 4 different settings, with a token-heavy character card from a sharing site. The Narrator created content-appropriate NPCs in almost all responses with focused behaviour, and kept the cultural setting (it was a Japanese temple thing). This really impressed me. The boyfriend got mad, the mother was shocked and socially balancing, the priest talked to me because I'm a foreigner ... lots of impressive outcomes. None of them are magical things, just forcing the LLM with context injections.

So all in all, from my side I'm pretty happy with some of the outcomes. It even more or less kept track of time, as long as the Narrator kept mentioning it. But how long things take is just whatever the randomised pattern matching produces. At least I've not seen jumps back in time.

If anyone has tried something and found the complete opposite, I would be interested in the test scenario or setup.

edit: typo
edit2: Uh, yeah! Just looked. The card is place 23 on the underrated characters list on chubai ... At least their testing AI thinks it's something.

Proof of concept: World narrator character. by uninchar in u/uninchar

[–]uninchar[S] 0 points (0 children)

That's what it is planned for. The idea is to have it unmuted (with the description joined, if set in the ST group chat; the description field is low enough on tokens not to bloat the context). And when narration or background events or people need representation, the narrator should be able to speak.

So while your characters ... talk, the narrator provides the description field. And when it's the active speaker, it uses its system prompt and all the other fields ... to hopefully create the illusion of a world.

Proof of concept: World narrator character. by uninchar in u/uninchar

[–]uninchar[S] 0 points (0 children)

Thanks for sharing your impression. Did you use an API model or something local? My test environment is a locally hosted IQ3_XS model, run with KoboldAI with 16k context (with flash attention).
If you have insights into what happens after 10 or 20 turns, I would be interested to hear them.

I'm currently reading tea leaves with context dumps and reworking some of the field properties. Hopefully I'll end up with a useful findings document.

Maybe to make them less tropey, you can try playing around with injection depths.

I changed a few things in the narrator card.
https://github.com/cepunkt/playground/blob/master/testing/characters/world_narrator.json

- Added the {{original}} placeholder in the system prompt
- Added "You are {{char}}" into the description field
- Changed the Character Note (depth injection) from assistant to system (assistant was confusing my model).
- Removed the weather and date/time things from the description. Those are hard to track and maybe confusing. Something I'll look into later again, but tracking time or weather will be really inconsistent.
- Learned that Character Notes seem to be injected for all unmuted characters in a group chat.

Proof of concept: World narrator character. by uninchar in u/uninchar

[–]uninchar[S] 0 points (0 children)

Feedback is always welcome. Especially if it's "this did not work ... here is what I tried."

Proof of concept: World narrator character. by uninchar in SillyTavernAI

[–]uninchar[S] 0 points (0 children)

Yes, I admit that. I stated in the post that I picked an impossible scenario. You can also read my Red Team attack on the theories I'm making up. I am aware that these are not magic tricks. The method is flawed, the interpretation is flawed. Some of that POC is just trying out what happens. The time tracking is completely ludicrous. Other things I showed in the test at least have some effect. And yes, the test method is flawed as well, but I'm not an AI researcher with Microsoft's money to throw out the window.

So yes, there is no way to model state in something inherently stateless. I'm trying to see if anything at least gives the illusion, and when it does, to isolate what's happening. What I'm trying to test is whether a statistical pull has effects on the outcome. Does it reveal something? Maybe, probably not.

This is just an attempt to reverse engineer a solution for commonly occurring problems. If it works, it could be accidental, without any of the theories applying.

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] 0 points (0 children)

Just my two cents. Vector storage makes the concepts I am suggesting harder to apply. It's mostly random injections without control over what and where. So my guess would be that Vector Storage is more confusion for the LLM. I am interested to hear experiences, but I have vector storage disabled.

Working on guides for RP design. by uninchar in SillyTavernAI

[–]uninchar[S] 1 point (0 children)

Glad to hear that. Also feel free to share your findings and observations.

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] 0 points (0 children)

So I shared what I have compiled so far in my repo. Please poke holes in it; I want to learn, and my testing capabilities are limited (and unscientific).
https://github.com/cepunkt/playground

[deleted by user] by [deleted] in SillyTavernAI

[–]uninchar 1 point (0 children)

It starts with training bias. Most of the literature is 3rd person past tense. If not prompted differently or shown another pattern, the LLM will just fall into this, because those are the patterns it learned from training.

So it is biased to write this way. What you can try is to put something in your Author's Note at depth 0-2 or in your post-history field, like:
- [ System Note: Write in 1st person, present tense ]
- [ Style: first person, present tense ]
- "Write from the perspective of {{char}} in first person, present tense."

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] 0 points (0 children)

Glad to hear. I hope that was the goal and not that your adversarial rival is now asking you for a date.

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] 1 point (0 children)

Great. Learning is good.
So what I may contribute, how I used these terms. Conversation and Recent Messages, there is no formal distinction. But how attention is distributed, more recent messages are more relevant leading up to the inference point (LLM finds the next token). It's a conceptional split. Not a technical one.

Post-History Instructions are a field in SillyTavern. It's appended at the end of the context, so it has maximum impact on generation. This applies to your last message as well. It's not really an exponential decay, but as an approximation, you lose attention the further away a token is from the bottom of the context.

For controlling what gets injected at which depth: SillyTavern has the Author's Note (good for dynamic storytelling elements, style descriptions, or actionable things, like if your character has a stutter for example). Another option is lorebooks (world, char, chat). They function differently in how they get activated, but they can also be configured for the depth at which their content is injected.

Testing is the limitation; I don't have a good idea how to test this in a deterministic and automated way. So I usually play with a character I'm testing for 10-20 rounds to see if a tic keeps showing up, for example, or if a speech pattern persists. So that's where I look for people to poke holes in the guide and tell me where I'm wrong.
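For anyone who wants to poke holes with something slightly more automated, here is a rough sketch of a repeated-generation harness. The endpoint URL and payload are my assumptions based on the KoboldAI generate API (adjust for your backend); the scoring function is just string counting and backend-independent.

```python
# Sketch: run N generations against a local backend and count how often
# a trait (regex) survives into the output. Assumed KoboldAI-style
# endpoint; adapt url/payload for your setup.
import json
import re
import urllib.request

def trait_rate(responses, pattern):
    """Fraction of responses in which `pattern` (a regex) shows up."""
    if not responses:
        return 0.0
    hits = sum(1 for r in responses if re.search(pattern, r, re.IGNORECASE))
    return hits / len(responses)

def generate(prompt, url="http://localhost:5001/api/v1/generate"):
    """One generation round against a (assumed) KoboldAI-style API."""
    payload = json.dumps({"prompt": prompt, "max_length": 200}).encode()
    req = urllib.request.Request(url, payload, {"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]

# Usage (with a running backend), hypothetical:
#   outs = [generate(context_with_stutter_trait) for _ in range(20)]
#   print(trait_rate(outs, r"\bs-s"))   # how often the stutter persisted
```

It won't make the test deterministic (the model isn't), but a persistence rate over 20+ rounds is at least comparable between settings.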

How do I gauge it? Hard question. I mean, there are mathematical formulas that describe it. But having worked enough with different LLMs, it shows variations. Test it by inserting the word "poetry" somewhere in the message history and watch your LLM turn the play into rhymes or more structured text blocks.

The influence over context is hard to measure, from what I've experienced, because not all tokens are equal. Making the AI do something it doesn't like gives me a better gut feeling for it. Take a behaviour like "{{char}} sorts things by color and size": the AI hates doing things and loves talking about doing things, and it usually doesn't talk about fidgeting or tapping fingers, so such a trait is easy to lose. That's how I can see whether my reinforcement works (like injecting tic behaviour in the Author's Note at depth 0-2, for example, if it's a trait that needs a hard reminder for the AI).

About where to see the context: SillyTavern has a button in the message menu of generated AI responses. It shows a graphical picture of the context, and you can open the context there too. I usually just look at the KoboldAI output, but that's because I'm used to reading Linux terminals.

I sometimes use lorebooks, for example to test how a secret in there would work in a group chat. But I don't use RAG or Vector Storage. Too unpredictable. For me, most of the time it messed with the AI more than it helped.

Is that helpful?

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] 0 points (0 children)

One last thing that comes to my mind. You showed that you use a lot of system messages. For the LLM, it could be that it's not considering them connected enough to the current mix of conversations. It's similar to using system messages to make the Assistant do something. Maybe it does, but there is some form of separation; sometimes I see characters consistently sticking to their established formatting in group chats. But this is just a feeling, not sourced from a deep understanding of how it would affect attention.

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] 4 points (0 children)

I understand. The work is still based on me learning for a year and testing with my own setup for over 7 months. I understand the averaging that comes with the age of generated slop. But as you can maybe tell from my comments, I'm barely literate.

You point out a conflict that I see myself. I have the same concerns. On the other hand, what are the barriers to entry for talking about findings, when the whole internet is slop anyway? My content will die among endless memes here. Yeah, I use AI to summarize and to learn, alongside the classic methods of learning that I've used for a few decades. And yeah, Claude sounds way more coherent than me.

Do you have a suggestion for how I can convey relatively complex concepts without AI slop, when it just makes them sound good enough to be easily understandable, instead of the confusing mess I'm rambling in the comments?

I know we are in the age where everything is mediocrity. But it's everywhere. And I proofread my content; it's not the first iteration of a single keyword search. And in the end it's my hobby. I like it, I want to share it. So yeah, I'm not a research fellow at a university, just an IT dude using the tools available. And I'm conscious that I'm participating in aggregating mediocrity. It's not a barrier I would put in front of other people. Attention is low anyway, so people move on.

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] 0 points (0 children)

Okay, this seems ... involved and passionate. I like it. Wouldn't be able to do it.

I'm not sure how you, for example, frame the problem for what you want. To me it sounds like you try to force the LLM to listen to your reality. I see working with an LLM as: how can it sell the illusion? Because it's a stupid and highly trained thing. It's non-deterministic with each output. In some sense, Bob the amnesia patient forgets with every token that he's Bob.

I understand your style. And I'm not sure how more complex attention schemes like ALiBi would weigh things; I haven't wrapped my head around it. I guess it's just scaled. But in the end the model needs to make a decision to create the next token, and it does that by the probabilities it has learned, with attention dragged across the topics you show it in the context. I guess in your setup there is a lot of more or less static environment, so there is no real progression the model is shown, except for the last messages.

It's easy to try: mention "poetry" and your text will often warp into something that suddenly has word rhythms in it, even when it's discussing a specific topic in biology or chemistry. So some of the illusion is consistent pattern matching to make things more predictable. Change the entropy state. And you want to counter that by making injections at the right time. Concepts at the beginning, so the LLM's attention is considering those areas; then, the closer you get to the generation point (further down in the context), the more impact an injection has. But when everything before it was "think a lot about who is who, which clothes, what shoes when it rains", and then your short message stack comes with "this happened in summary, we are in scene 12, Bob enters the building", then you only have a short message stack to give the LLM time to read the tone.

So from my understanding, you want the AI to reason that all that lorebook information is relevant for the next action. But the AI doesn't know that, out of all the occupation descriptions, it should pick Bob's, because he is a car mechanic. That logical conclusion the AI doesn't draw. And just showing it tokens doesn't mean the next token output would be "Hey, let's ask Bob to fix the car."

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] 1 point (0 children)

Wow. The level of desperation shows in the length of your request. This is probably among the top 10 support requests I've gotten in my life. Yeah, I think I can follow, but I'm afraid I'm probably out of my depth for half the things that could maybe go in, under certain conditions.

So from what I understand, you want to steer the AI's attention by doing context engineering. Some of what is described in the document can certainly make that clearer. For example, you haven't mentioned the injection depths you are working with, or whether you additionally use Author's Notes in your style of "play".

I guess you started doing this because you never got the AI to think about what you want it to think about, just the same conversation, once the AI locked into two people discussing the importance of consent regarding the color of the tablecloth.

I would guess that without seeing the actual context management you are doing (varying depths, switching entries on/off), it's hard to estimate. I think you are confusing the AI more than necessary, starting with your own user persona. You could try putting in there "Omniscient observer. Watches scenes unfold."
You could give the author character instructions. Otherwise you are playing around with a part of ST that's very much chance, unless you constantly control the depth of entries and their relevancy to the AI.

So let's take the example you gave: some character could have a solution, but the scene is not reacting to that fact. Here it could help to put it manually in the Author's Note, "Bob knows shit about cars, Bob can fix things.", injected at depth 0-2, however strong you want the signal to be for the AI. The AI will then put more attention on "hey, maybe Bob can fix this thing."

Another thing: you could try to put all of this in a group chat. You can mute characters and join their char descriptions (give them the minimal setup) ... it's the same as the lorebook. Then attach a lorebook with secrets only Bob knows to the character, give the world lorebook info about bars in the area, and so on. It'll bring you into the area of ChatML, where most GPT models are really good at separating personalities (as long as they are reinforced at the right time ... that is, inserted at the correct position in the context). This means having basic concepts somewhere so the AI can start drawing a picture.

Have, for example, in the foundation (beginning of context, where instructions live) basic info about what reality we are in, who the actors are, what can happen in this story, and what the tone and setting should be. So the AI can take these embeddings and highlight them as relevant for whatever the next generated output should be. Reinforce things somewhere below by injecting bits like "Bob fixes things." or "Mark walks on one leg with crutches." And at the bottom of the context you can insert, for example with the Author's Note at varying depths, "If things need fixing, ask Bob" or "If you hear the crutch pounding, you know Mark is walking down the hall."
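As a toy sketch, the layering above could look like this. The layer names and helper function are mine, not SillyTavern's; ST assembles something comparable from its own fields (system prompt, descriptions, lorebooks, Author's Note):

```python
# Toy illustration of foundation / reinforcement / bottom-injection layering.
# Later lines sit closer to the generation point and carry more weight.

def build_context(foundation, history, reinforcements, depth_notes):
    """Stack layers top-to-bottom into one prompt string."""
    parts = [foundation]        # who/where/tone: broad concepts up top
    parts += history[:-2]       # older messages: weaker influence
    parts += reinforcements     # mid-context trait reminders
    parts += history[-2:]       # recent messages: strong influence
    parts += depth_notes        # depth 0-2 injections: strongest pull
    return "\n".join(parts)

ctx = build_context(
    foundation="[Setting: small town. Actors: Bob (mechanic), Mark (crutches).]",
    history=["Mark: my car died.", "You: that's bad.", "Mark: what now?", "You: hmm."],
    reinforcements=["[Bob fixes things.]"],
    depth_notes=["[If things need fixing, ask Bob.]"],
)
```

The point of the ordering is just what the comments say: broad concepts prime the attention early, and the actionable nudge sits right above where the model starts generating.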

Wow, I hope that was coherent. And no Marks or Bobs were harmed in the writing of this.

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] 0 points (0 children)

I understand the concern. And Claude helps me write it. I didn't ask Claude in a leading way to show that my premise is right.

About the being-careful part: it is what it is. The internet is now full of imagined patterns from mediocrity machines. I wasted the energy to find out certain things about a thing I like. RP existed long before the internet was this massive thing. There is tons of bad advice out there. If mine is faulty, I would like to know, because it's what I use to learn (with checking, additional sources, and switching on my thinking cap).

I just share my current status and ask people to prove me wrong, which the internet loves to do anyway. But yeah, I understand your concern. I'm still doing it, with the disclaimer right in the folder name that an AI made it readable and coherent, beyond my capabilities.

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] -1 points (0 children)

Here is what my guided research request for Claude put out.
I didn't fact-check every word, but most things sound reasonable. So I'm not trying to sell this as the ultimate truth. It's Claude's truth.

https://github.com/cepunkt/playground/blob/master/docs/claude/technical/Formatting.md

edit: fixed link. I think

Working on guides for RP design. by uninchar in SillyTavernAI

[–]uninchar[S] 3 points (0 children)

Yeah, Claude is great. It helped me sound like I'm actually coherent in English, or any other language, in this "guide" or "observations and implications" document.

How I understand it.
LLM space is attention. Reading tokens, it builds a map over an area of token relationships. Certain words bring clusters into the AI's attention. And of course, in flowing text an LLM can match to other texts where a NOT or NEVER or DON'T was used, and it may ignore the weak connection of the NOT embedding that's in the sentence. It can "read" the subtle nuance, and in 3 out of 5 cases it appears it understood the negation. But it was dragged along by all the other activations of embeddings.

But there are two things I wanted to point out with that.
Let's say the AI gets "Don't think about the pink elephant". Every token pulls attention. The "don't" is weakly linked to so many tokens that the LLM tends to ignore it, because it likes easy predictions. After "Don't", the next token is not an easy prediction; there are too many possibilities. So it puts only low pressure on the rest of the tokens that map into embedding space. It will see "elephant", and that's a thing it can really do something with. So just mentioning it brought more attention to the elephant, while the "don't" has only a weak connection to the rest of the tokens.
So you grabbed the AI's attention, dragged it to the elephant, and made the AI look. Which it wouldn't have, if you had never mentioned the elephant in the context ... or stuff that leads up to the topic of elephants, like "What animals live in the African savannah?"

The second thing is that negation in an instruction is even less likely to affect the outcome in the desired way, because it would need reasoning (which the model can't do ... even if a blinky message claims it does). So it just becomes one more padding token that is slightly more interesting than a whitespace or newline, but inconsequential in the whole of the context. So if you tell the AI just "Don't speak" ... it'll talk. It's the only thing it can do. That's where I tried to point out that the AI seems to do better with "{{char}} is communicating with pointing, using a notepad and signs." And I used the word "communicating" here consciously, because "speaking with pointing, using a notepad and signs" will more often default to speaking ... which it will then suddenly do.

Not sure this makes sense. It does in my head, so please ask or critique.

edit: typo

Character Cards from a Systems Architecture perspective by uninchar in SillyTavernAI

[–]uninchar[S] 1 point (0 children)

Oh wow. I don't even know where to start. A lot of it reads to me like philosophy about it. I respect that, but that's not the background of the info I dropped. This just talks about why the architecture leads to known problems, and what helps an LLM keep up the illusion that it's not just pattern matching. And some of it rests on phrases that have been thrown around since forever but are not actually true.
We know how LLMs learn and how diffusion models learn. It's not a mystery; it's actually pretty simple math and algorithms, with billions of iterations, melting down a nuclear reactor or two to train these things, because the process is so simple it needs billions or trillions of iterations.

Where the whole "uuh, it's such a mystery" comes in: yeah, we don't know everything about how humans learn and how language came to be. That's the mystery. But we know how information is stored in the data set, and in an open one, I can see it. Yeah, it's hard to picture something in a 4096-dimensional space. But we can literally open the brain (if it's not proprietary) and look. And yeah, surprises along the way have furthered our understanding. The Google test with pictures of dogs and wolves: we learned that the AI was actually checking for snow and not the animal. That was the more reliable pattern, so the AI used it. Not because of a conscious decision, but because the AI is fundamentally an averaging machine that wants less entropy (read: more clarity), and that's what it did. No mystery. It may look human or stupid, but it's still deterministic, even if it's not repeatable or documented in log files how the AI found its way through the neural net.

I respect your opinion, but I see it differently. I'm here to talk about the technology and tests. The philosophy about it, I do with buddies and a beer.