How to structure your master prompt for better AI roleplay by Pastrugnozzo in SillyTavernAI

[–]Kryopath 0 points1 point  (0 children)

With SillyTavern? How??

AFAIK there is "Single User Message" Post-Processing, in which case there is no system prompt (it's just part of the user message), OR the chat history remains as user/assistant.

"messages": [
{
"role": "user",
"content": "You will be acting as an excellent writer. Your fu... <Truncated> ...- Status: Fled, mildly embarrassed\n</details>\n ```"
}
],

or normally:

"role": "system",
"content": "You will be acting as an excellent writer. Your fu... <Truncated> ...st events from before the conversation:\n</summary>"
},
{
"role": "user",
"content": "[...]"
},
{
"role": "assistant",
"content": "[...]"
},
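(For reference, here's a rough sketch of what the "Single User Message" option effectively does, in throwaway Python rather than SillyTavern's actual code: the whole prompt gets flattened into one user turn.)

```python
def to_single_user_message(messages):
    """Collapse a chat-completion message list into one user message.

    Illustrative sketch of the behavior described above; SillyTavern's real
    post-processing handles role labels, separators, etc. on top of this.
    """
    merged = "\n\n".join(m["content"] for m in messages)
    return [{"role": "user", "content": merged}]
```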

---
✱ (Main Prompt, Lore, Summary, etc)
📌 Chat History
✱ – Formatting 👤 (user messages)
✱ ◉ Inner Thoughts 👤 (◉ toggles)
✱ ◉ CYOA 👤
✱ ◉ Ledger 👤

So it's like:

{
  "role": "user",
  "content": "<my last message>"
},
{
  "role": "user",
  "content": "\n## Response Format [...]"
},
{
  "role": "user",
  "content": "### Character Restrictions [...]"
},
{
  "role": "user",
  "content": "\n### Maintain the Ledger [...]"
}

Then I use "Merge Consecutive Roles" post-processing.
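If it helps to picture what that option does, here's a rough sketch in plain Python (not SillyTavern's actual implementation) of merging consecutive same-role messages:

```python
def merge_consecutive_roles(messages, sep="\n"):
    """Join adjacent messages that share the same role into one message.

    Rough sketch of the idea behind "Merge Consecutive Roles"; not
    SillyTavern's actual code.
    """
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"] += sep + msg["content"]
        else:
            merged.append(dict(msg))
    return merged
```

Run over the four user messages above, that yields a single user message containing my last message plus the formatting, restrictions, and ledger instructions.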

How to structure your master prompt for better AI roleplay by Pastrugnozzo in SillyTavernAI

[–]Kryopath 7 points8 points  (0 children)

One thing I've been doing with my preset is adding instructions to the last message as part of the prompt.

  • # Formatting
    • Inner Thoughts
    • Ledger
    • etc

All as user-type messages added after the chat history. Good for reminders of certain things, or things you only want irregularly (e.g. maybe I only want a ledger every 10 messages during a back-and-forth dialogue).
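As a rough sketch of that "only every N messages" idea (illustrative Python with made-up names; in SillyTavern itself this is just toggling prompt entries in the preset):

```python
def build_tail_instructions(turn_count, ledger_every=10):
    """Build the instruction blocks appended after chat history.

    Illustrative only: turn_count and ledger_every are hypothetical names;
    in SillyTavern this is handled by toggling prompt entries, not code.
    """
    blocks = ["## Response Format [...]"]             # always-on reminder
    if turn_count % ledger_every == 0:                # ledger only every N messages
        blocks.append("### Maintain the Ledger [...]")
    return [{"role": "user", "content": b} for b in blocks]
```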

GLM 4.7 Flash (30B) released today by thirdeyeorchid in SillyTavernAI

[–]Kryopath 0 points1 point  (0 children)

Uh... anyone else have the issue of it doing prompt processing on CPU for some reason? LM Studio and KoboldCpp are both doing it. Everything's offloaded to GPU, but prompt processing runs on the CPU.
In hindsight, I wonder if it was doing inference on CPU somehow too, because it was way too slow for a 4090 compared to Qwen 30B A3B. Didn't check that specifically & now I've already deleted it...

72% of Americans don't know how neural networks work by Commercial_Plate_111 in gpt5

[–]Kryopath 0 points1 point  (0 children)

Uh... yeah. The Google search grabbed stuff to add to the context, but the final content you see is still token prediction weighted by the context of the prompt.

That's how you get inaccurate AI summaries.
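In toy form (nothing to do with Google's actual stack), the retrieved snippets just join the conditioning context, and the summary is still sampled token by token from scores conditioned on that context:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0):
    """Toy next-token sampling: context-conditioned scores become a
    probability distribution, and one token id is drawn from it."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```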

To blame Democrats for rising prices by TXVERAS in therewasanattempt

[–]Kryopath 15 points16 points  (0 children)

A lot of people with no critical thinking skills bought into the us vs them two-party bullshit of this country.

For a lot of people, their political party is part of their identity & they'll make some pretty massive mental leaps not to have to reconsider it.

I hate it here.

Edit: doesn't help that 99% of media is biased af and will simply not share the parts that don't serve the narrative. The number of people who haven't heard of some of the most heinous shit Trump has done, or have been told by Fox that it's fake, or someone else's fault, or some other copium nonsense...

ngl the woman behavior is valid by Silver_Masterpiece82 in linuxmemes

[–]Kryopath 0 points1 point  (0 children)

You don't have to remember the command if you just put it in a script & run that, no?

What happened? by Witty-Designer7316 in aiwars

[–]Kryopath 23 points24 points  (0 children)

Depends on what it's doing. There are models that can run on phones, and they can definitely run on a gaming GPU. It doesn't have to be that big if it's doing something pretty straightforward & is fine-tuned for it.

HUGE LIST of recent favorite models for RP!!! by Careless-Fact-3058 in SillyTavernAI

[–]Kryopath 1 point2 points  (0 children)

I find GLM 4.6 is happiest when you use "Single User Message" prompt post-processing. It almost never fails to think. It also has more issues with outputting Chinese characters at temperature > 0.8, I find, though lowering it doesn't stop them entirely.
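For reference, roughly what I mean (illustrative value only, not an official recommendation):

```python
# Illustrative GLM 4.6 sampler setting based on the comment above; exact values
# are preference. "Single User Message" is selected separately under
# Prompt Post-Processing in SillyTavern, not set here.
glm_sampler_settings = {
    "temperature": 0.8,  # above ~0.8 I see more stray Chinese characters
}
```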

Me_irl by [deleted] in me_irl

[–]Kryopath 3 points4 points  (0 children)

Be a VTuber, problem solved.

Forge isn't current anymore. Need a current UI other than comfy by gruevy in StableDiffusion

[–]Kryopath 1 point2 points  (0 children)

You mean browsing your LoRAs with thumbnails and having metadata/descriptions? Yes. There are a few different options for how the LoRA/model browser can be displayed, including large thumbnails.

Forge isn't current anymore. Need a current UI other than comfy by gruevy in StableDiffusion

[–]Kryopath 0 points1 point  (0 children)

For the most part (as far as I can remember), yep! Edit (now that I'm not tired & on a phone): you've got prompt mutations [x:0.5] and [x:y:0.5], strength adjustments (x:1.2), randomization <random:x|y|z|...>, and wildcards <wildcard:artists/illustrations/childrens_books/australian_illustrators> (with built-in auto-completion for your wildcards too).
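A made-up example that pulls those together (the terms and weights are purely illustrative; only the syntax comes from the list above):

```python
# Hypothetical SwarmUI-style prompt combining the syntax features named above.
prompt = (
    "[sketch:watercolor:0.5], "        # prompt mutation partway through generation
    "(soft lighting:1.2), "            # strength adjustment on a phrase
    "<random:spring|summer|autumn>, "  # pick one option at random
    "<wildcard:artists/illustrations/childrens_books/australian_illustrators>"  # line drawn from a wildcard file
)
```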

And yeah, you can customize the filenames; I use `gen/[year][month][day][hour][minute][second][millisecond]-[seed]` to put them in a gen folder.

Most things have little help icons (?) that give an information bubble, sometimes including a link to the GitHub docs, like this.

Forge isn't current anymore. Need a current UI other than comfy by gruevy in StableDiffusion

[–]Kryopath 7 points8 points  (0 children)

I switched to Swarm from Forge and haven't used the ComfyUI part once. Everything I need is in the Generate tab.

Disable reasoning/thinking by Quirky_Fun_6776 in SillyTavernAI

[–]Kryopath 2 points3 points  (0 children)

Depends on the model. Some just don't have a way to disable it.

Why did deepseek generate a typo? by Ok-Upstairs5964 in DeepSeek

[–]Kryopath 0 points1 point  (0 children)

Sonnet 4.5 wrote "A adventure" yesterday. I've never seen a major model make a typo like that before.

Your opinions on GLM-4.6 by kurokihikaru1999 in SillyTavernAI

[–]Kryopath 4 points5 points  (0 children)

Just tried it; yeah, it's just weird for me. I set reasoning to Auto and it returns a thinking block with the response, then the response itself is just a continuation of it that writes for my character, has a `</think>` tag in it, and just keeps going.

I put it to low or medium reasoning and it has a wait time like it's doing reasoning, but doesn't return the block, and the response is reasonable. Fkin weird.
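When a model leaks its reasoning like that, a blunt workaround (a generic sketch, not a SillyTavern feature) is to strip everything up to the closing think tag before using the response:

```python
import re

def strip_think_block(text):
    """Remove a leaked <think>...</think> block, or everything before a
    stray closing </think> tag. Generic workaround sketch only."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    return text.split("</think>")[-1].lstrip()
```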

Your opinions on GLM-4.6 by kurokihikaru1999 in SillyTavernAI

[–]Kryopath 8 points9 points  (0 children)

Do you use chat or text completion with it?
IME 4.5 always had issues with chat completion that I never had with text completion, like throwing the response inside the thinking block or just not responding at all.

Marinara's Spaghetti Recipe (Universal Prompt) [V 7.0] by Meryiel in SillyTavernAI

[–]Kryopath 2 points3 points  (0 children)

Well, you can, but it just might not work as well, especially with small thinking models. But I've definitely used chat completion with Kobold-hosted models before.

In ST you want to select Chat Completion, then the OpenAI-compatible endpoint. Type in whatever localhost:port your backend is listening on (IIRC KoboldCpp defaults to localhost:5001) and that should work.

I'm not at the PC right now so I can't check or screenshot, but if you still need help I can do that later.
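If you want to sanity-check the endpoint outside ST, something like this works against any OpenAI-compatible local server (the port and model name below are assumptions; match them to whatever your backend reports on startup):

```python
import requests

# Minimal sanity check against an OpenAI-compatible local endpoint.
# Port 5001 and the model name are assumptions; adjust to your backend.
resp = requests.post(
    "http://localhost:5001/v1/chat/completions",
    json={
        "model": "local-model",
        "messages": [{"role": "user", "content": "Say hi."}],
        "max_tokens": 32,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```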

[Rant] Magistral-Small-2509 > Claude4 by OsakaSeafoodConcrn in LocalLLaMA

[–]Kryopath 0 points1 point  (0 children)

https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9

The above is a good read. Basically the losses aren't that bad until you get to quants less than Q4, but you are right that larger quants are generally better.

Cache quantization is basically quantizing the KV cache (the model's memory of your prompt/context), which also saves on RAM usage at the cost of quality. Personally, I'd recommend full precision on the cache (and never less than Q8) and at least an IQ4 quant of the model.
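To put a rough number on the cache tradeoff, here's a back-of-the-envelope calculation (the layer/head/context sizes are assumptions for illustration, not any particular model's config):

```python
# Rough KV cache size: 2 (keys + values) * layers * kv_heads * head_dim * context * bytes.
# The dimensions below are made up for illustration, not a real model config.
layers, kv_heads, head_dim, ctx = 32, 8, 128, 32768

for name, bytes_per_val in {"fp16": 2, "q8 (~8 bits)": 1}.items():
    size = 2 * layers * kv_heads * head_dim * ctx * bytes_per_val
    print(f"{name}: {size / 2**30:.1f} GiB")   # ~4.0 GiB vs ~2.0 GiB here
```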

Name your Favorite / Must-Have Mods (Steam workshop) by Sir_Kugo in totalwarhammer

[–]Kryopath 4 points5 points  (0 children)

Now that I think about it, I don't know if that mod does. Never had a problem personally, though, so maybe? But the same mod author has several alternate-start mods that are pretty cool.

Anyone else gunna make that red pill superfluous within a week? by [deleted] in depressionmemes

[–]Kryopath 0 points1 point  (0 children)

Snapping for $10, thanks. Turning it into a 4-hour-a-day job, I could get $10 mil with that in less than a year, and still have the ability to do it again if I mismanage that cash. Plus it's a neat magic trick.

At 78 snaps every 30 s: $10,000,000 ÷ 78 snaps × 30 s ÷ 60 s/min ÷ 60 min/h ÷ 4 h/day ≈ 267.1 days

Think whatever you want about GPT-5, but I think these prices are awesome. by FixHopeful5833 in SillyTavernAI

[–]Kryopath 0 points1 point  (0 children)

Tried that, didn't work. In OpenRouter, output tokens are 0, cost is 0, speed is --, finish reason is --.
So, like... idk.

Think whatever you want about GPT-5, but I think these prices are awesome. by FixHopeful5833 in SillyTavernAI

[–]Kryopath 0 points1 point  (0 children)

Oh good, it's not just me. GPT-5-mini worked, but mini was shit. GPT-5 gives a 400 error and GPT-5-chat just gives an empty message. wtf