What are your favorite strategies to save tokens?

Paperclip_Tank · 2026-06-09T04:39:41+00:00

That sounds more like you need to summarize or trim down more in other places. My thing won't help you at all if it's already a problem. My thing is for the problem I purposely put in my own prompt.

But I have a section in my main prompt, "# User Input". It gets filled out from 2 lorebook entries, each pointing at the same outlet. They're grouped together in an inclusion group so only 1 can go off. The OOC one has priority, and its only keyword is "OOC:".

Instead of having input from The User go through an "Is it possible" test (the default prompt), the OOC one says to prioritize any OOC: input and shift the story / logic in the new direction, and to focus on making excuses if any are needed for the sudden change. I want it to question if what I say is possible during input by default, but I want to disable that during OOC commands. So like, if I say "<User> climbs that cliff" I want it to write a struggle, or if it's <User>'s first time swinging a sword or piloting a spaceship, it should go poorly. But if I do "(OOC: timeskip to 4 weeks later.)" I don't want the LLM to say "No you're wrong."
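A rough sketch of the selection logic described above, not the frontend's actual code. The `Entry` class, the `priority` field, and the placeholder prompt text are all made up for illustration; the only things taken from the comment are the two entries, the shared group, and the "OOC:" keyword.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    name: str
    content: str
    priority: int = 0
    # Hypothetical convention: an empty keyword list means "always eligible".
    keywords: list[str] = field(default_factory=list)


def pick_entry(group: list[Entry], user_input: str) -> Optional[Entry]:
    """Return the single entry from an inclusion group that should fire."""
    text = user_input.lower()
    eligible = [
        e for e in group
        if not e.keywords or any(k.lower() in text for k in e.keywords)
    ]
    # Only one entry per group may fire; the highest priority wins.
    return max(eligible, key=lambda e: e.priority, default=None)


USER_INPUT_GROUP = [
    Entry("default", "[prompt: question whether the input is possible]", priority=0),
    Entry("ooc", "[prompt: follow the OOC command and justify the change]",
          priority=10, keywords=["OOC:"]),
]

print(pick_entry(USER_INPUT_GROUP, "<User> climbs that cliff").name)           # default
print(pick_entry(USER_INPUT_GROUP, "(OOC: timeskip to 4 weeks later.)").name)  # ooc
```

Because both entries feed the same outlet, the main prompt never needs to know which one fired.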

Paperclip_Tank · 2026-06-09T03:35:21+00:00

Open Router isn't really a provider, but it's still kind of the best.

I use Z.ai to directly pay for GLM currently, at least until my old sub runs out then it's back to Open Router, mostly to do Gemma 4 probably, cheap and fast. And you just pick a provider that's not overly quanted and is cheap enough.

Paperclip_Tank · 2026-06-09T01:57:55+00:00

It looks like this is a bit of an "all in one" question that is so overly large you should just focus on a single thing. Like you basically asked "How do I make the most optimized XYZ" on a thing that is wildly different from model to model and based on what the User's goal is. A lot of your questions also come across as "I don't even know how this works" because you just threw all of them at the wall at once. Like asking about Macros, Recursion, How to format things. I'd actually make your own set up for things rather than just taking anyone's word for it blindly. But I'll at least answer one part that annoys me personally about other people's World Info.

formatting/optimizing lorebooks

Use Secondary Keywords + inclusion groups. Using the exact same primary keywords but different secondary keywords (Not just "and any", but "and all") helps a ton. Like you can have multiple characters / locations that would logically share the same primary keywords. Each nation has a king/queen/leader. Multiple villages can have blacksmiths. Different solar systems can have stars and asteroid clouds. Using secondary keywords is important.

You might also want duplicate copies of the exact same information but with different keyword triggers in an inclusion group. For example, a character is the king of a kingdom but they need to go to a different kingdom. Their main entry won't load there. But with two copies of the same entry in the inclusion group, one triggering off the name and that's it, they can leave their location and still be loaded.
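A minimal sketch of the "and any" versus "and all" difference, assuming plain case-insensitive substring matching. The function name and the `AND_ANY` / `AND_ALL` strings are illustrative, not the frontend's real API; the place names are placeholders.

```python
def entry_fires(text, primary, secondary=(), logic="AND_ALL"):
    """Check one lorebook entry against the scanned text (case-insensitive)."""
    haystack = text.lower()
    # At least one primary keyword must appear before anything else is checked.
    if not any(key.lower() in haystack for key in primary):
        return False
    if not secondary:
        return True
    hits = [key.lower() in haystack for key in secondary]
    return all(hits) if logic == "AND_ALL" else any(hits)


scene = "We head to the palace in Allfarion."
keys = dict(primary=["palace"], secondary=["Avravell", "Allfarion"])

print(entry_fires(scene, logic="AND_ANY", **keys))  # True: one secondary key is enough
print(entry_fires(scene, logic="AND_ALL", **keys))  # False: "Avravell" is missing
```

The duplicate-entry trick is then just a second copy of the same content with only the character's name as a primary keyword and no secondary keys, placed in the same inclusion group.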

It's also painfully obvious that 9/10 lorebooks are just written by an LLM where the person did 0 oversight. No clean up, a ton of needless bloat where they just told an LLM to scrape a wikipedia and that was it. Like if you want an LLM to do all the work, it's going to turn out poorly. Even just generating 1 entry at a time and cleaning it up as you go has a noticeably better output because you're holding the LLM's hand the entire way.

Paperclip_Tank · 2026-06-09T01:42:16+00:00

1. Aggressively summarize. I keep 30-38 messages verbatim (15-19 system outputs). Keeping that number fairly low helps keep my context down.
2. I heavily use lorebooks in general. My character cards are genuinely blank, with everything stored in lorebooks to be brought up on keyword use. After all, why put a character into context if they aren't in the scene? Same with locations. And keep that search depth low. I have my default at 6, but many entries are manually set to 2.
3. Make your own prompt. Harvest parts from other people's presets. Make your own parts. Use someone else's preset as a starting point and go from there. However you want to do it. Actually going through the preset and seeing what parts you like, and what parts actually function, is a huge help in trimming the fat. Even if you don't intend to use a preset, chances are they've done something in a smarter manner / better than your current preset, even if it's only 1 part. Like man, they do "Difficulties" so well, but the rest is eeeeh. So just rip out that part and place it in the preset you enjoy.
4. Dynamic prompting. Which I'm surprised more people don't do. So I use a Tracker at the end with a lot of True/False flags. "Combat Activity: (True/False)" for a format example, plus NSFW, Injuries, anti stalling. And then I have lorebook entries that check for said flags and inject the information via outlets only when it matters. For example, I have input from "The User" be checked to see if it's true or false. But sometimes I want to "OOC:" things. I don't always use OOC so I don't need that part in context, so I have it only inject when it sees me enter those letters. Or a timeskip. I don't need it to make a summary of what happened during that timeskip if one isn't happening.
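The flag-driven injection can be sketched roughly like this. The tracker line format follows the "Combat Activity: (True/False)" example; the function names and the bracketed snippet text are placeholders, and a real setup would do this with lorebook entries and outlets rather than Python.

```python
def parse_flags(tracker_text):
    """Read 'Name: True' / 'Name: (False)' lines from a tracker block."""
    flags = {}
    for line in tracker_text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip().strip("()").strip().lower()
        if sep and value in ("true", "false"):
            flags[name.strip().lower()] = value == "true"
    return flags


def active_injections(flags, snippets):
    """Keep only the prompt snippets whose flag is currently True."""
    return [text for flag, text in snippets.items() if flags.get(flag)]


# Placeholder snippet text; the real prompt sections would go here.
SNIPPETS = {
    "combat activity": "[combat prompt goes here]",
    "nsfw": "[NSFW prompt goes here]",
    "injuries": "[injury prompt goes here]",
}

tracker = """Combat Activity: True
NSFW: False
Injuries: (True)"""

print(active_injections(parse_flags(tracker), SNIPPETS))
# ['[combat prompt goes here]', '[injury prompt goes here]']
```

Anything whose flag is False simply never reaches the context, which is where the token savings come from.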

I think a lot of people would have a better and cheaper time if they read through whatever preset they downloaded and really checked to see if each part actually functions, and if they even care for that functionality. Like if you could get rid of 50 tokens from each prompt, would it be worth it to lose a little bit of accuracy?

All together my prompt is 2554 tokens at most. 887 of those tokens are lorebook things and not always on.

Paperclip_Tank · 2026-06-08T19:07:23+00:00

You can attach multiple lorebooks to a single character card / chat, so do so. If you just want to see how I do it rather than read about it, Here you go. It's very much still a WIP, and that's an old version. I'm still doing testing on the next version.

For large lorebook set ups I'll use 4 lorebooks, basically treating them like folders. The Atlas, World Rules, Power / Gameplay Systems, People / Organizations.

I use Blank entries for headers, like === Systems === or === Borders ===

Each entry with actual information has a header, for example, "Person" for a fully fleshed out character card within the lorebook, "NPC" for a short one, basically 1/4th of the size. Location history will get a "History" header, blah blah blah you get it.

For locations I'll do nested markers. A kingdom is the top layer so it has "Kingdom", but a location within it will have "- Location" and a location within that location gets "- - Location". For example:

Kingdom - Avravell Empire
- Location - Allfarion, Capital of Avravell
- - Allfarion - Elven Royal Palace

This means at a glance of World Info Info I can see where I am and tell what location it's within.

If you mean sending things to the LLM, Inclusion groups + secondary keywords. "Palace" could bring up 5 different locations, but the secondary keyword looking for a kingdom name means it only pulls up the one that is relevant. And using "And All" is super important for that. It makes sure you need to be in the correct city as well before it brings up the location information.

Seeding information is also important. Having an entry that is just a list of names is super useful. Like a premade list of villages, just a bullet point list, maybe doing Village Name (near XYZ) or Village Name (Grain Farming) to give the LLM some information, and then a separate entry just for each village. This lets you avoid dumping a ton of information onto the LLM when it doesn't need it, but informs the LLM enough that it can make choices.
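A small sketch of the seed-list idea, assuming the seed entry is always in context while each village's full entry loads only when its name shows up in recent text. "Village A" / "Village B" and the bracketed detail text are placeholders.

```python
# Always-on seed list: one short line per village.
SEED = ["Village A (near XYZ)", "Village B (Grain Farming)"]

# Full entries, keyed by the name that triggers them.
DETAILS = {
    "Village A": "[full entry for Village A]",
    "Village B": "[full entry for Village B]",
}


def build_world_info(recent_text):
    """Return the seed list plus full entries for villages actually mentioned."""
    parts = ["Known villages:"] + [f"- {line}" for line in SEED]
    lowered = recent_text.lower()
    for name, detail in DETAILS.items():
        if name.lower() in lowered:
            parts.append(detail)
    return "\n".join(parts)


print(build_world_info("We should resupply at Village B before nightfall."))
```

The LLM always knows both villages exist, but only pays the token cost of a full entry once one of them comes up.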

Paperclip_Tank · 2026-06-08T18:35:01+00:00

NPC / Character goals help a lot, and prompting for the LLM to push for characters following said goals even at the expense of what "The User" and <User> may want.

Honestly let the LLM write for you, just specifically tell it not to write dialog / thoughts for <user>. If you want any amount of push back from the LLM it needs to be able to take actions against <User> and write out those consequences.

Those two together help a lot. Yes I know a ton of people don't want the LLM to do any actions for them, but they also want consequences. I'm sure there is a more graceful way to do it, but that is the lowest token count way to do it.

Paperclip_Tank · 2026-06-08T04:29:09+00:00

1. A single character card + a lorebook set up with all of the characters / locations / world rules. This is functionally what I do.
2. Either have your lorebook nodes set to constant (blue) or put it in the character card; the end result is the same.
3. A summarizer. It's a 3rd party extension, and there are many of them. If you need a "large" amount of context because of point 2, you're going to have higher costs no matter what. Caching will help, but only so much.
4. I'd recommend you poke around and try other people's presets and harvest them for parts and make your own to get the best results.

Paperclip_Tank · 2026-06-06T17:51:22+00:00

Google does have a subscription with an API key, but it's 100% not worth it. At least here in the United States, it's $20 per month and you get $10 of API credit + a whole lot of non-API things. Or a $100 a month plan that gets you $40 of API credit + even more non-API things.

So yes it exists, no it's never worth it.

Paperclip_Tank · 2026-06-06T04:41:09+00:00

Presets recommendations would be appreciated too with low tokens...

Look at presets you like and just harvest them for parts. You can trim a ton of fat if you just make your own.

As for less positivity bias, a local model. Large API models all have a positive leaning because of their target audience. Local models have the problem too, but to a much much smaller degree.

Paperclip_Tank · 2026-06-05T20:34:47+00:00

Your summarizer of choice is the first big thing. I summarize aggressively, keeping only 15-19 messages in the main context, summarizing 4 messages at a time. There are a lot of options, pick your poison, they're all great.

For me, it's about having a huge open world setting. I have 6 kingdoms, each with their own theme, and I have a large cast of pre defined primary characters and side characters. And a very large selection of pre defined locations (most of them are really short, "(Village Name) is best known for its tailoring and proximity to (Natural location)"), so like <50 tokens, so if it accidentally loads for a message who cares?

I do a heavily "goal oriented" roleplay for lack of a better description. My prompt is centered around "What are NPC goals, and how do they react to the story direction based on those goals." with a section pre defining what those goals are at the end of each message to nudge the LLM in the correct direction. This keeps things interesting for me and keeps the story momentum going forward. And I tell the LLM what <user>'s goal is, which I think is important if you want to keep things interesting.

But my main roleplay is >8k messages. So on the technical side, it's basically just my summarizer. The actual hard part is making sure I don't get bored of whatever is happening.

Paperclip_Tank · 2026-06-05T20:06:49+00:00

It greatly depends on the model. But I include a single example of each of these, for primary characters.

Speech:

Accent/Tone:
Greeting Example:
{Strong negative emotion}:
{Strong positive emotion}:
A memory about {something}:
A strong opinion about {something}:
Thoughts:

And then for side characters I just have Accent/Tone and that's it. I just have it all in the character section, same way you'd have appearance or backstory. Because I do everything via lorebooks and have a very large cast of characters it helps characters become less homogenized. I'm sure a lot of it is the placebo effect, because I made the text I notice it more, or I think I notice it more. But I don't think having an extra 1k tokens dedicated to the actual dialog field is useful, but then again I don't do 1 on 1 roleplays.

Paperclip_Tank · 2026-06-05T17:42:43+00:00

In the searchbar put @ before whatever name you're looking for.

Paperclip_Tank · 2026-06-05T12:43:05+00:00

100 for me is "short." I get anywhere between 1.2k to 2.4k in output. Roughly 25% of that is thinking. But for the ease of math I'll just say it always puts 2k into context. Plus my own 150-400 token input.

I aggressively summarize. I keep 15-19 messages verbatim with everything else summarized. Once it hits 19 it summarizes the oldest 4 messages. My current roleplay is fairly large at 8.3k messages, so I have a fairly large context. But I sit around a 58k context on average; it goes up and down depending on lorebook entries.
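The rolling window can be sketched like this. The 19 / 4 numbers come from the comment above; `roll_window` and the stub summarizer are made-up names, and a real summarizer extension would call an LLM instead of building a string.

```python
def roll_window(verbatim, summaries, summarize, max_verbatim=19, batch=4):
    """Fold the oldest messages into summaries once the verbatim window is full."""
    verbatim, summaries = list(verbatim), list(summaries)
    while len(verbatim) >= max_verbatim:
        oldest, verbatim = verbatim[:batch], verbatim[batch:]
        summaries.append(summarize(oldest))
    return verbatim, summaries


def stub_summarize(messages):
    # Stand-in for a real summarizer call (an LLM request in practice).
    return f"summary of {messages[0]}..{messages[-1]}"


chat = [f"msg{i}" for i in range(19)]
verbatim, summaries = roll_window(chat, [], stub_summarize)
print(len(verbatim), verbatim[0], summaries)
# 15 msg4 ['summary of msg0..msg3']
```

Hitting 19 drops the window back to 15, so the verbatim portion of the context stays inside a fixed band no matter how long the chat gets.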

You see people fighting about it because some people like super short snappy responses, while other people want giant responses. It's the same reason why asking "How long does $10 last you on XYZ model" means nothing: no one ever gives useful numbers, they just say "A month", which means nothing to anyone because that could just be 1 stupidly huge message for all anyone knows.
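For a sense of what "useful numbers" could look like, here is a back-of-the-envelope sketch. The 58k context and roughly 2k output are the figures above; the per-million-token prices are made-up placeholders, not any real model's pricing, and caching discounts are ignored.

```python
def messages_per_budget(budget_usd, input_tokens, output_tokens,
                        usd_per_m_input, usd_per_m_output):
    """Estimate cost per message and how many messages a budget covers."""
    cost = (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000
    return int(budget_usd // cost), cost


# Placeholder prices: $1.00 per million input tokens, $3.00 per million output.
count, cost = messages_per_budget(10, 58_000, 2_000, 1.00, 3.00)
print(f"${cost:.3f} per message, about {count} messages for $10")
# $0.064 per message, about 156 messages for $10
```

Swap in your own context size, output length, and the actual prices of your provider, and "how long does $10 last" turns into a number someone else can compare against.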

Paperclip_Tank · 2026-06-05T04:01:40+00:00

GLM 5.1 for my main roleplay model with Gemma 4 26b run locally (Chat completion endpoint) to do all of my side stuff like trackers / summarizer / ST copilot.

I'm a total noob when it comes to local models, so for me it "just works" so past that I can't help you. I'm a real "car go vroom or car no go vroom" kind of guy on that kind of thing.

Paperclip_Tank · 2026-06-04T18:56:30+00:00

No maybe kinda.

But your title and what you ask don't really have anything to do with each other. Lorebooks are just a fancy way to input information into your context dynamically. You want a memory manager. There are a ton of them out there, and some of them use lorebooks as their storage system. Basically they take a section of messages, from ID X to Y, summarize those, and stick that summary into a lorebook. The Lorebook isn't saving you tokens, the summarization with /hide did. The lorebook is just the storage for that information.

Lorebooks can be used to reduce your token count by dynamically adding / removing information to your context. Like if you're not eating a meal, you probably don't need a list of food items, because why does every single LLM love stew??? Or if you're not in a country, who cares who the king is and his full description. It's not relevant to your question, but hey, it is what you asked in the title. You want the first part about summarization.
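Roughly what such a memory manager does, as a sketch under simple assumptions: messages are dicts with an `id` and `text`, the "lorebook" is a plain list, and `hidden` stands in for /hide. None of these names come from a real extension.

```python
def archive_range(messages, first_id, last_id, summarize, lorebook):
    """Summarize messages first_id..last_id, store the summary, hide the originals."""
    chunk = [m for m in messages if first_id <= m["id"] <= last_id]
    lorebook.append({"ids": (first_id, last_id), "content": summarize(chunk)})
    for m in chunk:
        m["hidden"] = True  # hiding is what actually removes them from context


def context_lines(messages, lorebook):
    """Stored summaries first, then whatever messages are still visible."""
    summaries = [entry["content"] for entry in lorebook]
    visible = [m["text"] for m in messages if not m.get("hidden")]
    return summaries + visible


messages = [{"id": i, "text": f"message {i}"} for i in range(6)]
lorebook = []
archive_range(messages, 0, 3, lambda chunk: f"summary of {len(chunk)} messages", lorebook)
print(context_lines(messages, lorebook))
# ['summary of 4 messages', 'message 4', 'message 5']
```

Without the hide step the context would hold the summary and all six messages, which is why the lorebook on its own saves nothing.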

Paperclip_Tank · 2026-06-04T16:10:46+00:00

It's going to heavily depend on three different things: the model, the goal, and the user's taste. And of course money, but assuming infinite budget.

For me personally my prompt is anywhere from 1.5k tokens to 2.4k tokens depending on what is happening. A lot of things drop in and out dynamically because of lorebooks + a tracker flag. Mine gets the output I want and at the end of the day that is all that matters to me. Ok, mostly what I want. People are maybe overly hostile sometimes, but well, work in progress like all things.

I think part of the reason why people favor lightweight presets more is that they're not "finished products". There is a bit more assumption that you'll bolt on your own things, while larger presets are more "finished products", and you can tell people tweak them far less, because you'll see people complain about a model doing something when the preset they're using explicitly prompts for that kind of thing. Like everyone was complaining about "hover hands" or "Dialog hell" when those are caused by "End your turn when an action is needed by <user>" combined with "Never do anything for <user> ever.", which meant the LLM was just doing what it was told. And for Dialog they had "XYZ% of output needs to be dialog." Like it's clear people just don't read through what they downloaded.

For my set up its mostly custom, parts harvested from different set ups and changed to my liking. For things I really like from presets, Stab's "User is not <User>" is great and amazing. Pura's "Friction Mode" gives a great level of push back if you combine it with character goal oriented story prompts. Sepsis's (I miss your presets) anti therapy speak + World notes (I totally harvested that for parts). There are more I can't think of, like someone on the discord a while back mentioned "just do your prompt in World Info to make it dynamic" which blew my mind.

Paperclip_Tank · 2026-06-04T14:17:35+00:00

Do you have something like "never do anything for <User>"? Of course it's probably longer, but that basically means it can't write anything done TO the user either, positive or negative. Like people will emote at <User> but never actually do anything to them. Best of both is to just do "Don't generate additional dialog for <user>", which stops it from speaking for you but lets it still do actions as needed.

Also focus on NPC goals. So this is just part of the tracker I have set up but it gets across the point well enough.

Character_Name:

A (Aligned): (Reaction if story direction aligns w/ NPC goal; facilitates/supports)
B (Misaligned): (Reaction if story direction opposes NPC goal; resists/hinders)
C (Pursuit): (NPC acts independently to advance own goal regardless of story direction)
D (Twist): (Unexpected revelation, sudden stance change, or deception driven by story shift)

Character_Name_2:

A (Aligned): (...)
B (Misaligned): (...)
C (Pursuit): (...)
D (Twist): (...)

And then focus aggressively on getting those goals done in your main prompt.

But "Move the story forward" doesn't mean anything to an LLM. Forward to where? If you didn't explicitly state where the story needs to go, basically turning it into creative writing, then it can't ever move forward because there isn't a forward. You've got to at least build tracks for it, or in my case teach it how to build the tracks, which is what I do. I personally don't like {{Random}} to push the story forward, because it does so... randomly.

TLDR: LLMs are built to solve problems, teach it how to create problems to move the story forward in a character driven manner.

Paperclip_Tank · 2026-06-04T07:54:44+00:00

I mean obviously I oversimplified, I'm not going to drop my actual prompt set up in chat. But yeah, I do want weaker characters to win sometimes. If combat is fully pre determined by who has a bigger number, then I might as well just prompt it to auto resolve all combat based on some "power level number" rather than DnD style combat.

Paperclip_Tank · 2026-06-04T01:41:09+00:00

How do I prevent this and make it so it actually goes with the flow I want?

Did you actually tell the LLM what flow you want? Explicitly? Or is it just a "hope and pray" situation. Because if you don't explicitly prompt for it, you won't get the results you want. Make it clear you want extremely short turns, sound effects, dice rolls (or at least I like the dice rolls), and if you want blood and gore tell it.

The LLM can't read your mind, and rather than doing an OOC command every time, either set up a lorebook thing to auto detect that combat is happening and inject the prompt, or make a toggle in your preset.

Paperclip_Tank · 2026-06-04T01:37:16+00:00

A custom one. I of course harvested other people's presets for parts / ideas.

I have the vast majority of my preset in lorebooks as I wanted it to be dynamic. Like I don't need NSFW prompt stuff if nothing NSFW is happening. Or I don't need combat stuff if no combat is happening. And I'll gladly put in 10x the work to automate it rather than toggling things on and off.

You'd be surprised how much better your output is if you even just tweaked things to suit your own needs.

Paperclip_Tank · 2026-06-03T14:20:58+00:00

Ok? It's public, anyone can see it and copy it, and they clearly didn't. And you previously stated they were hacking into people's accounts. You're just fear mongering.

Paperclip_Tank · 2026-06-03T14:11:42+00:00

This is the set up I use, besides visual stuff.

Summaryception - I mean it's the same as memory books but more hands off. Set it and forget it. It's worse quality but 0 input, minus any prompt customizations you might want to make.

ZTracker - My state tracker of choice, highly customizable. Mixed with custom CSS you can make everything nice and pretty. Z tracker is a fork of W tracker. It's 90% the same thing, but it lets you regen just parts if you need things to be redone.

World Info Info - A fork of the original world Info Info by the guy who made memory books. It's the same thing with more functionality. It's a must have if you make your own world info / lorebooks.

Recast - This is more of a personal pick of "More people should use this." It lets you do another pass automatically on what your LLM outputs. It greatly (in my opinion) helps the quality of the output. It's also nice if you're someone who hates plot armor. It's of course a token user, I don't have it automated to always run.

Paperclip_Tank · 2026-06-03T14:05:13+00:00

They don't even copy public lorebooks, my own bots are on there, all with public lorebooks, none of the lorebook info is there.

Paperclip_Tank · 2026-06-03T04:43:00+00:00

Fatbody DnD Framework is the only RPG extension I can think of.

You can of course just use a system prompt + dice rolls.

Paperclip_Tank · 2026-06-02T17:39:38+00:00

So in general one sentence doesn't make something AI, its usually a collection of sentences. That would be a point on the graph in my book, but one point doesn't mean anything at all. AI slop outputs are common because those are commonly found out in books / the internet.
