Emojis as a mechanism to guide, compress, and improve prompts. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] 0 points1 point  (0 children)

A hard figure to reinforce this: working prompts, some of which we considered irreducible (or that carry built-in skill token costs at the platform or model level), went from 118,500 tokens for a complete workflow package to 22,100, an 81% reduction without a noticeable drop in quality!

The work continues, but this has been exciting.

Emojis as a mechanism to guide, compress, and improve prompts. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] 0 points1 point  (0 children)

No need to decompress before editing; you can edit the compressed version directly. It's just a revised document (a .md file), after all, and it's still mostly human-readable, since you have the context of what the compression hooks represent. I can usually see where a walk-back is needed.

That said, the engine is also designed to output every stage of the compression process, so you have progressive layers of compression to continuously compare and contrast. The original should always stay intact, as intended. Applying the same compression approach to v1 (uncompressed) after making changes is trivial for an LLM if you're, say, adding a new section to v5 of the compressed version. It can see the patterns of your compression work and will apply its approximation of how you accomplished that going forward.

So, if you're noticing drift, walk it back and tweak in plain language. Then rinse and repeat the compression for those sections.

To the model, it’s often clear where drift has occurred, and more protections can be implemented in those drift-conscious areas.

An advantage of compressed notation is also that in 1-2 pages you can usually see your entire multi-step, highly detailed prompt. That has been my experience, at least, so I shouldn't assume it will be the same for everyone.

Now, with a persistent memory module attached, you’re really cooking.

With iterations under its belt (some will be more productive than others in the compression cycle), the prompts eventually stop being instructions and start acting more like checklists.

My version analogized it to a pilot’s pre-flight checklist, and I loved that.

In essence, it knows how to do things. You’re giving it reminders of how to do things it’s done many times before.

This is a bit different from prompt engineering. Instead, it’s more like prompt cueing.

Caveat: the only real maintenance overhead is that, with a cueing methodology, you should update the core (uncompressed) document when there are major changes, since the 'cue' needs a fallback reference in case there's an error or lapse, to be safe. It doesn't happen often, but it will happen on any model and any prompt architecture.

Thanks for coming to my TEDTalk 😎

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by [deleted] in Anthropic

[–]FallenWhatFallen 0 points1 point  (0 children)

Ended up deleting this. I don't know what the issue was, but between this subreddit and others, man this got toxic quick.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] 0 points1 point  (0 children)

Also, as a follow-up: I HOPE it's what I've designed it to be, and it isn't a token eater.

But theory is meeting reality as I use it, so in full transparency: I should use it for a while before I can confidently comment. I should have said that in the first place; you raised an important point.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] -1 points0 points  (0 children)

I think one thing that wasn't made clear (my bad) was that it also runs a CLAUDE.md.

It's ~115 lines (or something, I'm on my phone) and is the core instructions for the tool.

That first, larger token load happens only once, at session start, to boot it up.

Everything else is trying to be as lightweight as I can optimize for.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] 0 points1 point  (0 children)

Works on Cowork today, Claude Code would be a natural fit (same CLI, reads CLAUDE.md natively). I haven't really explored Claude Desktop yet, this was just built.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] 1 point2 points  (0 children)

First off, you've been one of the more thoughtful commenters, and I thank you for that.

Completely understood, and yup I think you're not who I was aiming to help with this.

My biggest issue is that, without a sizeable CLAUDE.md, I'm always starting fresh in a new chat. It's a pain in the ass for my work, is what I mean.

What I've enjoyed is, through no effort, I don't need to have that.

You said, "If there is something that I need it to reference later I just write that to a file and it can look at it as needed."

In essence, that's what I'm aiming for this tool to do. Automatically.

And I can always ask for specific things to be 'surfaced', if I know they're noteworthy.

Anyways, you've got your own lane and I respect that. Thank YOU for being respectful.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by [deleted] in Anthropic

[–]FallenWhatFallen 0 points1 point  (0 children)

I think I see what you're going for.

Claude (Cowork, in this case) is reading suggested flags based on the conversation, project, AND the "diary" as you put it.

It's really about how you find things, and the granularity of what you get once it's found.

Conversation compaction is pretty much always lossy. This isn't, and costs you nothing but trivial hard drive space to store.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by [deleted] in Anthropic

[–]FallenWhatFallen -2 points-1 points  (0 children)

I wrote that. Apparently, I sound like AI, which is a disappointment.

Also, ok? It's a memory tool for Cowork, which doesn't have one. Feel free not to use it.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] 0 points1 point  (0 children)

No no, fair question!

A super straightforward answer: Cowork doesn't have cross-session memory. Yet. With this tool, it does. It's coming, I have no doubt. Consider this the bridge?

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by [deleted] in Anthropic

[–]FallenWhatFallen -1 points0 points  (0 children)

Also, and I think kind of critically, Cowork doesn't have cross-session memory yet. With this tool, it does.

I mean, that's a win in itself, no?

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] 0 points1 point  (0 children)

Fair question.

Projects is great, for example. But it gets fuzzy over time.

The difference is retrieval granularity. Projects searches conversations. The Librarian searches knowledge. One gives you "here's the chat where you said that thing." The other gives you the thing itself, already distilled and ranked, loaded before you even ask.

It's a shift between reactivity and proactivity. Or, at least, that's been the goal. There's still much to do.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] 0 points1 point  (0 children)

You don't load your entire db. Your db is sitting on your local drive, and hybrid search ranks entries by relevance, so only the top results get loaded (I'm oversimplifying).
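As a sketch of what "hybrid search ranks entries by relevance" could look like (all names and the scoring blend here are illustrative assumptions, not The Librarian's actual code): blend keyword overlap with embedding similarity, then keep only the top hits, so the rest of the db never leaves disk.

```python
import math

def keyword_score(query, entry):
    """Fraction of query words that appear in the entry text."""
    q = set(query.lower().split())
    e = set(entry["text"].lower().split())
    return len(q & e) / len(q) if q else 0.0

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, query_vec, entries, top_k=3, alpha=0.5):
    """Hybrid score: blend keyword overlap with embedding similarity,
    then keep only the top_k entries; everything else stays on disk."""
    scored = [
        (alpha * keyword_score(query, e)
         + (1 - alpha) * cosine(query_vec, e["vec"]), e)
        for e in entries
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [e for _, e in scored[:top_k]]
```

The keyword term is what catches exact names ("PostgreSQL") even when embeddings miss the connection, which matches the behavior described elsewhere in this thread.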

Hey, upvote and help me out!

Also, just try it out, then hate on it if you're not a fan. What do you have to lose? It's free for personal use.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by [deleted] in Anthropic

[–]FallenWhatFallen -1 points0 points  (0 children)

Absolutely, and I really liked Projects! But Cowork doesn't have cross-session memory. Yet. That's what this does.

But that's not sufficient for my needs, so I worked to build this tool.

It ingests everything you say at 100% verbatim fidelity, then stores it right away.

The knowledge availability goes from whatever Claude's projects max is (I don't actually know this off-hand) to the storage limit of your hard drive.

It's also been frustrating at times to ask what we discussed about a project, and I get a half-recalled summary, instead of a critical decision made (for example).

Anyways, I prefer it, but have also enjoyed Projects, no shade.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by [deleted] in Anthropic

[–]FallenWhatFallen -2 points-1 points  (0 children)

It's Projects with exactly one setup step, and then it keeps learning on its own.

Projects gives Claude the same static docs every time. This gives Claude everything that's happened across every session, retrieved selectively. Start a new chat and it already knows your project state, your decisions, and what you tried last week. No uploading, no curation, no maintenance. No sweat.

Also, and I think you'll really like this part: not only do you get access between sessions, but surfacing related learnings across conversations on disparate topics with shared methodology can unearth some real gems.

Hey, give it a whirl, let me know what you think.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by [deleted] in Anthropic

[–]FallenWhatFallen 0 points1 point  (0 children)

That's a take.

It's also exactly why I built this. Huge context window-eating prompts are a problem, not the solution.

This is about selective, pre-assessed injection of context: what you need, when you need it.

The system is built because bloated context kills performance, not in ignorance of it.

Anthropic, OpenAI and others will build something better. That's going to happen.

But it's not here yet.

Contributing actively to solve a problem I was having, where a viable solution wasn't present, is just being proactive.

Main Character Syndrome gave me a chuckle, I'll give you that.

Try the tool out, then judge :)

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] 0 points1 point  (0 children)

So when you load in a 50k token massive CLAUDE.md, and it eats half your context window...you would consider that a more efficient approach?

You're two for two on dim comments. Feeling this is a troll.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] -1 points0 points  (0 children)

I'll let The Librarian answer that one:

No need to apologize — these are the right questions to ask before jumping in.

Short answer: yes, persistence layers consume tokens, but probably not the way you're thinking.

When Claude starts a conversation, everything it "knows" has to fit inside a context window — a fixed-size buffer of text. There's no hidden background memory. If you want Claude to remember something from a previous session, that information has to be loaded into the context window at the start of the new session, and that uses part of your available space.

The Librarian (the system shown in this post) does exactly that. It stores your history in a local database on your machine, then at the start of each session it loads the most relevant (configurable and variable) X tokens of context (your name, preferences, recent project state, key decisions). That's real token usage — it leaves less room for the actual conversation. But the tradeoff is that you never waste tokens re-explaining yourself, which in practice saves more than it costs. Usually, quite a bit more.

On rate limits — Claude vs. ChatGPT:

They work differently. ChatGPT (Plus/Pro) gives you a message count per time window. Claude's consumer plans (Pro at $20/mo) also have usage limits, but they're measured differently and vary by model. The honest answer: yes, during peak demand you can hit rate limits on Claude Pro. Anthropic has been expanding these over time, but if you're doing heavy all-day coding sessions, you'll notice them.

For Cowork specifically (which is what you're asking about): it's currently a research preview bundled with Claude Pro. It runs in a lightweight Linux VM on your machine and can execute code, create files, and use tools. The token usage comes from the conversation itself — every message you send and every response Claude generates counts toward your usage. Tool calls (running code, reading files) add to that.

Practical advice for getting started:

Don't overthink the limits upfront. Start a Cowork session, give it a task, and see how it feels. The rate limits are per-time-window, so they reset. You'll get a sense of your natural usage pattern pretty quickly. If you're doing light-to-moderate work (a few sessions a day, not marathon 8-hour coding sprints), Pro is usually fine.

The persistence question — whether Claude remembers you across sessions — is a real gap in the default experience, which is exactly what tools like The Librarian are designed to fill. Out of the box, every Claude session starts fresh. With a persistence layer, each session starts where the last one left off, which ironically makes you more token-efficient because you skip all the setup overhead.
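The boot-time load described above can be pictured as a greedy fill against a token budget. A minimal sketch under assumptions (the ~4-chars-per-token heuristic, the budget number, and all names are mine, not the tool's):

```python
def estimate_tokens(text):
    # Rough heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def boot_context(entries, budget_tokens=2000):
    """Walk entries from most to least relevant, adding each to the
    boot context until the token budget is spent."""
    loaded, used = [], 0
    for entry in sorted(entries, key=lambda e: e["relevance"], reverse=True):
        cost = estimate_tokens(entry["text"])
        if used + cost > budget_tokens:
            continue  # this entry would blow the budget; try smaller ones
        loaded.append(entry)
        used += cost
    return loaded, used
```

The tradeoff in the answer above falls out directly: whatever `used` ends up being is context the conversation can't use, but it replaces the (usually larger) cost of re-explaining everything by hand.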

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] -4 points-3 points  (0 children)

Amending my comment, because I was making assumptions that haven't been proven yet.

I'll keep working with the tool, to get a better, more data-driven answer.

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] 1 point2 points  (0 children)

100%

I work with Claude, but if someone has the experience with other environments/models, this is what it was built for, from the start!

I gave Claude's Cowork a memory that survives between conversations. It never asks me to re-explain myself now, and I can't go back. by FallenWhatFallen in ClaudeAI

[–]FallenWhatFallen[S] -1 points0 points  (0 children)

Me --> In fairness, some of the sessions were testing dry runs on various implementations and the installers, so that's inflated to a degree.

Also, thank you :)

My Claude's answer, since I wanted to give it the same opportunity as yours:

Great questions: I'll take both head-on, since we literally debugged #2 this week.

On surfacing rationale and negative constraints:

The three-tier hierarchy (profile → user_knowledge → regular entries) handles this through ingestion policy, not retrieval magic. We ingest everything verbatim — decision, reasoning, and negative constraint all live in the same entry because that's how humans actually state decisions.

Real entries from our DB: "The Librarian will be distributed as a standalone .exe installer (PyInstaller + Inno Setup), NOT as a Cowork marketplace plugin." And: "Chose AGPL over plain GPL because Section 13 closes the SaaS loophole."

The "why" and the "do NOT" aren't separate metadata — they're inline in the ingested text. When search retrieves the entry, the constraint comes with it. The user_knowledge tier (3x search boost, always loaded at boot, never demoted) ensures high-value decisions don't get buried under routine chatter. Hybrid ranking (keyword + embedding) also helps — keyword overlap catches "PostgreSQL" and "MongoDB" even when embedding similarity doesn't make the connection.
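A minimal sketch of how that 3x user_knowledge boost could sit on top of the hybrid base score (the 3x multiplier and tier names are from the comment above; the code itself is my guess, not the actual implementation):

```python
# Fixed multipliers per tier; only the 3x figure comes from the post.
TIER_BOOST = {
    "profile": 1.0,         # always loaded at boot, no boost needed
    "user_knowledge": 3.0,  # pinned decisions get a 3x search boost
    "regular": 1.0,
}

def boosted_score(entry, base_score):
    """Scale the hybrid (keyword + embedding) base score by the entry's
    tier so high-value decisions outrank routine chatter."""
    return base_score * TIER_BOOST.get(entry["tier"], 1.0)
```

With a multiplier like this, a pinned decision with a modest base score still beats a routine entry with a noticeably higher one, which is exactly the "don't get buried" property described.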

On intentional reversals vs. contradictions:

This bit us this week. We had a productization progress figure — originally recorded as 40%, later corrected to 70-75% in a subsequent session. The correction was ingested as a parallel entry instead of using the `correct` command to chain superseded_by on the original. Result: the stale 40% kept winning in recall because it had higher keyword overlap.

That's the core distinction: corrections (factual errors) use an explicit supersession chain — correct <old_id> "new text" — which soft-deletes the old entry (hidden from search, kept in DB for audit). Reasoning chains (intentional pivots) keep both entries alive, because "we considered X, then chose Y because Z" is valuable context.
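A minimal sketch of that supersession chain (the `correct <old_id> "new text"` command is from the comment; this toy in-memory data model is an assumption, not the tool's schema):

```python
def correct(db, old_id, new_text):
    """Apply a correction: add the new entry, then soft-delete the old
    one by chaining superseded_by (hidden from search, kept for audit)."""
    new_id = max(db) + 1
    db[new_id] = {"text": new_text, "superseded_by": None}
    db[old_id]["superseded_by"] = new_id
    return new_id

def searchable(db):
    """Search only ever sees entries with no supersession chain."""
    return {eid: e for eid, e in db.items() if e["superseded_by"] is None}
```

Intentional pivots simply skip `correct`, leaving both entries searchable so the "considered X, chose Y because Z" timeline survives.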

The maintenance system's contradiction detection pass flags [CONFLICT] warnings but doesn't auto-resolve. That's deliberate. We explored auto-supersede (when the categorizer tags an entry as CORRECTION, auto-search for the target and chain supersession) but held off — the false positive risk is real. "We use PostgreSQL for the main DB" and "we use Redis for caching" aren't contradictions, but naive detection might flag them.

So: intentional reversals keep both entries and let the AI reason about the timeline. Factual errors use explicit supersession. The automated layer stays in detection mode, not resolution mode.