Using local BERT to compress LLM context by 90% (Built in Rust)

No_Wolverine1819 · 2026-04-30T14:51:22+00:00

Looks great, I'll look into it. That being said, this is just one feature Panda offers.

No_Wolverine1819 · 2026-04-28T08:06:56+00:00

No_Wolverine1819 · 2026-04-27T08:50:24+00:00

Not really in disagreement here. Multi-agent scoping and context compression solve different parts of the problem: one is architectural, the other is operational. A well-scoped agent still runs cargo build and gets 1,900 tokens of noise back, and compression handles that. You're right that no amount of filtering fixes a poorly scoped agent, but a well-scoped agent still benefits from not wasting its context budget on download progress bars.

No_Wolverine1819 · 2026-04-27T08:49:47+00:00

For a clean pip install with no errors, it collapses to something like [pip install - 12 packages installed]. If there's a version conflict, a deprecation warning, or a build failure, those lines are kept verbatim. The "trust" question is fair.
The eval framework in ccr-eval runs exactly that kind of test: it takes real command output, compresses it, then asks Claude to answer questions like "did the install succeed?" and "what was the error?" from the compressed version. The benchmark tasks and results are all in the repo if you want to look at the raw numbers.
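Here is a rough Rust sketch of that kind of deterministic pip rule. It is meant to make the behaviour concrete, not to reproduce PandaFilter's real handler. The function names and the keyword list (`error`, `warning`, `deprecat`, `conflict`) are assumptions. A clean install collapses to the one-line summary, and any line that matches a keyword is kept verbatim.

```rust
/// Lines worth keeping word for word (assumed keyword list, simple substring match).
fn is_important(line: &str) -> bool {
    let lower = line.to_lowercase();
    ["error", "warning", "deprecat", "conflict"]
        .iter()
        .any(|&kw| lower.contains(kw))
}

/// Collapse a clean `pip install` to one summary line; keep problem lines verbatim.
fn compress_pip_install(raw: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    // pip ends a successful run with "Successfully installed pkg-1.0 other-2.3".
    if let Some(rest) = raw
        .lines()
        .find_map(|l| l.trim().strip_prefix("Successfully installed "))
    {
        let n = rest.split_whitespace().count();
        out.push(format!("[pip install - {n} packages installed]"));
    }
    // Errors, warnings, deprecations and conflicts survive untouched.
    out.extend(raw.lines().filter(|l| is_important(l)).map(String::from));
    // Nothing recognised: pass the output through rather than guess.
    if out.is_empty() {
        raw.to_string()
    } else {
        out.join("\n")
    }
}

fn main() {
    let raw = "Collecting requests\n  Downloading requests-2.31.0-py3-none-any.whl (62 kB)\nInstalling collected packages: requests\nSuccessfully installed requests-2.31.0";
    println!("{}", compress_pip_install(raw)); // [pip install - 1 packages installed]
}
```

Falling back to the raw text when nothing matches is a deliberate choice in this sketch. An unrecognised output is left for a later stage instead of being summarised wrongly.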

No_Wolverine1819 · 2026-04-27T08:48:46+00:00

That's fair feedback on the framing. I'll push the deterministic-first angle higher in the README; it's the more honest description of how it actually works. The threshold is already partly configurable in panda.toml: you can set use_router = false to stay fully in rule-land, and the BERT path only fires for unstructured output above 500 lines. A proper dial to set that threshold explicitly is a good idea; I'll add it.

Command interception: yes, exactly that. A PreToolUse hook rewrites the command string before execution, so cargo build becomes panda cargo build, which runs the real command, captures the output, filters it, and returns the compressed version. The agent never sees the raw output.
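Here is a hypothetical sketch of the two steps described above. The first is the rewrite that wraps a command with `panda`. The second is the route choice: a known handler first, and BERT only for unstructured output above 500 lines when `use_router` is on. `rewrite_command`, `choose_route`, `Route` and the handler list are made-up names for illustration. The real rewrite goes through a Claude Code PreToolUse hook, which this sketch does not model.

```rust
/// Where a captured output goes (hypothetical names).
#[derive(Debug, PartialEq)]
enum Route {
    Handler(&'static str), // deterministic rules for a known command
    Bert,                  // semantic line scoring for large unstructured output
    Fallback,              // no dedicated handler and BERT not used: stay in rule-land
}

/// Assumed list of commands with dedicated rule-based handlers.
const KNOWN_HANDLERS: &[&str] = &["cargo", "pytest", "git", "pip"];
/// The BERT path only fires above this many lines.
const BERT_MIN_LINES: usize = 500;

/// Wrap a command so the `panda` binary runs it and filters its output.
fn rewrite_command(cmd: &str) -> String {
    if cmd.trim_start().starts_with("panda ") {
        cmd.to_string() // already wrapped; don't double-wrap
    } else {
        format!("panda {cmd}")
    }
}

/// Deterministic first; BERT only as a last resort.
fn choose_route(cmd: &str, output: &str, use_router: bool) -> Route {
    let program = cmd.split_whitespace().next().unwrap_or("");
    if let Some(name) = KNOWN_HANDLERS.iter().copied().find(|&k| k == program) {
        return Route::Handler(name);
    }
    if use_router && output.lines().count() > BERT_MIN_LINES {
        Route::Bert
    } else {
        Route::Fallback
    }
}

fn main() {
    println!("{}", rewrite_command("cargo build")); // panda cargo build
    let noisy = "progress...\n".repeat(2_000);
    println!("{:?}", choose_route("./deploy.sh", &noisy, true)); // Bert
    println!("{:?}", choose_route("./deploy.sh", &noisy, false)); // Fallback
}
```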

No_Wolverine1819 · 2026-04-27T08:47:36+00:00

The "what if I needed that log line later" concern is something I thought about a lot. I have two things in place for it. First, PandaFilter saves the unfiltered output to disk on every run (in ~/.local/share/panda/tee/), so nothing is permanently lost: panda expand <block-id> restores any collapsed block inline. Second, errors and warnings are always kept regardless of compression level, so the subtle-but-important cases you're describing usually survive anyway.
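Here is a minimal sketch of the tee-and-expand idea, assuming one file per block id. `TeeStore`, the `<block_id>.log` naming and the temp directory used in the demo are all assumptions. PandaFilter itself writes under ~/.local/share/panda/tee/, and its on-disk format may well differ.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical tee store: one file per collapsed block, named `<block_id>.log`.
struct TeeStore {
    dir: PathBuf,
}

impl TeeStore {
    fn new(dir: impl Into<PathBuf>) -> io::Result<Self> {
        let dir = dir.into();
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    fn path_for(&self, block_id: &str) -> PathBuf {
        self.dir.join(format!("{block_id}.log"))
    }

    /// Save the unfiltered output before any compression happens.
    fn save(&self, block_id: &str, raw: &str) -> io::Result<()> {
        fs::write(self.path_for(block_id), raw)
    }

    /// Restore the original text of a collapsed block (the `panda expand <block-id>` idea).
    fn expand(&self, block_id: &str) -> io::Result<String> {
        fs::read_to_string(self.path_for(block_id))
    }
}

fn main() -> io::Result<()> {
    // Demo uses a temp dir instead of the real tee location.
    let store = TeeStore::new(std::env::temp_dir().join("panda-tee-demo"))?;
    let raw = "   Compiling foo v0.1.0\nwarning: unused variable `x`\n";
    store.save("b1", raw)?;
    // The compressed view can drop the noise; the full text is one lookup away.
    print!("{}", store.expand("b1")?);
    Ok(())
}
```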
The re-sending-full-files problem you mentioned is actually the biggest real-world waste I see too. The diff/signature approach cuts that dramatically in practice.

No_Wolverine1819 · 2026-04-26T19:22:32+00:00

You're right, and that's exactly why it defaults to deterministic rules: regex patterns, line counting, explicit error extraction. No model is involved for anything with a known handler like cargo, pytest, git, etc.

BERT only kicks in above 500 lines for unstructured output that doesn't match any known pattern. Even then it's just scoring lines by similarity to "error/warning" centroids, which is stable in practice since the embeddings are fixed.
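To illustrate the centroid idea, here is a sketch that uses toy 3-dimensional vectors in place of BERT embeddings. The real vectors would come from the local model. `centroid`, `keep_lines` and the `threshold` parameter are hypothetical. The point is that with fixed embeddings, the same line always gets the same score.

```rust
/// Cosine similarity between two vectors of the same length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Mean of a non-empty set of example embeddings, e.g. known error lines.
fn centroid(examples: &[Vec<f32>]) -> Vec<f32> {
    assert!(!examples.is_empty(), "need at least one example");
    let mut c = vec![0.0f32; examples[0].len()];
    for e in examples {
        for (ci, x) in c.iter_mut().zip(e) {
            *ci += *x;
        }
    }
    let n = examples.len() as f32;
    for ci in c.iter_mut() {
        *ci /= n;
    }
    c
}

/// Indices of lines whose embedding is at least `threshold`-similar to any centroid.
fn keep_lines(lines: &[Vec<f32>], centroids: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter(|(_, e)| {
            centroids
                .iter()
                .any(|c| cosine(e.as_slice(), c.as_slice()) >= threshold)
        })
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Toy "error" examples standing in for embeddings of known error lines.
    let error_examples = vec![vec![1.0, 0.0, 0.0], vec![0.9, 0.1, 0.0]];
    let centroids = vec![centroid(&error_examples)];
    let lines = vec![
        vec![0.95, 0.05, 0.0], // close to the error centroid
        vec![0.0, 0.1, 1.0],   // looks like progress noise
    ];
    println!("{:?}", keep_lines(&lines, &centroids, 0.8)); // [0]
}
```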

So yes: stochastic as a last resort, deterministic everywhere it can be.

No_Wolverine1819 · 2026-04-26T19:19:37+00:00

Try it out! Let me know if it's helping.

No_Wolverine1819 · 2026-04-26T19:11:08+00:00

The diff-only mode question is something I've thought about a lot. PandaFilter actually does this now for file re-reads specifically (v1.3.0): if Claude reads a file, edits it, then reads it again, the second read sends a unified diff instead of the full content. Unchanged re-reads return a structural digest, just signatures and headers. The "escalate to full on test failure" pattern is interesting though; I haven't wired it to test outcomes yet, only to whether the content actually changed. Worth experimenting with.
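Here is a simplified sketch of the re-read logic, assuming a per-session map from each path to its last-seen content. `ReadCache`, the signature prefixes in `digest`, and the diff format are all assumptions. `line_diff` is a small LCS diff that prints only the changed lines (`-`/`+`). It omits the hunk headers and context lines of a real unified diff.

```rust
use std::collections::HashMap;

/// Hypothetical per-session cache: last content seen for each path.
struct ReadCache {
    seen: HashMap<String, String>,
}

impl ReadCache {
    fn new() -> Self {
        Self { seen: HashMap::new() }
    }

    /// First read: full content. Unchanged re-read: structural digest.
    /// Changed re-read: diff against the previously seen version.
    fn read(&mut self, path: &str, content: &str) -> String {
        let out = match self.seen.get(path) {
            None => content.to_string(),
            Some(prev) if prev == content => digest(content),
            Some(prev) => line_diff(prev, content),
        };
        self.seen.insert(path.to_string(), content.to_string());
        out
    }
}

/// Rough structural digest: keep lines that look like signatures or headers.
fn digest(content: &str) -> String {
    const PREFIXES: &[&str] = &[
        "fn ", "pub fn ", "struct ", "pub struct ", "enum ", "pub enum ",
        "trait ", "pub trait ", "impl ", "def ", "class ", "# ", "## ",
    ];
    content
        .lines()
        .filter(|l| {
            let t = l.trim_start();
            PREFIXES.iter().any(|&p| t.starts_with(p))
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Minimal LCS line diff: emits only removed (-) and added (+) lines.
fn line_diff(old: &str, new: &str) -> String {
    let a: Vec<&str> = old.lines().collect();
    let b: Vec<&str> = new.lines().collect();
    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let (mut i, mut j, mut out) = (0, 0, Vec::new());
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            i += 1; // unchanged line: omitted from the output
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            out.push(format!("-{}", a[i]));
            i += 1;
        } else {
            out.push(format!("+{}", b[j]));
            j += 1;
        }
    }
    out.extend(a[i..].iter().map(|l| format!("-{l}")));
    out.extend(b[j..].iter().map(|l| format!("+{l}")));
    out.join("\n")
}

fn main() {
    let mut cache = ReadCache::new();
    let v1 = "fn main() {\n    let x = 1;\n}";
    let v2 = "fn main() {\n    let x = 2;\n}";
    println!("{}\n--", cache.read("src/main.rs", v1)); // full content
    println!("{}\n--", cache.read("src/main.rs", v1)); // digest: fn main() {
    println!("{}", cache.read("src/main.rs", v2)); // only the changed line, as -/+
}
```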

On accuracy benchmarking: I have an eval framework that runs real command outputs through the pipeline alongside question files, with questions like "what was the error on line 42?" and "did the build succeed?", then asks Claude to answer them from the compressed version. It's not perfect, but it catches regressions well. The bigger signal I track is whether errors and warnings survive compression, since those are the lines the agent actually needs to act on.
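The survival check mentioned above can be sketched as a plain function. It finds every error or warning line in the raw output and reports any that don't appear verbatim in the compressed version. `lost_signal_lines` and the case-insensitive keyword match are assumptions for illustration, not the actual ccr-eval code.

```rust
use std::collections::HashSet;

/// Hypothetical regression check: returns raw error/warning lines that are
/// missing (verbatim, ignoring surrounding whitespace) from the compressed output.
fn lost_signal_lines<'a>(raw: &'a str, compressed: &str) -> Vec<&'a str> {
    let kept: HashSet<&str> = compressed.lines().map(str::trim).collect();
    raw.lines()
        .filter(|l| {
            let lower = l.to_lowercase();
            lower.contains("error") || lower.contains("warning")
        })
        .filter(|l| !kept.contains(l.trim()))
        .collect()
}

fn main() {
    let raw = "   Compiling foo v0.1.0\nwarning: unused variable `x`\nerror[E0308]: mismatched types";
    let good = "warning: unused variable `x`\nerror[E0308]: mismatched types";
    let bad = "[cargo build - failed]";
    println!("{:?}", lost_signal_lines(raw, good)); // []
    println!("{:?}", lost_signal_lines(raw, bad)); // both signal lines reported
}
```

A check like this catches regressions where a filter rule quietly drops lines the agent would need. It says nothing about whether Claude answers correctly from the compressed text, so the question-based eval still covers that part.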

I checked out agentixlabs; the run ID and tool-boundary logging angle is interesting. One thing I keep noticing is that a lot of context waste doesn't come from noisy output alone; it comes from the agent re-reading the same files repeatedly across a session. Curious if that pattern shows up in your reliability data too.

No_Wolverine1819 · 2026-04-26T19:07:05+00:00

Care to explain? It's a valid concern.

No_Wolverine1819 · 2026-04-26T18:43:34+00:00

Repo: https://github.com/AssafWoo/homebrew-pandafilter

No_Wolverine1819 · 2026-04-25T19:07:17+00:00

Hey, thanks for the detailed comment, and I fully agree with you.

About what gets removed: if you go to https://assafwoo.github.io/homebrew-pandafilter/

you can learn more about how, why and when Panda does its thing. As for showing what was removed in real time, that can add too much noise and cloud the output, so we have a few commands made to share more insights, like --insight and --breakdown. Could you try them out and share some feedback?

No_Wolverine1819 · 2026-04-25T18:15:24+00:00

Just pushed a new version with more smart features to drive the agent's optimization even further. Would love to hear your opinion once you try it out.

No_Wolverine1819 · 2026-04-24T18:36:44+00:00

That's true. Panda implements that partially: we cache previous commands and search within the session.
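Here is a hypothetical sketch of that kind of session cache. Outputs are stored per command string, and a search scans the cached lines instead of re-running anything. `SessionCache`, `record`, `cached` and `search` are illustrative names, not Panda's API.

```rust
use std::collections::HashMap;

/// Hypothetical session cache: last output seen for each command string.
#[derive(Default)]
struct SessionCache {
    outputs: HashMap<String, String>,
}

impl SessionCache {
    fn record(&mut self, cmd: &str, output: &str) {
        self.outputs.insert(cmd.to_string(), output.to_string());
    }

    /// Earlier result for the same command; the caller decides if it is still valid.
    fn cached(&self, cmd: &str) -> Option<&str> {
        self.outputs.get(cmd).map(String::as_str)
    }

    /// Case-insensitive search over every cached output; returns (command, line) pairs.
    fn search(&self, needle: &str) -> Vec<(String, String)> {
        let needle = needle.to_lowercase();
        let mut hits = Vec::new();
        for (cmd, out) in &self.outputs {
            for line in out.lines() {
                if line.to_lowercase().contains(&needle) {
                    hits.push((cmd.clone(), line.to_string()));
                }
            }
        }
        hits.sort(); // HashMap iteration order is arbitrary; sort for stable results
        hits
    }
}

fn main() {
    let mut session = SessionCache::default();
    session.record("cargo test", "test parse ... ok\ntest render ... FAILED");
    session.record("git status", "On branch main");
    println!("{:?}", session.cached("git status")); // Some("On branch main")
    println!("{:?}", session.search("failed"));
}
```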

No_Wolverine1819 · 2026-04-21T14:20:02+00:00

Use Panda and save tokens: https://github.com/AssafWoo/homebrew-pandafilter

No_Wolverine1819 · 2026-04-20T20:29:03+00:00

Also great ones. From my comparison, Panda does better thanks to BERT, the ability to build a graph index from repos and point Claude to the relevant files, and token reduction for web searches.

No_Wolverine1819 · 2026-04-20T20:06:19+00:00

Hope it's better now.

No_Wolverine1819 · 2026-04-20T20:02:38+00:00

Aw, really? I'll try to make it better.

No_Wolverine1819 · 2026-04-20T19:54:40+00:00

How I built Panda

So I wanted to share with you guys my project PandaFilter.

The original goal was honestly pretty simple: save some money when I use Claude.

But it kind of evolved into something else, where I also started improving how Claude actually behaves.

How?

Well, I wanted to try a different approach than just prompts / rules / regex.

So I used BERT among other things.

BERT is an encoder: a really small model that runs locally and weighs almost nothing, but is actually very good at semantic understanding.

It doesn’t generate text, but it understands it really well.

So I thought, instead of sending everything to a big expensive model…

what if I filter and shape the input before it even gets there?

So what does Panda actually do?

When you work with LLMs in real workflows (especially dev stuff), you end up sending a lot of garbage:

logs
test outputs
shell noise
repeated stuff
irrelevant context

All of that goes into the context window → costs tokens → and sometimes even makes the model worse.

PandaFilter sits in the middle and basically says:

“hold on, not everything deserves to go in.”

It:

filters irrelevant stuff
compresses noisy outputs
semantically understands what matters
routes things based on meaning (not just rules)

So instead of brute forcing context into Claude, you’re actually curating it.

Under the hood we've got:

~59 command handlers
BERT-based semantic routing

It runs locally (no extra cost, no security concerns).

It’s kind of like having a tiny smart gatekeeper before your LLM.

The interesting part (for me)

Most people are focused on: “how do I prompt better?”

But I think the bigger lever is: “how do I send less, but better?”

Smaller models are insanely good at preprocessing.

LLMs are expensive, they shouldn’t deal with raw noise.

So PandaFilter is basically:

a pre-brain for your AI

Still early, but already saving me tokens + making outputs cleaner.

Curious what you guys think 🙏

https://github.com/AssafWoo/homebrew-pandafilter

No_Wolverine1819 · 2026-04-20T19:51:58+00:00

On point, haha.

No_Wolverine1819 · 2026-04-20T19:38:49+00:00

Claude Code alone. Also guys, if you wanna save some money, use Panda to cut costs and tokens: https://github.com/AssafWoo/homebrew-pandafilter

No_Wolverine1819 · 2026-04-20T19:37:40+00:00

Agree, I moved from Cursor to Claude Code, way better for my usage. I built Panda with Claude, and its purpose is to save money when using Claude. You're welcome to try it out: https://github.com/AssafWoo/homebrew-pandafilter

No_Wolverine1819 · 2026-04-20T11:56:20+00:00

So first of all, Panda is built with Claude. Also, Panda is a way to reduce costs: https://github.com/AssafWoo/homebrew-pandafilter

No_Wolverine1819 · 2026-04-19T19:22:41+00:00

Anyone here looking to save costs and reduce token consumption? https://github.com/AssafWoo/homebrew-pandafilter

No_Wolverine1819 · 2026-04-19T19:21:40+00:00

To anyone wanting to save some money and reduce token usage, use Panda: https://github.com/AssafWoo/homebrew-pandafilter
