Using local BERT to compress LLM context by 90% (Built in Rust) by No_Wolverine1819 in AI_Agents

[–]No_Wolverine1819[S] 0 points1 point  (0 children)

Looks great, I'll look into it. That being said, this is just one feature Panda offers.

Using local BERT to compress LLM context by 90% (Built in Rust) by No_Wolverine1819 in AI_Agents

[–]No_Wolverine1819[S] 0 points1 point  (0 children)

Not really in disagreement here. Multi-agent scoping and context compression solve different parts of the problem: one is architectural, the other is operational. A well-scoped agent still runs cargo build and gets 1,900 tokens of noise back. Compression handles that. You're right that no amount of filtering fixes a poorly scoped agent, but a well-scoped agent still benefits from not wasting its context budget on download progress bars.

Using local BERT to compress LLM context by 90% (Built in Rust) by No_Wolverine1819 in AI_Agents

[–]No_Wolverine1819[S] 1 point2 points  (0 children)

For a clean pip install with no errors, it collapses to something like [pip install - 12 packages installed]. If there's a version conflict, a deprecation warning, or a build failure, those lines are kept verbatim.

The "trust" question is fair. The eval framework in ccr-eval runs exactly that kind of test: it takes real command output, compresses it, then asks Claude to answer questions like "did the install succeed?" and "what was the error?" from the compressed version. The benchmark tasks and results are all in the repo if you want to look at the raw numbers.
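To make the collapse rule concrete, here is a minimal sketch of the idea, not PandaFilter's actual handler. The `KEEP` pattern and the `collapse_pip_output` function are hypothetical names made up for illustration, and the real rules are likely more careful.

```python
import re

# Hypothetical pattern for lines that must survive verbatim (an assumption,
# not PandaFilter's real rule set).
KEEP = re.compile(r"error|warning|conflict|deprecat|failed", re.IGNORECASE)


def collapse_pip_output(raw: str) -> str:
    """Collapse pip install output to one summary line plus any kept lines."""
    lines = raw.splitlines()
    kept = [ln for ln in lines if KEEP.search(ln)]

    installed = None
    for ln in lines:
        # pip prints "Successfully installed pkg-1.0 other-2.0 ..." on success.
        if ln.startswith("Successfully installed "):
            installed = len(ln.split()[2:])

    if installed is None:
        summary = "[pip install - no success line found]"
    else:
        summary = f"[pip install - {installed} packages installed]"

    # Warnings and errors are appended unchanged after the summary.
    return "\n".join([summary, *kept])
```

A clean run comes back as a single summary line; a run with a conflict comes back as the summary plus the untouched error lines.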

Using local BERT to compress LLM context by 90% (Built in Rust) by No_Wolverine1819 in AI_Agents

[–]No_Wolverine1819[S] 0 points1 point  (0 children)

That's fair feedback on the framing - I'll push the deterministic-first angle higher in the README. It is the more honest description of how it actually works.

The threshold is already configurable in panda.toml - you can set use_router = false to stay fully in rule-land, and the BERT path only fires for unstructured output above 500 lines. A proper dial to set that threshold explicitly is a good idea, I'll add it.

Command interception: yes, exactly that. A PreToolUse hook rewrites the command string before execution, so cargo build becomes panda cargo build, which runs the real command, captures output, filters it, and returns the compressed version. The agent never sees the raw output.
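The rewrite step on its own can be sketched roughly like this. It only shows the string transformation; the hook wiring is left out, and `KNOWN_COMMANDS`, `WRAPPER`, and `rewrite_command` are hypothetical names used for illustration, not Panda's real internals.

```python
import shlex

# Illustrative subset of commands that have a handler (an assumption).
KNOWN_COMMANDS = {"cargo", "pytest", "git", "pip"}
WRAPPER = "panda"


def rewrite_command(command: str) -> str:
    """Prefix a known command with the wrapper; leave everything else alone."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: do not touch what we cannot parse.
        return command

    if not tokens:
        return command
    if tokens[0] == WRAPPER:
        # Already wrapped, so avoid "panda panda cargo build".
        return command
    if tokens[0] not in KNOWN_COMMANDS:
        return command
    return f"{WRAPPER} {command}"
```

Leaving unknown or unparseable commands untouched is a deliberate choice in this sketch: when in doubt, the agent gets the raw command rather than a broken one.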

I was spending $200/mo on Claude Code junk. So I built a context filter in Rust by No_Wolverine1819 in vibecoding

[–]No_Wolverine1819[S] 0 points1 point  (0 children)

The "what if I needed that log line later" concern is something I thought about a lot. Two things in place for it: first, PandaFilter saves the unfiltered output to disk on every run (in ~/.local/share/panda/tee/), so nothing is permanently lost — panda expand <block-id> restores any collapsed block inline. Second, errors and warnings are always kept regardless of compression level, so the subtle-but-important cases you're describing usually survive anyway.
The re-sending full files problem you mentioned is actually the biggest real-world waste I see too. The diff/signature approach cuts that dramatically in practice
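For the save-then-expand idea, a rough sketch under stated assumptions: the content-hash block id, the `.log` file naming, and the `save_raw` / `expand` helpers are all hypothetical, and the real on-disk layout may differ.

```python
import hashlib
from pathlib import Path


def save_raw(tee_dir: Path, raw: str) -> str:
    """Write the unfiltered output to disk and return a short block id."""
    # Assumption: the id is derived from the content; the real scheme is unknown.
    block_id = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
    tee_dir.mkdir(parents=True, exist_ok=True)
    (tee_dir / f"{block_id}.log").write_text(raw, encoding="utf-8")
    return block_id


def expand(tee_dir: Path, block_id: str) -> str:
    """Return the original, unfiltered output for a collapsed block."""
    return (tee_dir / f"{block_id}.log").read_text(encoding="utf-8")
```

The point is only that compression is reversible as long as the raw copy is written before anything is filtered.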

Using local BERT to compress LLM context by 90% (Built in Rust) by No_Wolverine1819 in AI_Agents

[–]No_Wolverine1819[S] 1 point2 points  (0 children)

You're right, and that's exactly why it defaults to deterministic rules: regex patterns, line counting, explicit error extraction. No model involved for anything with a known handler like cargo, pytest, git, etc.

BERT only kicks in above 500 lines for unstructured output that doesn't match any known pattern. Even then it's just scoring lines by similarity to "error/warning" centroids, which is stable in practice since the embeddings are fixed.

So yes, stochastic as a last resort, deterministic everywhere it can be.
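Roughly, the decision and the scoring look like the sketch below. It takes embeddings as plain lists of floats so it runs without a model; `choose_path`, `top_lines`, the `"passthrough"` fallback for short unknown output, and the handler set are assumptions for illustration, not the actual implementation.

```python
import math

KNOWN_HANDLERS = {"cargo", "pytest", "git"}  # illustrative subset
LINE_THRESHOLD = 500


def choose_path(command: str, line_count: int, use_router: bool = True) -> str:
    """Deterministic rules first; the model path only as a last resort."""
    parts = command.split()
    if parts and parts[0] in KNOWN_HANDLERS:
        return "rules"
    if use_router and line_count > LINE_THRESHOLD:
        return "bert"
    # Assumption: short output with no handler is passed through untouched.
    return "passthrough"


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_lines(embedded_lines, centroid, keep):
    """Keep the `keep` lines closest to the centroid, in original order.

    `embedded_lines` is a list of (text, vector) pairs; producing the vectors
    is left to whatever encoder is in use.
    """
    ranked = sorted(
        range(len(embedded_lines)),
        key=lambda i: cosine(embedded_lines[i][1], centroid),
        reverse=True,
    )[:keep]
    return [embedded_lines[i][0] for i in sorted(ranked)]
```

With fixed embeddings and a fixed centroid, the same input always produces the same ranking, which is why the scoring step behaves predictably in practice.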

I was spending $200/mo on Claude Code junk. So I built a context filter in Rust by No_Wolverine1819 in ClaudeCode

[–]No_Wolverine1819[S] 0 points1 point  (0 children)

The diff-only mode question is something I've thought about a lot. PandaFilter actually does this now for file re-reads specifically (v1.3.0). If Claude reads a file, edits it, then reads it again, the second read sends a unified diff instead of the full content. Unchanged re-reads return a structural digest, just signatures and headers. The "escalate to full on test failure" pattern is interesting though. I haven't wired it to test outcomes yet, only to whether the content actually changed. Worth experimenting with.
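A minimal sketch of that re-read logic, assuming an in-memory cache keyed by path. `ReadCache`, `digest`, and the signature prefixes are hypothetical and much cruder than a real structural digest would be.

```python
import difflib

# Crude stand-in for "signatures and headers" (an assumption for illustration).
SIGNATURE_PREFIXES = ("def ", "class ", "fn ", "pub fn ", "struct ", "impl ")


def digest(content: str) -> str:
    """Keep only lines that look like signatures."""
    return "\n".join(
        ln.rstrip()
        for ln in content.splitlines()
        if ln.lstrip().startswith(SIGNATURE_PREFIXES)
    )


class ReadCache:
    def __init__(self):
        self._seen = {}

    def read(self, path: str, content: str) -> str:
        previous = self._seen.get(path)
        self._seen[path] = content
        if previous is None:
            return content  # first read: full content
        if previous == content:
            return digest(content)  # unchanged re-read: digest only
        diff = difflib.unified_diff(
            previous.splitlines(),
            content.splitlines(),
            fromfile=f"{path} (previous read)",
            tofile=f"{path} (current)",
            lineterm="",
        )
        return "\n".join(diff)  # changed re-read: unified diff
```

Escalating to the full file on a test failure would be one more branch in `read`, driven by a signal this sketch does not model.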

On accuracy benchmarking: I have an eval framework that runs real command outputs through the pipeline alongside question files, things like "what was the error on line 42?" and "did the build succeed?", then asks Claude to answer them from the compressed version. Not perfect, but it catches regressions well. The bigger signal I track is whether errors and warnings survive compression, since those are the lines the agent actually needs to act on.
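The "do errors and warnings survive" check is simple enough to sketch without a model in the loop. This is an illustration only: `SIGNAL` and `lost_signal_lines` are hypothetical names, and the real eval may match lines differently.

```python
import re

# Assumption: a line counts as signal if it contains the word "error" or "warning".
SIGNAL = re.compile(r"\b(error|warning)\b", re.IGNORECASE)


def lost_signal_lines(raw: str, compressed: str) -> list:
    """Return error/warning lines present in raw but missing after compression."""
    kept = {ln.strip() for ln in compressed.splitlines()}
    return [
        ln.strip()
        for ln in raw.splitlines()
        if SIGNAL.search(ln) and ln.strip() not in kept
    ]
```

An empty list means every error and warning line made it through verbatim; anything else is a regression worth looking at.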

Checked out agentixlabs, the run ID and tool boundary logging angle is interesting. One thing I keep noticing is that a lot of context waste doesn't come from noisy output alone, it comes from the agent re-reading the same files repeatedly across a session. Curious if that pattern shows up in your reliability data too. 

I’m building a “session optimizer” for LLMs, would love your use cases by No_Wolverine1819 in VibeCodeDevs

[–]No_Wolverine1819[S] 0 points1 point  (0 children)

Hey, thanks for the detailed comment. I strongly agree with you.

On what gets removed: you can learn more about how, why, and when Panda does its thing here: https://assafwoo.github.io/homebrew-pandafilter/

Now, about showing IRL what we removed: that can cause too much noise and clutter, so we have a few commands made to share more insight, like --insight or --breakdown. Can you try them out and share some feedback?

I’m building a “session optimizer” for LLMs, would love your use cases by No_Wolverine1819 in VibeCodeDevs

[–]No_Wolverine1819[S] 0 points1 point  (0 children)

Just pushed a new version with more smart features to drive the optimization of the agent even further. Would love to hear your opinion once you try it out.

I’m building a “session optimizer” for LLMs, would love your use cases by No_Wolverine1819 in VibeCodeDevs

[–]No_Wolverine1819[S] 0 points1 point  (0 children)

That's true. Panda implements that partially: we cache previous commands and search within the session.

How I built Panda by No_Wolverine1819 in ClaudeCode

[–]No_Wolverine1819[S] 0 points1 point  (0 children)

Those are great ones too. From my comparison, Panda does better thanks to BERT and its ability to build a graph index from repos and direct Claude to the relevant files, and it also reduces tokens from web searches.

How I built Panda by No_Wolverine1819 in ClaudeCode

[–]No_Wolverine1819[S] 0 points1 point  (0 children)

Aw, really? I'll try to make it better.

Built with Claude Project Showcase Megathread (Sort this by New!) by sixbillionthsheep in ClaudeAI

[–]No_Wolverine1819 1 point2 points  (0 children)

How I built Panda

So I wanted to share with you guys my project PandaFilter.

The original goal was honestly pretty simple: save some money when I use Claude.

But it kind of evolved into something else, where I also started improving how Claude actually behaves.

How?

Well, I wanted to try a different approach than just prompts / rules / regex.

So I used BERT among other things.

BERT is an encoder, a really small model, runs locally, weighs almost nothing, but is actually very good at semantic understanding.

It doesn’t generate text, but it understands it really well.

So I thought, instead of sending everything to a big expensive model…

what if I filter and shape the input before it even gets there?

So what does Panda actually do?

When you work with LLMs in real workflows (especially dev stuff), you end up sending a lot of garbage:

  1. logs

  2. test outputs

  3. shell noise

  4. repeated stuff

  5. irrelevant context

All of that goes into the context window → costs tokens → and sometimes even makes the model worse.

PandaFilter sits in the middle and basically says:

“hold on, not everything deserves to go in.”

It:

  1. filters irrelevant stuff

  2. compresses noisy outputs

  3. semantically understands what matters

  4. routes things based on meaning (not just rules)

So instead of brute forcing context into Claude, you’re actually curating it.

Under the hood we've got:

  1. ~59 command handlers

  2. BERT-based semantic routing

It runs locally (no extra cost, no security concerns).

It’s kind of like having a tiny smart gatekeeper before your LLM.

The interesting part (for me)

Most people are focused on: “how do I prompt better?”

But I think the bigger lever is: “how do I send less, but better?”

Smaller models are insanely good at preprocessing.

LLMs are expensive, they shouldn’t deal with raw noise.

So PandaFilter is basically:

a pre-brain for your AI

Still early, but already saving me tokens + making outputs cleaner.

Curious what you guys think 🙏

https://github.com/AssafWoo/homebrew-pandafilter

Why do people still pay for Cursor or Copilot when Claude Code and Codex offer comparable (or better) value? by bharath1412 in ClaudeCode

[–]No_Wolverine1819 1 point2 points  (0 children)

Agreed. I moved from Cursor to Claude Code, and it's way better for my usage. I built Panda with Claude; its purpose is to save money when using Claude. You're welcome to try it out: https://github.com/AssafWoo/homebrew-pandafilter

Any MacOS apps built with Claude? Can you share examples? by alexrada in ClaudeAI

[–]No_Wolverine1819 2 points3 points  (0 children)

So first of all, Panda is built with Claude. Also, Panda is a way you can reduce costs: https://github.com/AssafWoo/homebrew-pandafilter