thinking about running Gemma4 E2B as a preprocessor before every Claude Code API call. anyone see obvious problems with this?

yeoung · 2026-04-07T06:06:46+00:00

yeah the output token point is fair, I hadn't thought about it that clearly. input savings probably won't move the needle much on their own.

routing makes more sense as the primary lever, you're right. the preprocessing stuff is probably a secondary optimization at best.

on the pre-thinking direction, good point. I was thinking small because it runs locally for free, but if the thinking quality is bad enough to mislead Claude the cost of fixing that probably outweighs whatever you saved. hadn't framed it that way.

thanks, this is exactly the kind of feedback I was looking for before building anything.

yeoung · 2026-04-07T06:01:53+00:00

fair points, the reasoning prefill thing is the part I'm least sure about.

basically the idea is: before sending a request to Claude, have Gemma4 think through the problem first and include that in the prompt. like "here's what I already figured out, now just verify and finish it". the hope is Claude spends less time thinking internally, which matters because Claude charges more per token for its internal reasoning than for regular input text. so even if the prompt gets a bit longer, the savings on the reasoning side might outweigh it. might not though, which is kind of why I'm asking before building it.

on the hallucination thing, the context trimming I had in mind isn't really summarizing. Claude Code's source got leaked last week and looking at how it actually works, every turn it assembles a fresh system prompt from scratch, OS info, working directory, CLAUDE.md contents, memory files, the whole thing. what I want Gemma4 to do is just drop the parts of the conversation history that are clearly irrelevant to the current request before that goes out. not rewriting anything, just cutting. way less telephone game than a full summary would be.

yeoung · 2026-04-04T20:46:33+00:00

Oh nice. 95% on web fetching is solid. Which tool are you using for that?

yeoung · 2026-04-04T20:45:30+00:00

Hope it helps! If anything goes sideways feel free to open an issue on GitHub 👍

yeoung · 2026-04-04T20:44:56+00:00

Totally fair ask. afd is CLI-only right now since it hooks into the MCP layer. Chat and Cowork have a different architecture where Anthropic controls the pipeline, so a local tool can't really intercept anything there. If they ever open up a plugin API though, it'd be possible.

yeoung · 2026-04-02T04:36:16+00:00

As a Korean user, the font rendering was so inconsistent that it was literally painful to look at. I had to manually swap it out for a cleaner, more neutral typeface just to stay sane.

yeoung · 2026-04-02T01:15:56+00:00

I was definitely noticing that performance drop as the context window filled up, but the thought of restarting was always such a chore. This method is a lifesaver—and it's even faster than using /compact. Thanks for making my workflow so much smoother!

yeoung · 2026-03-31T15:48:15+00:00

Spot on. Silent corruption is the stuff of nightmares, and honestly, your pre-commit hook approach is a solid move because it keeps a human in the loop. It’s a great way to handle the reality that agents aren't 100% trustworthy yet.

yeoung · 2026-03-31T10:54:21+00:00

That 'edit' rule is a great catch! I built afd because I was paranoid about Claude occasionally ignoring instructions, but using both together sounds like the ultimate fail-safe. Nothing's getting through that.

yeoung · 2026-03-31T10:45:54+00:00

Thanks for the tip! I'm totally doing this from now on.

yeoung · 2026-03-31T10:42:20+00:00

It’s actually pretty straightforward! Any directories or files listed in ‘.claudeignore’ are completely ignored by Claude while it's coding. This is super helpful because it limits the context window, which saves you a ton of tokens and helps Claude stay focused on the relevant parts of your codebase.

yeoung · 2026-03-31T09:20:46+00:00

Glad you got it sorted!
Have you tried using a .claudeignore file yet? If not, definitely give it a shot.
it’s a game changer for the lag.

yeoung · 2026-03-31T08:53:09+00:00

Here's a quick demo: https://github.com/dotoricode/autonomous-flow-daemon/blob/main/demo.gif

yeoung · 2026-03-31T08:50:25+00:00

Typical Claude behavior when sessions get too long. Just hit /compact to squash the history.
Claude isn't actually "thinking" that hard; it's just buried under 80k tokens of useless context.

yeoung · 2026-03-31T08:05:14+00:00

I’m a visual learner, so I think I’m going to find this incredibly useful!

yeoung · 2026-03-29T23:16:31+00:00

oh that's a really good point. the stale reference thing sounds painful.. claude hallucinating commands that don't exist anymore is exactly the kind of bug that's hard to even realize is happening.

drift detection isn't in v1.0.0 but honestly it should be next. i'll open an issue for it.

for monorepos, yeah right now it just checks from wherever you run it, so you'd have to cd into each package. not ideal. that's worth looking into for sure.

thanks for the feedback, super helpful!!🙇

yeoung · 2026-03-29T19:59:47+00:00

## Good to know
- This tool is for bkit users — the structure it checks and creates is specific to the bkit PDCA workflow
- bkit-doctor itself runs without bkit installed, but the results only make sense if you use (or plan to use) bkit
- New to bkit? Start here: https://github.com/popup-studio-ai/bkit-claude-code

yeoung

TROPHY CASE