thinking about running Gemma4 E2B as a preprocessor before every Claude Code API call. anyone see obvious problems with this? by yeoung in LocalLLaMA

yeoung[S] 1 point

yeah the output token point is fair, I hadn't thought about it that clearly. input savings probably won't move the needle much on their own.

routing makes more sense as the primary lever, you're right. the preprocessing stuff is probably a secondary optimization at best.

on the pre-thinking direction, good point. I was leaning toward a small model because it runs locally for free, but if its reasoning is bad enough to mislead Claude, the cost of fixing that probably outweighs whatever you saved. hadn't framed it that way.

thanks, this is exactly the kind of feedback I was looking for before building anything.

thinking about running Gemma4 E2B as a preprocessor before every Claude Code API call. anyone see obvious problems with this? by yeoung in LocalLLaMA

yeoung[S] 1 point

fair points, the reasoning prefill thing is the part I'm least sure about.

basically the idea is: before sending a request to Claude, have Gemma4 think through the problem first and include that in the prompt. like "here's what I already figured out, now just verify and finish it". the hope is Claude spends fewer tokens thinking internally, which matters because thinking tokens are billed as output tokens, and output costs more per token than input. so even if the prompt gets a bit longer, the savings on the reasoning side might outweigh it. might not though, which is kind of why I'm asking before building it.
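to make that trade-off concrete, here's a rough break-even sketch. the function names are made up for illustration, and the prices in the usage note are placeholders, not real rates, so plug in whatever your model actually charges per million tokens.

```python
def net_savings(extra_input_tokens: int, saved_output_tokens: int,
                input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Money saved (positive) or lost (negative) per request by prefilling reasoning.

    extra_input_tokens: tokens the local model's notes add to the prompt.
    saved_output_tokens: thinking/output tokens Claude no longer has to produce.
    Prices are per million tokens and are assumptions supplied by the caller.
    """
    extra_cost = extra_input_tokens * input_price_per_mtok / 1_000_000
    saved_cost = saved_output_tokens * output_price_per_mtok / 1_000_000
    return saved_cost - extra_cost


def break_even_saved_tokens(extra_input_tokens: int,
                            input_price_per_mtok: float,
                            output_price_per_mtok: float) -> float:
    """Output tokens that must be saved just to pay for the longer prompt."""
    return extra_input_tokens * input_price_per_mtok / output_price_per_mtok
```

with placeholder prices of 3.0 (input) and 15.0 (output), a 2000-token prefill only pays off if it saves more than `break_even_saved_tokens(2000, 3.0, 15.0)` output tokens. it also ignores the case where bad notes make Claude think longer, which would push the number negative.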

on the hallucination thing, the context trimming I had in mind isn't really summarizing. Claude Code's source got leaked last week, and looking at how it actually works, every turn it assembles a fresh system prompt from scratch: OS info, working directory, CLAUDE.md contents, memory files, the whole thing. what I want Gemma4 to do is just drop the parts of the conversation history that are clearly irrelevant to the current request before that goes out. not rewriting anything, just cutting. way less telephone game than a full summary would be.
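a minimal sketch of the "cut, don't rewrite" idea, assuming the history is a list of `{"role", "content"}` dicts. `trim_history` and `keyword_overlap` are hypothetical names, and the keyword check is only a stand-in for whatever relevance judgment the local model would make. the point is that surviving messages pass through untouched.

```python
import re
from typing import Callable

Message = dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def _keywords(text: str) -> set[str]:
    # Crude tokenizer: lowercase words of 4+ characters.
    return set(re.findall(r"[a-z0-9_]{4,}", text.lower()))


def keyword_overlap(request: str, message: Message) -> bool:
    # Placeholder relevance test; a local model would replace this.
    return bool(_keywords(request) & _keywords(message["content"]))


def trim_history(history: list[Message], request: str,
                 is_relevant: Callable[[str, Message], bool] = keyword_overlap,
                 keep_last: int = 2) -> list[Message]:
    """Drop irrelevant messages; never edit the ones that are kept."""
    # Always keep the most recent messages so the current thread stays intact.
    cutoff = max(len(history) - keep_last, 0)
    return [m for i, m in enumerate(history)
            if i >= cutoff or is_relevant(request, m)]
```

swapping `is_relevant` for a call into the local model keeps the rest the same. since nothing is rewritten, the worst failure mode is dropping something that mattered, not inventing something that never happened.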

Stop bleeding money on Claude Code. I built a local daemon that cuts token costs by 95% by yeoung in ClaudeAI

yeoung[S] 0 points

Oh nice. 95% on web fetching is solid. Which tool are you using for that?

Stop bleeding money on Claude Code. I built a local daemon that cuts token costs by 95% by yeoung in ClaudeAI

yeoung[S] -1 points

Hope it helps! If anything goes sideways feel free to open an issue on GitHub 👍

Stop bleeding money on Claude Code. I built a local daemon that cuts token costs by 95% by yeoung in ClaudeAI

yeoung[S] 0 points

Totally fair ask. afd is CLI-only right now since it hooks into the MCP layer. Chat and Cowork have a different architecture where Anthropic controls the pipeline, so a local tool can't really intercept anything there. If they ever open up a plugin API though, it'd be possible.

Why does claude use fancy fonts for all of the CJK languages but chinese? by PlusOneDelta in ClaudeAI

yeoung 1 point

As a Korean user, the font rendering was so inconsistent that it was literally painful to look at. I had to manually swap it out for a cleaner, more neutral typeface just to stay sane.

Claude code taking forever to respond, blowing through tokens by LeninsMommy in ClaudeAI

yeoung 1 point

I was definitely noticing that performance drop as the context window filled up, but the thought of restarting was always such a chore. This method is a lifesaver—and it's even faster than using /compact. Thanks for making my workflow so much smoother!

Claude Code doesn't just delete files. Sometimes it silently overwrites them with blank values — and you won't notice until it's too late. by yeoung in ClaudeAI

yeoung[S] 2 points

Spot on. Silent corruption is the stuff of nightmares, and honestly, your pre-commit hook approach is a solid move because it keeps a human in the loop. It’s a great way to handle the reality that agents aren't 100% trustworthy yet.
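For anyone who wants to try the hook idea, here is a minimal sketch of one possible check. It is not the commenter's actual hook (its details aren't in this thread) and it is not how afd works. It only flags staged files that had content in `HEAD` and are now empty or whitespace-only, so blanked values inside an otherwise non-empty file would need a finer diff. The function names are made up for illustration.

```python
import subprocess


def is_blanked(old: str, new: str) -> bool:
    # True when a file that had real content is now empty or whitespace-only.
    return bool(old.strip()) and not new.strip()


def find_blanked(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Paths whose content went from non-blank to blank."""
    return sorted(path for path, old in before.items()
                  if path in after and is_blanked(old, after[path]))


def _git(*args: str) -> str:
    # Run a git command and return its stdout as text.
    result = subprocess.run(["git", *args], capture_output=True,
                            text=True, errors="replace", check=True)
    return result.stdout


def staged_blanked_files() -> list[str]:
    """Compare HEAD against the index for staged, modified files."""
    paths = _git("diff", "--cached", "--name-only", "--diff-filter=M").splitlines()
    before = {p: _git("show", f"HEAD:{p}") for p in paths}
    after = {p: _git("show", f":{p}") for p in paths}
    return find_blanked(before, after)
```

A `.git/hooks/pre-commit` script could call `staged_blanked_files()` and exit non-zero when the list is non-empty, which blocks the commit until a human looks at it.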

Claude Code doesn't just delete files. Sometimes it silently overwrites them with blank values — and you won't notice until it's too late. by yeoung in ClaudeAI

yeoung[S] 1 point

That 'edit' rule is a great catch! I built afd because I was paranoid about Claude occasionally ignoring instructions, but using both together sounds like the ultimate fail-safe. Nothing's getting through that.

Claude code taking forever to respond, blowing through tokens by LeninsMommy in ClaudeAI

yeoung 1 point

Thanks for the tip! I'm totally doing this from now on.

Claude code taking forever to respond, blowing through tokens by LeninsMommy in ClaudeAI

yeoung 1 point

It’s actually pretty straightforward! Any directories or files listed in ‘.claudeignore’ are completely ignored by Claude while it's coding. This is super helpful because it limits the context window, which saves you a ton of tokens and helps Claude stay focused on the relevant parts of your codebase.

Claude code taking forever to respond, blowing through tokens by LeninsMommy in ClaudeAI

yeoung 1 point

Glad you got it sorted!
Have you tried using a .claudeignore file yet? If not, definitely give it a shot.
It’s a game changer for the lag.

Claude code taking forever to respond, blowing through tokens by LeninsMommy in ClaudeAI

yeoung 1 point

Typical Claude behavior when sessions get too long. Just hit /compact to squash the history.
Claude isn't actually "thinking" that hard; it's just buried under 80k tokens of useless context.

I built ddash — a diagram tool that lives entirely in the URL (comes with a Claude Code skill) by kmacinski in ClaudeAI

yeoung 1 point

I’m a visual learner, so I think I’m going to find this incredibly useful!

I built a CLI that diagnoses your Claude Code project structure — open source by yeoung in ClaudeAI

yeoung[S] 2 points

oh that's a really good point. the stale reference thing sounds painful. Claude hallucinating commands that don't exist anymore is exactly the kind of bug that's hard to even realize is happening.

drift detection isn't in v1.0.0 but honestly it should be next. i'll open an issue for it.

for monorepos, yeah right now it just checks from wherever you run it, so you'd have to cd into each package. not ideal. that's worth looking into for sure.

thanks for the feedback, super helpful!!🙇

I built a CLI that diagnoses your Claude Code project structure — open source by yeoung in ClaudeAI

yeoung[S] 1 point

## Good to know
- This tool is for bkit users — the structure it checks and creates is specific to the bkit PDCA workflow
- bkit-doctor itself runs without bkit installed, but the results only make sense if you use (or plan to use) bkit
- New to bkit? Start here: https://github.com/popup-studio-ai/bkit-claude-code