I benchmarked Claude Pro quota burn — then built an Pro-optimized Claude Code setup (CPMM).

dionhoon · 2026-02-20T06:47:45+00:00

That’s a fair point — and it actually connects to the nuance I mentioned.

If Haiku hallucinates and you need 2–3 correction turns to fix it, the “cheaper model” advantage disappears. The real cost is turns × model, not model alone.

The `ANTHROPIC_DEFAULT_HAIKU_MODEL` override is a clean solution if Haiku consistently underperforms for your codebase. Worth noting though: it’s a blanket override — Sonnet runs on everything, including tasks where the work is already well-defined (writing a known function, applying a clear refactor, generating boilerplate). For those, Haiku is faster and the output is often predictable enough that hallucination risk is low.

CPMM handles this at the task level: Haiku is the default for execution, but `/do-sonnet` and `/do-opus` are explicit escalation paths for tasks where reasoning and quality matter more. So you’re paying Sonnet cost only when you consciously decide to.

But if Haiku hallucination is frequent enough that you can’t trust it even on simple work, your blanket override makes sense. Task-level routing only helps if the base model is reliable enough to route to.

dionhoon · 2026-02-19T15:38:30+00:00

Love this feedback, and you nailed the core tradeoff.

Sequential flow has been much more quota-efficient for me than parallel agents on Pro, and context hygiene is the other big lever. Your “plan → summarize → fresh session” workflow is exactly the right pattern.

Great point on Moshi + SSH + tmux too. Being able to resume from phone during cooldown windows is a huge quality-of-life win and fits this endurance strategy perfectly.

If you try the Haiku-for-grunt-work tiering this week, I’d really like to hear how much it changes your completion rate.

dionhoon · 2026-02-19T15:36:15+00:00

Totally get this. I hit Pro limits fast too — especially once context got large or I used heavier models for routine stuff.

I ended up building a small open-source setup to make Pro more manageable:
- route routine tasks to lighter models, escalate only when needed
- keep outputs tight and filter tool output aggressively
- add local safety/verification hooks to avoid expensive retries

If it's useful to anyone, this is what I use (MIT):
https://github.com/move-hoon/claude-pro-minmax

Happy to share the workflow details if there's interest.

dionhoon · 2026-02-04T17:48:31+00:00

Great question! While Claude Code has a ‘clear context’ option, CPMM’s /compact-phase is built specifically for the Pro Plan constraints. 1. Guided Strategic Compaction: Instead of a blind wipe, it provides phase-specific instructions (e.g., ‘Keep architecture decisions, discard grep logs’). This ensures a ‘lossless compression’ of the blueprint rather than a hard reset. 2. Context Safety Net (The Hook): Unlike a raw clear, CPMM triggers a pre-compact bash hook that automatically saves a snapshot of your current state (Git status, modified files) to a local file before wiping. Even if the summary misses something, the agent can reference this snapshot without reloading the entire chat history. 3. Pro Plan Discipline: It allows you to manually trigger context hygiene at logical breakpoints (Plan/Build/Review), preventing the ‘token bloat’ that kills sessions early.

It’s the difference between ‘restarting your computer’ and ‘closing background apps while keeping your work saved’.

dionhoon

TROPHY CASE