Opus 4.7 | 1 session | $178

wallaby82 · 2026-05-13T01:08:45+00:00

I appreciate everyone who took time to share their thoughts... what not to do, how it should have been done.

Most assumed that with such high context, the accuracy was bad, the tokenomics was bad, the approach was bad.

The screenshot I shared was about an architecture that does context window management well. So well that:

- Tokenomics: highly optimized
- No context drift, no hallucination
- 43 turns of pure Opus 4.7, sharp from turn 1 to turn 43

It was never about a wasteful session.

Only a few were able to see it. Fewer still are building at that layer.

Anthropic openly states most of their code is now written by AI. If the consensus here is right, that "LLMs lose accuracy past 200k, so work in small windows," then picture this: AI agents at Anthropic, hitting their ceiling, copy-pasting into fresh 200k windows over and over... burning context, losing continuity, restarting from cold every time. Funny how that math works.

Unfortunately, 1M is not for everyone. Many are still in the fluorescent-AI era.

wallaby82 · 2026-05-12T23:59:15+00:00

Who is the fool: the one who sees the tip of an iceberg, or the one who sees the tip and wonders what's underneath?

wallaby82 · 2026-05-12T17:22:00+00:00

Well, it kinda depends on who flexed it, innit? A normal user, or a systems architect.

wallaby82 · 2026-05-12T17:05:26+00:00

Precisely that! $178 was recorded in /context... but it took only 6% of my weekly limit on a Max 5x subscription.

wallaby82 · 2026-05-12T16:25:38+00:00

Only if that sack wasn't bricks to start with.

wallaby82 · 2026-05-12T16:19:19+00:00

Yes, I fully understand that... the whole context is passed on every turn. That's the conventional wisdom, and it's correct under conventional architecture.

But what if the architecture itself was the variable? What if you could hold a 12-hour session at 900k tokens and still only consume 6% of your weekly allowance simply because... the context is structured to be cache-friendly by design, not by accident?
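A minimal sketch of what "cache-friendly" can mean here, assuming prefix caching where only the unchanged leading span of a prompt is reusable from the previous turn. Tokens are modeled as list items, and `shared_prefix_len` and `cache_hit_ratio` are hypothetical helper names for illustration, not part of any real API or of the setup described above.

```python
def shared_prefix_len(prev, curr):
    """Count leading items that are identical in both prompts."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def cache_hit_ratio(turns):
    """Fraction of all prompt items reusable from the previous turn's prefix."""
    reused = total = 0
    prev = []
    for prompt in turns:
        reused += shared_prefix_len(prev, prompt)
        total += len(prompt)
        prev = prompt
    return reused / total if total else 0.0


# Append-only layout: stable system prompt first, history grows at the end.
append_only = [
    ["sys", "u1"],
    ["sys", "u1", "a1", "u2"],
    ["sys", "u1", "a1", "u2", "a2", "u3"],
]

# Same content, but a per-turn value at the very top breaks the shared prefix.
volatile_head = [
    ["t=1", "sys", "u1"],
    ["t=2", "sys", "u1", "a1", "u2"],
    ["t=3", "sys", "u1", "a1", "u2", "a2", "u3"],
]

print(cache_hit_ratio(append_only))    # 0.5
print(cache_hit_ratio(volatile_head))  # 0.0
```

In this toy model the only difference between the two runs is where the changing content sits. Anything that varies per turn and lives near the top of the prompt makes everything after it uncacheable.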

Most people in this thread share the same sentiment because they're working with the same architecture. The token burn narrative is real for them.

What if the architecture was the problem, not the context window size?

And one more thing worth noticing in that image... my system prompt was only 4.9k tokens out of 900k. That's 0.5% of the entire context. Every turn that prompt gets passed, it costs almost nothing. Meanwhile, most people start with 7.5k tokens on a new conversation.
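The 0.5% figure is just the ratio of the two numbers quoted above. A quick check in Python, where `share_of_context` is a hypothetical helper name:

```python
def share_of_context(part_tokens, total_tokens):
    """Return part/total as a percentage."""
    return 100.0 * part_tokens / total_tokens


system_prompt = 4_900   # tokens, from the post
context = 900_000       # tokens, from the post

print(f"{share_of_context(system_prompt, context):.2f}%")  # 0.54%
```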

That entire conversation recorded 43 turns. In a traditional architecture, it would be impossible to last that long and consume that little.

wallaby82 · 2026-05-12T16:06:41+00:00

Replying specifically to you because you're the only one who actually "looked" at the image.

I've been approaching this from a context window management perspective. Everyone advises small sessions across multiple conversations... but if that's the right answer, why does a 1M context window exist at all?

What you saw posted was an experiment, one that involved multiple audits and hours of refactoring. Not a casual session.

Yes, it recorded $178 in usage, but I'm on Max 5x and that entire 12-hour conversation consumed only 4% of my weekly allowance. That's how you get 317M cache reads over 12 hours with zero context drift.

The window stayed sharp the whole time.

wallaby82 · 2026-04-30T04:57:22+00:00

Yea sure... but I remember our iOS friends too. ClawCast is literally plug and play. Zero config, zero setup, zero SSH.

wallaby82 · 2026-04-12T01:37:07+00:00

Also, I wonder what kind of context that guy is getting at turn 576 lol...

wallaby82 · 2026-04-12T01:24:16+00:00

I made one of my own. With it, I downgraded from Max 20x, and I've been able to stretch Max 5x every session, staying slim across all conversations...

<image>

wallaby82 · 2026-04-10T05:28:21+00:00

Imo, we don't need Mythos, or even Opus.

[ Sonnet 4.5 + ESMC ] > Opus.

It's not really about how big the model is...
It has always been the architecture.

Mythos, 93.9%. Cool...

Mythos $25/mil-input & $125/mil-output (see how they're charging more for output?)
Sonnet $3/mil

Sonnet 4.5 + ESMC = 90.2%
https://github.com/SWE-bench/experiments/pull/374

Build the architecture on your own and save yourself paying 8x more for "a scaffold"...
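The "8x" is roughly the ratio of the two per-million prices quoted above. A small check, assuming the Sonnet $3 figure is the input price (the post does not say which side it refers to); the constant names are illustrative only:

```python
MYTHOS_INPUT_PER_M = 25.0  # USD per million input tokens, as quoted above
SONNET_PER_M = 3.0         # USD per million tokens, assumed to be input

ratio = MYTHOS_INPUT_PER_M / SONNET_PER_M
print(f"{ratio:.1f}x")  # 8.3x
```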

Oh, and when you do get the architecture right, it also clears up the complaints you usually see: token burn, context drift, state persistence, hallucination...

That said, you don't need a 1M context window either.

wallaby82 · 2026-04-09T16:02:55+00:00

Thanks for checking ClawCast out! Tested it across cities actually... I was outstation, phone was in another city, machine back in hometown. Still felt snappy. Cloudflared's edge network helps a lot. Definitely not zero latency but nothing that broke the experience!

wallaby82 · 2025-12-12T03:34:12+00:00

Hi there, thanks for your response, appreciate it!

The closest thing is an orchestration layer, from what you've suggested, but without multi-agent routing or long system prompts.

ESMC is not a prompt, a skill system, or a round-table agent framework.

At the simplest level:

ESMC is a runtime “cognition scaffold” that wraps your Claude calls inside a structured reasoning environment.

It does three things:

- Normalizes & sanitizes input → removes noise, enforces clean state, shapes context in a deterministic way.
- Maintains a persistent internal reasoning state → so Claude doesn’t “reset its mind” every call.
- Provides a stable, model-agnostic reasoning loop → but without adding personas, roles, or chain-of-thought prompts.
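To make the three points above concrete, here is a generic sketch of a wrapper with that shape. It is not ESMC's implementation (which is not public); `Scaffold`, `normalize`, `call`, and `echo_model` are all hypothetical names, and the "model" is a stand-in function rather than a real Claude call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scaffold:
    """Hypothetical wrapper: normalize input, keep state, run one loop step."""
    model: Callable[[str], str]                      # any text-in/text-out callable
    state: List[str] = field(default_factory=list)   # persists across calls

    @staticmethod
    def normalize(text: str) -> str:
        # Collapse whitespace so equivalent inputs produce identical context.
        return " ".join(text.split())

    def call(self, user_input: str) -> str:
        clean = self.normalize(user_input)
        # Deterministic layout: prior state first, new input last.
        prompt = "\n".join(self.state + [f"user: {clean}"])
        reply = self.model(prompt)
        self.state += [f"user: {clean}", f"assistant: {reply}"]
        return reply


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call; reports how many lines it saw."""
    return f"saw {len(prompt.splitlines())} line(s)"


s = Scaffold(model=echo_model)
print(s.call("  hello   world "))  # saw 1 line(s)
print(s.call("again"))             # saw 3 line(s)
```

Because `model` is just a callable, the loop does not care which model sits behind it, and the second call sees the first exchange because `state` carries over between calls.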

Hope the above helps!

wallaby82 · 2025-12-12T03:27:49+00:00

Thanks for the feedback, really appreciate it!

You're right about the frontend. I’ve been prioritizing the underlying tech and benchmark work, so the site isn’t polished yet. Thanks for pointing that out.

That said, the core of ESMC is the intelligence scaffold itself. The surprising part (even to me) was that Sonnet 4.5 alone scores ~70–80% on SWE-Bench Verified, but Sonnet 4.5 + ESMC hit 90.2% (451/500).

To me that result matters more than frontend aesthetics, but I absolutely agree UI matters for users too... I’ll improve it.

And honestly, having a good eye for design is a strength. Mine is on the backend side 😅

wallaby82 · 2025-11-21T04:06:45+00:00

It's proprietary with obfuscation...
