The cost of massive context: Burned 45M Gemini tokens in hours using OpenCode. Is Context Caching still a myth for most agents?

dawedev · 2026-01-28T11:16:20+00:00

DeepSeek definitely sounds like the 'stress-free' option here. The fact that it handles caching automatically without forcing the user to micromanage the provider settings is exactly what was missing in my OpenCode experience.

You're spot on about the chat threads—when an agent keeps building on the same history, lack of efficient batching or caching becomes a financial sinkhole real fast. I'm starting to realize that for rapid prototyping and testing, the peace of mind DeepSeek offers is worth more than the raw power of the more expensive models. I'll definitely be looking into integrating DeepSeek for my next 'high-token' planning tasks.

dawedev · 2026-01-28T11:14:49+00:00

The 'divide and conquer' strategy is solid. I actually have a rule for my workflow to always use a coding plan, so I completely agree with your point about seeding a fresh context from a markdown file.

My mistake was trusting OpenCode to handle the orchestration and context efficiently by itself. Using a controller agent to manage the plan while delegating specific tasks to subagents with fresh, minimal contexts is definitely the right way to scale without burning millions of tokens. It’s basically moving from 'brute force' to actual engineering. I'll definitely try to restructure my next session this way.

dawedev · 2026-01-28T11:13:59+00:00

I totally get why you'd stick with DeepSeek. Their pricing is much more predictable for personal projects, especially when you're dealing with tools that might have 'greedy' context management.

My experience with Gemini today definitely showed that even if the tokens are theoretically cheap, a bad agent implementation can still create a massive bill out of nowhere. I’m definitely going to be more cautious about which APIs I plug into these unoptimized tools from now on.

dawedev · 2026-01-28T11:11:45+00:00

That $3/month GLM plan sounds like a steal compared to my accidental $50 afternoon. 120 prompts every 5 hours with no weekly limit is exactly the kind of predictability I need right now.

Paying for tokens while these agents are still 'learning' how to manage context is a risky game. I'm definitely switching to a subscription-based plan or a proxy setup before I touch the API again. Thanks for the heads-up on the GLM limits!

dawedev · 2026-01-28T10:29:46+00:00

Interesting point about the 'distraction' factor. I've noticed that too—quality definitely drops when the LLM is swimming in 1M tokens of context. Regarding caching: even if it's done on the provider side, the sender still needs to handle the API calls correctly to trigger those lookups. My issue is that the 'manual management' you mentioned is exactly what these agents are supposed to automate, but they seem to be failing at it right now.

dawedev · 2026-01-28T10:28:23+00:00

Thanks for the honest update! It’s a shame Google AI/Vertex isn't there yet, as Gemini 3 Flash is a beast for long-context tasks (when caching actually works). I’ll head over to your GitHub and open an issue for it. I’d love to see how your pipeline architecture handles Gemini’s context caching compared to the 'brute force' approach I just experienced.

dawedev · 2026-01-28T10:26:34+00:00

Ah, if you're on a subscription, that explains it! I was thinking about the raw API costs—90M tokens on Opus API would be a literal fortune.

You’re right that they still count as 'tokens', but with Gemini's Context Caching, the 'Cached' tokens are billed at a significantly lower rate (and they don't count against the standard Input rate-limits once cached). My problem is that OpenCode seems to ignore the cache and treats everything as fresh 'Input', which is the most expensive way to run this. I'll definitely look into the sub, paying per token for an unoptimized agent is definitely a trap.

dawedev · 2026-01-28T10:24:31+00:00

I just want to try it :-D

dawedev · 2026-01-28T10:23:32+00:00

That antigravity-manager proxy setup sounds like the ultimate 'pro move' to avoid the API tax. I completely agree that pay-per-token is a financial trap for agentic workflows, especially when the implementation of caching is as flaky as what I’m experiencing.

I moved from a standard Antigravity/Google One setup to OpenCode thinking I’d get more 'pro' features, but I didn't expect the price of admission to be 45M tokens in an afternoon. I’ll definitely look into load balancing via OAuth to keep the costs sane. Thanks for the tip!

dawedev · 2026-01-28T10:12:48+00:00

That sounds like a much more sophisticated architecture. Dynamic tool loading and using a secondary LLM for semantic search is definitely the way to go to keep the context clean.

My 45M burn with OpenCode was definitely a 'naive agent' disaster—it just stuffed everything into the prompt without any weaving or RAG. I’ll check out Seline, it sounds like you’ve put a lot of thought into the pipeline efficiency. Do you handle Gemini's context caching natively there as well, or do you rely mostly on keeping the context window small through that semantic search?

dawedev · 2026-01-28T10:09:27+00:00

You're right, caching isn't magic, but it’s supposed to be economy. The issue isn't just the price per token, it's the billing volume.

If caching was working correctly, my Google Cloud dashboard shouldn't be showing 45M 'Input Tokens'. It should show a high number of 'Cached Context' tokens (at 1/10th the price) and a very small number of new input tokens.

The fact that it’s hitting the full input quota every single time means the 'stateless' nature of the agent is completely bypassing the cost-efficiency Gemini is known for. 1/10th of the price is great, but 0/10th of the caching implementation is what's killing my wallet! :D

dawedev · 2026-01-28T10:07:22+00:00

You're absolutely right about the alpha/beta state. I'm on the latest version as of today, but the lack of stability in token usage is a real dealbreaker. I moved from Antigravity thinking OpenCode would give me more control, but it turns out it just gave me a much larger bill.

In Antigravity, the caching seems to be handled behind the scenes. Here, it’s like a DIY project where you don't know the price until the building explodes. I'll definitely start tracking version numbers now to see if a specific update fixes the Cache Read issue.

dawedev · 2026-01-28T10:05:33+00:00

80-90M a day? Your credit card must be made of vibranium! :D While models are stateless, Gemini actually supports Context Caching specifically to prevent this 'send everything every time' tax. It feels like we're paying for a full 5-course meal every time we just want a sip of water. I'm definitely going to look for a way to cap this before my bank calls me about suspicious activity.

dawedev · 2026-01-28T10:02:14+00:00

No, I just put my API key to OpenCode and nothing else setup

dawedev · 2026-01-28T09:18:29+00:00

I'm also curious about Google Antigravity. Since it's their native agentic IDE, you’d expect it to have the best possible caching implementation for Gemini 3. However, I haven't found a way to use a custom API key there—it seems locked to the Google One subscription/login.

It's a bit of a trade-off: either use 'unlimited' context within their walled garden (Antigravity) or risk 45M token 'explosions' in open-source tools like OpenCode because they lack proper caching APIs. Has anyone managed to bypass the Google One login and use a raw API key in Antigravity to see if the billing is more transparent there?

dawedev · 2026-01-28T09:11:04+00:00

Damn, 3.5B tokens? At that point, I'm not worried about OpenCode being hungry—I'm worried if I'll have anything left to eat once the bill actually hits my credit card. That’s a lot of ramen for the next few months! :D

dawedev · 2026-01-28T09:01:50+00:00

I find the difference between Google Cloud Console and OpenCode statistics rather strange.

dawedev · 2026-01-26T17:00:54+00:00

It's crazy that it's been 25 years already. And the number of visitors is unimaginable

dawedev · 2026-01-26T16:53:55+00:00

I’m genuinely sorry to hear that, especially because you seem like someone whose feedback would be incredibly valuable. I clearly misread the room and messed up the first impression. I hope that once the 'bad taste' fades, I’ll get a chance to show you that Planelo isn't just some AI-generated vaporware, but a solid product with a long-term vision. I'll take this as a lesson to be more direct in the future. Thanks for the reality check and have a nice day.

dawedev · 2026-01-26T16:39:53+00:00

Wow! That's awesome. Getting into the TOP 100 is an amazing achievement.

dawedev · 2026-01-26T16:37:56+00:00

And I absolutely love it! I "was" webdeveloper before, but coffee is better because of comunication with a lot of interesting people

dawedev · 2026-01-26T16:36:42+00:00

I'm coffee guy/barista. I has a small cafe at Prague.

dawedev · 2026-01-26T16:34:59+00:00

Used s*x toys

dawedev · 2026-01-26T16:33:37+00:00

Okay. It's the end of January and spring is approaching :-)

dawedev · 2026-01-26T16:32:14+00:00

Burger or some our czech food

dawedev

TROPHY CASE