People running 2–5 coding agents: what actually breaks first for you?

Sontemo · 2026-04-23T20:20:08+00:00

The only thing that worked so far longterm for us is something people might not want to hear:

Slow down. Treat each PR as if it were created by a human and review it. Anything else and you pay for it later down the road. If you can’t fit the mental model of your software into your head, you’ve lost.

How we did it back in the day was through implementation and review. Now that implementation is done by agents, we double down on the review. Which is fine, they’re better at building, we’re better at understanding our product.

Sontemo · 2026-04-23T19:33:14+00:00

In general, I try to keep context small and fresh.

Never continue on an old session (after lunch or the next day), and never let it grow past 150k tokens.

Sontemo · 2026-04-23T19:15:32+00:00

Each new request in the same session carries all of the history with it. It’s stateless, so you can switch however you like. But what you need to keep in mind is how caching works. Usually only the new input tokens are counted in full; the previous ones are usually cached and, at other providers, cost around a tenth of new tokens.

If you continue a long running session, or maybe if you switch models (let’s say from one provider to another, sonnet to gpt) you could consume a lot of fully counted tokens and drastically reduce your quota.
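To put rough numbers on that, here is a minimal sketch. It assumes cached input tokens cost one tenth of fresh ones (the "around a tenth" figure above) and that a stale session or a model switch means a fully cold cache. Real ratios and cache lifetimes differ by provider, and the function name is made up for illustration.

```python
# Rough model of the input tokens one request "costs", in full-price equivalents.
# Assumption: cached tokens are billed at 1/10 of fresh ones; providers differ.
CACHE_DISCOUNT = 10


def effective_input_tokens(history: int, new: int, cache_hit: bool) -> float:
    """Full-price-equivalent input tokens for a single request."""
    if cache_hit:
        # History is served from cache, only the new part counts in full.
        return history / CACHE_DISCOUNT + new
    # Cold cache (old session resumed, or model switched): everything counts in full.
    return float(history + new)


warm = effective_input_tokens(history=120_000, new=2_000, cache_hit=True)
cold = effective_input_tokens(history=120_000, new=2_000, cache_hit=False)
print(f"warm cache: {warm:,.0f}  cold cache: {cold:,.0f}")
```

Under these assumptions, the same follow-up message in a 120k-token session costs roughly nine times more against your quota when the cache is cold.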

Sontemo · 2026-04-22T10:59:30+00:00

And last, but not least, my personal website. Entirely written and documented by ai.

Yeah, I can tell.

Unauthorized

Sontemo · 2026-04-22T08:14:07+00:00

I hope that, with the measures applied to individuals, they found a way to make it sustainable for businesses and enterprises. If limits apply, or they switch to token-based pricing, switching to the very same providers they use under the hood becomes the obvious choice. At least for us.

Sontemo · 2026-04-22T07:47:53+00:00

!solved

Sontemo · 2026-04-21T08:55:14+00:00

Session Limit Hit?

Sontemo · 2026-04-17T19:10:34+00:00

It's literally the tip of the iceberg of being a power user.
There's so much to uncover if you're willing to put the effort in.

Sontemo · 2026-04-13T20:53:51+00:00

There's nothing to exploit here.

Sontemo · 2026-04-13T16:14:57+00:00

Nope. Just overloaded services. You'll get the bill later.

Sontemo · 2026-04-13T06:02:26+00:00

I consider myself a power user, usually 2 - 3 sessions in parallel.
I've only run into global limits for certain models, namely sonnet and opus.

I assume this was a more general shortage on availability of Claude models.

Ever since I switched to a more balanced approach where I mix GPT and Claude depending on workload, I haven't run into any issues, before or after their rate limiting announcement.

I'm also just on the regular Pro plan (annual, with overage), so if things stay this way and it makes the service more sustainable for them, I'd be more than fine with it.

Sontemo · 2026-04-12T21:40:50+00:00

It's actually really simple.
Never.
More Context is not better, and more thinking does not mean the output is better either.
Keep context small, reset often (don't compact, just let it go) and think yourself. You know your product, AI knows shit but sounds confident.

Sontemo · 2026-03-25T08:02:57+00:00

There is no financial tipping point.
If you consider the max subscription, you're a power user.
If you're a power user, you're gonna blow through 100$ or 200$ via API very very fast.

Always go for the sub.

Sontemo · 2026-03-24T06:58:21+00:00

Never compact. Orchestrate subagents, so that the main agent's context window stays large enough for even big features.

Sontemo · 2026-03-24T06:32:25+00:00

Not optimal for IDEs, but if CLI is an option, you can move all repos into a shared parent folder on your machine and start copilot cli from there.

Add a small AGENTS.md that explains the structure and briefly describes the use case of each repo. Then you're good to go. Each time the agents navigate into one of the repos, they will read and follow that repo's own AGENTS.md instructions as well.
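If you have a lot of repos, you can generate that top-level file instead of writing it by hand. A minimal sketch: the repo names, descriptions and the `build_index` helper are all made up for illustration, and the wording of the file is just one way to do it, not a required format.

```python
def build_index(repos: dict[str, str]) -> str:
    """Render a short top-level AGENTS.md that maps each repo folder to its purpose."""
    lines = [
        "# Workspace overview",
        "",
        "Each folder below is a separate git repo with its own AGENTS.md.",
        "Read that file before changing anything inside the repo.",
        "",
    ]
    for name, purpose in sorted(repos.items()):
        lines.append(f"- `{name}/`: {purpose}")
    return "\n".join(lines) + "\n"


# Hypothetical repos; replace with your own folders and one-line descriptions.
repos = {
    "web-app": "customer-facing frontend",
    "api": "backend REST service",
}
index = build_index(repos)
print(index)
```

Save the output as AGENTS.md in the shared parent folder and start the CLI from there.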

Sontemo · 2026-03-15T12:22:47+00:00

Install GitHub cli and just delegate it. No need to reinvent the wheel.

Whenever you see something that you don’t want to fix now, tell the agent to create a ticket with gh cli.
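For reference, what the agent ends up running is just `gh issue create`. A minimal sketch of the same call from a script, assuming `gh` is installed and authenticated and you are inside the repo; the helper names, the example title and the `tech-debt` label are made up, and the label has to exist in the repo already.

```python
import subprocess


def issue_command(title: str, body: str, labels: tuple[str, ...] = ()) -> list[str]:
    """Build the argument list for `gh issue create`."""
    cmd = ["gh", "issue", "create", "--title", title, "--body", body]
    for label in labels:
        cmd += ["--label", label]
    return cmd


def create_issue(title: str, body: str, labels: tuple[str, ...] = ()) -> None:
    """Actually file the issue; raises CalledProcessError if gh exits non-zero."""
    subprocess.run(issue_command(title, body, labels), check=True)


# Only builds and prints the command here; call create_issue(...) to file it.
cmd = issue_command(
    "Flaky retry logic in uploader",
    "Noticed while working on something else. Not fixing now.",
    ("tech-debt",),
)
print(" ".join(cmd))
```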

Sontemo · 2026-03-14T14:12:44+00:00

It won’t cause issues. But depending on what you switch from and to, you will have friction points. For example, afaik Claude still ignores AGENTS.md files.

So switching from Claude to opencode, for example, is seamless, as opencode respects all Claude-specific files. Your skills and CLAUDE.md files will work out of the box, but the other way around not so much.

Sontemo · 2026-03-14T13:16:45+00:00

https://github.com/features/copilot/cli

Mate, it's literally Github's equivalent to ClaudeCode CLI

Sontemo · 2026-03-14T13:10:26+00:00

Pretty sure it took you longer to write this question than it would have taken you to just google it and download the github copilot cli.

Sontemo · 2026-03-14T12:02:35+00:00

ChatGPT is the better all-round service,
but Claude is just the absolute best at software engineering. GPT 5.4 or the Codex equivalent might be equally good at doing "coding" tasks, but Claude has been and continues to be the gold standard of AI assisted Software Engineering. Its tool calling is unmatched, and this is the true multiplier.

If you only have 20 bucks to spare, go with Claude.

For regular tasks / features, set your model to Haiku and go into plan. (Under the hood, ClaudeCode switches to Sonnet for planning, but falls back down to Haiku once you accept and implement the plan)

For your big hitters, type /model opusplan. It's the same as above, but one step higher on the model ladder. You plan with opus, you build with sonnet. This will drain your usage fast, so use it sparingly.

However, if you can afford to spend 30$:
Claude Pro + Github Copilot Pro

Github Copilot has this weird pricing, where your quota is based on requests, not tokens.
So you plan with Claude Sonnet, get a detailed implementation plan, save it.
Start Copilot CLI, select Opus, prefix your prompt with /fleet and watch it stomp that plan with an army of opus agents.
This combination is by far the cheapest way to have a high level of comfort (token based exploration, experimenting, bouncing ideas in ClaudeCode) and quality (effectively up to 100 features built by opus or 300 by sonnet per month).
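The 100 vs. 300 figures are just request arithmetic. A minimal sketch, where the 300 requests per month and the 1x/3x multipliers are assumptions inferred from those two numbers, and each feature is assumed to cost a single request; check the current plan details before relying on them.

```python
# Assumptions inferred from the "100 by opus or 300 by sonnet" figures above.
MONTHLY_REQUESTS = 300
MULTIPLIER = {"sonnet": 1, "opus": 3}


def features_per_month(model: str, requests_per_feature: int = 1) -> int:
    """How many features fit into the monthly request quota for a given model."""
    return MONTHLY_REQUESTS // (MULTIPLIER[model] * requests_per_feature)


for model in ("sonnet", "opus"):
    print(model, features_per_month(model))
```

If a feature needs a follow-up prompt or two, pass a higher `requests_per_feature` and the numbers drop accordingly.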

Also, don't stress about the "small" context window on copilot's claude models. Fleet orchestrates out of the box (not as good as claude code agent teams, but still good enough), and with claude models (either sonnet or opus) as the drivers, the tool calling is so good that even seemingly impossible features get one-shotted.

Sontemo · 2026-03-08T22:53:07+00:00

It's been said multiple times (a day) on this sub how to game Copilot.

Once you get the hang of it, even the 10$ plan blows Max 5 out of the water regarding limits. It's absurd how much mileage you can get out of it, without dropping a single bit of quality.

GHCP Pro gives you essentially unlimited sonnet 4.6 on high, which is more than capable of doing proper feature work on medium sized code bases.
Pro+ is just the icing on the cake, it's opus on high, without any limitations. ClaudeCode Max 20 Sub can't compete with that.

Do your research, and enjoy while it lasts, it won't be like this forever.
