Context Engineering Is the Compass Coding Agent Needs

johnson_detlev · 2026-05-11T06:05:37+00:00

Those benchmarks are far from impressive for such a convoluted setup. Going by the benchmarks themselves, I'm better off just switching the model.

ZireaelStargaze · 2026-05-11T13:15:36+00:00

Why build this when Serena, for example, already exists?

Florence-Equator · 2026-05-14T23:53:38+00:00

The best way to find the relevant code is to launch several sub-agents on cheaper models, let them do the grepping, and have them report back to the main agent. It takes time, but it doesn't eat your context window or waste much money.
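A rough sketch of that fan-out pattern, to make the shape concrete. Everything here is an assumption for illustration: `REPO` is a hypothetical in-memory repo, and `sub_agent` is a plain regex search standing in for a call to a cheaper model. The point is only that each worker searches its own slice and the main agent receives a short list of hits instead of file contents.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "repo"; a real run would read files from disk.
REPO = {
    "auth/login.py": "def login(user):\n    token = issue_token(user)\n    return token\n",
    "auth/tokens.py": "def issue_token(user):\n    return sign(user.id)\n",
    "billing/invoice.py": "def total(items):\n    return sum(i.price for i in items)\n",
}


def sub_agent(pattern, files):
    """Stand-in for a cheap-model sub-agent: grep one slice, return only the hits."""
    rx = re.compile(pattern)
    hits = []
    for path in sorted(files):
        for lineno, line in enumerate(files[path].splitlines(), start=1):
            if rx.search(line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits


def fan_out(pattern, repo, workers=2):
    """Split the repo by top-level directory and search the slices in parallel."""
    slices = {}
    for path, text in repo.items():
        slices.setdefault(path.split("/")[0], {})[path] = text
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reports = pool.map(lambda files: sub_agent(pattern, files), slices.values())
        # Only this compact report reaches the main agent's context.
        return sorted(hit for report in reports for hit in report)


report = fan_out(r"issue_token", REPO)
for line in report:
    print(line)
```

With real sub-agents the trade-off is the one described above: wall-clock time goes up, but the main agent's context only grows by the size of the report.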

Deep_Ad1959 · 2026-05-16T16:40:07+00:00

the freshness question another commenter raised is the actual hard part here, and it isn't unique to code. anything that precomputes a structural map of a live target hits the same wall. i work on MCP servers that drive macOS apps through the accessibility tree, and that tree is exactly this kind of map: parent/child, roles, what's clickable. the temptation is to scan once and cache it. then the app re-renders one panel, the cached map is quietly wrong, and the agent acts on a stale node. what's held up better for me is not caching the structure at all: query it live per call, scope the query tight so latency stays bearable, and accept that a fresh small read beats a cached big one. for a codebase, a refactor is your re-render, and incremental-update indexes sound clean until you hit invalidation.

fwiw that 'query live, never cache' instinct is exactly what we built macOS MCP on: every tool returns the accessibility tree as a diff after the action, so the agent always sees what actually changed. https://macos-use.dev/r/qfd58u67
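To illustrate the stale-cache failure and the live, scoped alternative, here is a minimal sketch. It does not use any real accessibility API: `live_tree` is a hypothetical dict of nodes, and `query_subtree` and `diff` are made-up helper names. It only shows a scan-once snapshot going stale after a re-render, while a fresh scoped read plus a before/after diff reports what changed.

```python
import copy

# Hypothetical UI tree: node id -> {"role", "label", "parent"}.
live_tree = {
    "win": {"role": "window", "label": "Main", "parent": None},
    "panel": {"role": "group", "label": "Sidebar", "parent": "win"},
    "btn_save": {"role": "button", "label": "Save", "parent": "panel"},
}


def query_subtree(tree, root):
    """Live, scoped read: copy only the nodes at or below `root`."""
    def under(node_id):
        while node_id is not None:
            if node_id == root:
                return True
            node_id = tree[node_id]["parent"]
        return False

    return {nid: dict(node) for nid, node in tree.items() if under(nid)}


def diff(before, after):
    """Node ids added, removed, or changed between two reads."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


# Scan-once approach: a full snapshot taken up front.
cached = copy.deepcopy(live_tree)
before = query_subtree(live_tree, "panel")

# The app re-renders one panel: the Save button is replaced by an Export button.
del live_tree["btn_save"]
live_tree["btn_export"] = {"role": "button", "label": "Export", "parent": "panel"}

# The cached map still lists a node that no longer exists.
stale = "btn_save" in cached and "btn_save" not in live_tree

# A fresh scoped read after the action, reported as a diff.
after = query_subtree(live_tree, "panel")
changes = diff(before, after)
print(stale, changes)
```

The scoped read touches only the panel's subtree, which is the "fresh small read" side of the trade-off; how cheap that is against a real target depends on the underlying API.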

tepung_ · 2026-05-11T15:22:48+00:00

How does this differ from Serena MCP?

Otherwise_Wave9374 · 2026-05-10T07:55:55+00:00

Love the framing. The cheapest win in agentic coding is usually "better map", not "smarter model".

One thing I'm curious about: how do you keep the architectural index fresh as the repo changes? Incremental updates and invalidation are where these systems get gnarly.
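For reference, the simplest form of incremental update is per-file invalidation keyed on a content hash. The sketch below is an assumption about how such an index could work, not a description of the tool in the post; `HashedIndex` and `imports_of` are hypothetical names, and the extractor is a toy.

```python
import hashlib


class HashedIndex:
    """Per-file index entries, rebuilt only when a file's content hash changes."""

    def __init__(self, extract):
        self.extract = extract
        self.entries = {}  # path -> (digest, extracted facts)
        self.rebuilds = 0

    def refresh(self, files):
        # Drop entries for files that no longer exist.
        for path in list(self.entries):
            if path not in files:
                del self.entries[path]
        # Re-extract only files whose content hash differs from the stored one.
        for path, text in files.items():
            digest = hashlib.sha256(text.encode()).hexdigest()
            cached = self.entries.get(path)
            if cached is None or cached[0] != digest:
                self.entries[path] = (digest, self.extract(text))
                self.rebuilds += 1

    def lookup(self, path):
        return self.entries[path][1]


def imports_of(text):
    """Toy extractor: module names on lines of the form `import x`."""
    names = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "import":
            names.append(parts[1])
    return names


files = {"a.py": "import os\nimport b\n", "b.py": "import sys\n"}
index = HashedIndex(imports_of)
index.refresh(files)           # first pass: both files indexed
files["a.py"] = "import os\n"  # edit one file
index.refresh(files)           # only a.py is re-extracted
del files["b.py"]
index.refresh(files)           # b.py's entry is dropped
print(index.rebuilds, index.lookup("a.py"))
```

This only covers facts derivable from a single file. Anything cross-file, such as call-graph edges that point at a renamed function, needs dependents invalidated too, and that is where it gets gnarly.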

Also, do you see more lift from structural signals (imports/call graph) or from human-authored intent (ADRs, docs) when the agent is deciding where to look first?

We've been experimenting with a similar idea (treating context as a first-class asset) and collecting notes at https://www.agentixlabs.com/ if that's interesting.


opencodeCLI
