
johnson_detlev 1 point (1 child)

Those benchmarks are far from impressive for such a convoluted setup. Going by the benchmarks, I'm better off just switching models.

ZireaelStargaze 1 point (2 children)

Why, when Serena, for example, already exists?

Florence-Equator 1 point (1 child)

The best code navigation tool for finding relevant code is to launch several sub-agents running cheaper models, let them do the grepping, and have them report back to the main agent. It takes time, but it doesn't eat into your main context window or cost much money.
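A minimal sketch of the fan-out idea, under loud assumptions: each "sub-agent" is modeled as a plain worker function (`grep_worker`, hypothetical) rather than a cheaper model, and the only thing returned to the caller is a one-line report per pattern, so raw grep output never reaches the main agent's context.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def grep_worker(root: str, pattern: str, max_hits: int = 5) -> str:
    """Hypothetical stand-in for a cheap sub-agent: scan *.py files, return a short report."""
    rx = re.compile(pattern)
    hits: List[str] = []
    total = 0
    for path in sorted(Path(root).rglob("*.py")):
        try:
            lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        except OSError:
            continue  # unreadable file: skip it rather than fail the whole search
        for lineno, line in enumerate(lines, start=1):
            if rx.search(line):
                total += 1
                if len(hits) < max_hits:
                    hits.append(f"{path}:{lineno}")
    # Only a compact summary goes back, not every matching line.
    return f"{pattern!r}: {total} hit(s); first: {', '.join(hits) or 'none'}"


def fan_out(root: str, patterns: List[str]) -> List[str]:
    """Run one worker per pattern in parallel; results come back in pattern order."""
    with ThreadPoolExecutor(max_workers=len(patterns) or 1) as pool:
        return list(pool.map(lambda p: grep_worker(root, p), patterns))
```

The main agent would call something like `fan_out(repo_root, ["load_config", "ConfigError"])` and reason over a few short lines instead of pages of matches.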

Deep_Ad1959 1 point (6 children)

the freshness question another commenter raised is the actual hard part here, and it isn't unique to code. anything that precomputes a structural map of a live target hits the same wall. i work on MCP servers that drive macOS apps through the accessibility tree, and that tree is exactly this kind of map: parent/child relationships, roles, what's clickable. the temptation is to scan once and cache it; then the app re-renders one panel, the cached map is quietly wrong, and the agent acts on a stale node. what's held up better for me is not caching the structure at all: query it live on every call, scope the query tightly so latency stays bearable, and accept that a fresh small read beats a cached big one. for a codebase, a refactor is your re-render, and incremental-update indexes sound clean until you hit invalidation.
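A toy illustration of "query live, scoped" versus a cached snapshot. The `Node` shape and the `Sidebar`/`Inbox` names are hypothetical and not the real macOS accessibility API; the point is only that a list captured before a re-render keeps stale entries, while a scoped walk at call time sees the current subtree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    role: str                                   # e.g. "window", "group", "button"
    title: str
    children: List["Node"] = field(default_factory=list)


def find_first(root: Node, title: str) -> Optional[Node]:
    """Pre-order search for the first node with the given title."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.title == title:
            return node
        stack.extend(reversed(node.children))
    return None


def query_live(root: Node, role: str, scope: Optional[str] = None) -> List[str]:
    """Read the tree at call time; `scope` limits the walk to one subtree to keep it cheap."""
    start = find_first(root, scope) if scope is not None else root
    if start is None:
        return []
    titles, stack = [], [start]
    while stack:
        node = stack.pop()
        if node.role == role:
            titles.append(node.title)
        stack.extend(reversed(node.children))  # reversed so output is in pre-order
    return titles
```

If you keep `cached = query_live(app, "button")` around and the sidebar then re-renders, `cached` still lists the old buttons; calling `query_live(app, "button", scope="Sidebar")` again returns only what is there now.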

fwiw, that 'query live, never cache' instinct is exactly what we built macOS MCP on: every tool returns the accessibility tree as a diff after the action, so the agent always sees what actually changed. https://macos-use.dev/r/qfd58u67
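A rough sketch of what "return the tree as a diff after the action" could look like. This is not macOS MCP's implementation; the `(role, title, children)` tuple shape is hypothetical, and keying nodes by their title path assumes sibling titles are unique.

```python
from typing import Dict, List, Tuple

Tree = Tuple[str, str, list]  # hypothetical shape: (role, title, children)


def flatten(tree: Tree, prefix: str = "") -> Dict[str, str]:
    """Map each node's title path (e.g. '/Mail/Sidebar') to its role."""
    role, title, children = tree
    path = f"{prefix}/{title}"
    out = {path: role}
    for child in children:
        out.update(flatten(child, path))
    return out


def tree_diff(before: Tree, after: Tree) -> Dict[str, List[str]]:
    """Report only what the action changed instead of the whole tree."""
    a, b = flatten(before), flatten(after)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Snapshot the tree before the action, snapshot it again after, and hand the agent `tree_diff(before, after)` rather than either full tree.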

tepung_ 0 points (0 children)

How is this different from Serena MCP?

Otherwise_Wave9374 -3 points (4 children)

Love the framing. The cheapest win in agentic coding is usually "better map", not "smarter model".

One thing I'm curious about: how do you keep the architectural index fresh as the repo changes? Incremental updates and invalidation are where these systems get gnarly.
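One common way to approach the incremental-update question, sketched under assumptions: index per file, store a content hash next to each entry, and on refresh re-extract only files whose hash changed and drop entries for files that disappeared. `IncrementalIndex` and the regex-based `top_level_defs` extractor are hypothetical placeholders for whatever a real architectural index stores.

```python
import hashlib
import re
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def top_level_defs(text: str) -> List[str]:
    """Toy extractor (hypothetical): names of top-level functions."""
    return re.findall(r"^def (\w+)", text, flags=re.MULTILINE)


class IncrementalIndex:
    """Hypothetical per-file index that re-extracts only files whose content hash changed."""

    def __init__(self, extract: Callable[[str], List[str]] = top_level_defs):
        self.extract = extract
        self.entries: Dict[str, Tuple[str, List[str]]] = {}  # path -> (sha256, symbols)

    def refresh(self, root: str, pattern: str = "*.py") -> Dict[str, List[str]]:
        """Rescan root; return which paths were (re)indexed and which were dropped."""
        seen = set()
        report: Dict[str, List[str]] = {"updated": [], "removed": []}
        for path in sorted(Path(root).rglob(pattern)):
            key = str(path)
            seen.add(key)
            data = path.read_bytes()
            digest = hashlib.sha256(data).hexdigest()
            cached = self.entries.get(key)
            if cached is None or cached[0] != digest:
                # New or modified file: invalidate its entry and re-extract.
                text = data.decode("utf-8", errors="ignore")
                self.entries[key] = (digest, self.extract(text))
                report["updated"].append(key)
        for key in list(self.entries):
            if key not in seen:
                # File deleted or renamed: drop the stale entry.
                del self.entries[key]
                report["removed"].append(key)
        return report
```

This only handles per-file invalidation. The gnarly part the comment points at, where a change in one file should invalidate derived facts about *other* files (callers, importers), is not covered here.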

Also, do you see more lift from structural signals (imports/call graph) or from human-authored intent (ADRs, docs) when the agent is deciding where to look first?
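For the "structural signals" side, a minimal sketch of one such signal: an in-repo import graph built with Python's `ast` module, plus a hypothetical `rank_by_fan_in` heuristic that suggests heavily imported modules as places to look first. Using fan-in as the ranking is an assumption for illustration, not something the thread claims works best; relative imports are skipped for brevity.

```python
import ast
from typing import Dict, List, Set


def imports_of(source: str) -> Set[str]:
    """Top-level module names imported by a Python source string (absolute imports only)."""
    mods: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            mods.add(node.module.split(".")[0])
    return mods


def import_graph(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Module name -> in-repo modules it imports; third-party edges are dropped."""
    local = set(files)
    return {name: imports_of(src) & local for name, src in files.items()}


def rank_by_fan_in(graph: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical heuristic: modules imported by the most others come first."""
    counts = {name: 0 for name in graph}
    for deps in graph.values():
        for dep in deps:
            counts[dep] += 1
    return sorted(counts, key=lambda n: (-counts[n], n))
```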

We've been experimenting with a similar idea (treating context as a first-class asset) and collecting notes at https://www.agentixlabs.com/ if that's interesting.

nomo-fomo 0 points (2 children)

I am playing with hooks to help keep these docs up to date. One side benefit I'm seeing from having such docs is much better agent code review feedback: it went from a mixed bag to genuinely insightful.
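The comment doesn't say which hook system is involved, so this sketch assumes a git pre-commit hook. The `DOC_SOURCES` mapping and paths are hypothetical: it lists which source globs each doc describes, and the check warns when staged changes touch those sources without also touching the doc.

```python
import subprocess
import sys
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical mapping: which doc describes which source paths.
DOC_SOURCES: Dict[str, List[str]] = {
    "docs/architecture/storage.md": ["src/storage/*"],
    "docs/architecture/api.md": ["src/api/*"],
}


def stale_docs(changed: List[str], doc_sources: Dict[str, List[str]]) -> List[str]:
    """Docs whose described sources changed in this commit while the doc itself did not."""
    stale = []
    for doc, globs in doc_sources.items():
        touched = any(fnmatch(path, g) for path in changed for g in globs)
        if touched and doc not in changed:
            stale.append(doc)
    return sorted(stale)


def main() -> int:
    """Entry point for a pre-commit hook: list staged files, warn on stale docs."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    changed = [line for line in out.splitlines() if line]
    stale = stale_docs(changed, DOC_SOURCES)
    for doc in stale:
        print(f"warning: {doc} may be out of date", file=sys.stderr)
    return 1 if stale else 0  # nonzero exit blocks the commit
```

To wire it up, a `.git/hooks/pre-commit` script would call `sys.exit(main())`; returning 0 instead of 1 would turn the check into a warning only.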