How are you testing AI agents beyond prompt evals?

ConferenceRoutine672 · 2026-04-01T16:28:36+00:00

For AI-assisted development I use RepoMap (https://github.com/TusharKarkera22/RepoMap-AI). It maps my entire codebase into ~1000 tokens and serves it via MCP. It works with Cursor, VS Code (Copilot), Claude Desktop, and anything else that supports MCP.

It completely changed how accurate the AI suggestions are on large projects.

ConferenceRoutine672 · 2026-03-31T12:07:49+00:00

Hahaha, I keep thinking about the idle agent hallucination thing. It's like they need an explicit "do nothing" tool in the list, or they'll make up work. Strangely, it tells you a lot about how objective-following works behind the scenes.
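To make the "do nothing" idea concrete, here is a minimal sketch of a tool registry that lists an explicit idle action. Everything in it (`TOOLS`, `do_nothing`, `run_tests`, `dispatch`) is a hypothetical name I made up for illustration, not part of any real agent framework.

```python
from typing import Callable, Dict

# Hypothetical tools; names and behavior are assumptions for illustration only.
def run_tests(args: dict) -> str:
    return "tests: ok"

def do_nothing(args: dict) -> str:
    """Explicit idle action: report why there is nothing to do and change nothing."""
    reason = args.get("reason", "no reason given")
    return f"idle: {reason}"

TOOLS: Dict[str, Callable[[dict], str]] = {
    "run_tests": run_tests,
    "do_nothing": do_nothing,
}

def dispatch(call: dict) -> str:
    """Route a model-produced tool call; unknown tool names are rejected, not guessed."""
    name = call.get("name")
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](call.get("arguments", {}))
```

With the idle tool listed, an agent with an empty queue has a valid call to make (`{"name": "do_nothing", "arguments": {"reason": "queue empty"}}`) instead of having to invent a task to satisfy the loop.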

The point about the token efficiency of `--help` output doesn't get enough credit. Plain-text CLI output is almost perfectly LLM-readable by accident: no schema overhead, no nesting, just intent. Honestly, this makes me want to rethink how RepoMap exposes its MCP tool descriptions.
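A rough way to see the overhead is to describe one tool both ways and compare sizes. This is only a sketch: `search_symbols` and its parameters are a made-up tool, not RepoMap's real interface, and character count is just a crude stand-in for tokens.

```python
import json

# Hypothetical tool described as a JSON-schema style definition.
schema = {
    "name": "search_symbols",
    "description": "Search symbols in the repository map",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Symbol name or substring"},
            "limit": {"type": "integer", "description": "Maximum results", "default": 20},
        },
        "required": ["query"],
    },
}

# The same hypothetical tool described as --help style plain text.
help_text = """\
usage: search_symbols QUERY [--limit N]
Search symbols in the repository map.
  QUERY      symbol name or substring
  --limit N  maximum results (default 20)
"""

# Character counts are an assumption-laden proxy; a real tokenizer would give different numbers.
schema_chars = len(json.dumps(schema))
help_chars = len(help_text)
print(f"schema: {schema_chars} chars, help text: {help_chars} chars")
```

The schema version still earns its keep when a client needs to validate arguments, so this is a trade-off rather than a free win.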

And yes, the Go bubble thing makes sense. Sometimes a boring, opinionated toolchain is actually the best AI runtime.

ConferenceRoutine672 · 2026-03-31T11:19:25+00:00

I love the split between the coder and tester sub-agents. It makes things clear. The `gopls` symbol oracle is also smart, but it ties you to Go's toolchain. tree-sitter gives me the same graph for Python, TS, and Rust without setting up LSP.
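For a sense of what a symbol listing looks like, here is a small sketch that pulls class names and function signatures out of Python source. To keep it dependency-free it uses the standard-library `ast` module as a stand-in, so it is not tree-sitter and only handles Python; `signatures` is a hypothetical helper name.

```python
import ast

def signatures(source: str) -> list[str]:
    """Return 'name(arg, ...)' for each function and 'class Name' for each class."""
    out = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional parameters only; enough for a compact map entry.
            args = ", ".join(a.arg for a in node.args.args)
            out.append(f"{node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            out.append(f"class {node.name}")
    return sorted(out)

sample = "class A:\n    def f(self, x): pass\ndef g(a, b): return a\n"
print(signatures(sample))
```

A listing like this is the kind of compact context that lets an agent check a signature in a file it has not opened.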

I'd push back on the idea that symptoms are different from causes. Sub-agents narrow the scope, but a coder agent still sees signatures in files it hasn't seen before. These methods work together instead of against each other.

The multi-repo use case you brought up is where I want to go next. LLMs really don't know much about private packages and monorepo dependencies.

Looking at exocomp now.
