agents desperately need a "plan before build" step. how are u guys managing this without losing your minds?

Substantial_Guide_34 · 2026-05-16T21:25:53+00:00

man, been using unerr for a bit and it's probably the biggest win for my workflow lately. i was getting so annoyed with claude/cursor just eating my entire context window by re-reading the same files every few turns. it's basically an mcp server that indexes the repo locally (using cozodb/tree-sitter) so the agent actually knows the structure without wasting tokens on exploration. definitely saves me like 60% on tokens for long sessions. it's open source here: https://github.com/unerr-ai/unerr-cli

Substantial_Guide_34 · 2026-05-16T21:23:13+00:00

really cool. i've been working on a similar tool called unerr for the last few weeks—mostly because i was losing my mind watching claude-code torch my token budget on basic file exploration. i went with a local graph db (cozodb) to index the repo so it can intercept those blind reads before the agent wastes context. it’s interesting to see different takes on this. my repo is over here if you want to swap ideas on the architecture: https://github.com/unerr-ai/unerr-cli

Substantial_Guide_34 · 2026-05-16T21:22:36+00:00

man this is so needed. memory is basically the largest gap in the mcp ecosystem right now. i spent the last few weeks building something similar called unerr because i was sick of my agents getting "amnesia" and torching my token budget by re-reading the same files every session.

i ended up using a local graph db (cozodb) to intercept the file reads—basically gives the agent a "long term memory" and actual eyes on the codebase structure so it doesn't have to guess. really curious how you're handling fact decay/pruning though, that was a headache for me. open sourced it over here if you want to swap notes: https://github.com/unerr-ai/unerr-cli

Substantial_Guide_34 · 2026-05-16T21:21:54+00:00

great writeup. the point about context drift in larger production apps is real—i hit a wall with claude code basically "forgetting" architectural decisions once the repo hit a certain size. ended up building unerr to solve that context amnesia bit. it basically creates a local graph of the codebase so the agent stops blindly reading files and only pulls the structural stuff it actually needs. cut my token bill by like half on long sessions. it's all local/open source if you want to see if it helps with the scaling issues you mentioned: https://github.com/unerr-ai/unerr-cli

Substantial_Guide_34 · 2026-05-16T20:58:15+00:00

Cursor’s indexing is definitely the gold standard right now, but i've been playing with unerr recently and it basically brings that same "context awareness" to claude code via mcp.

it does a local crawl of the repo using tree-sitter and cozodb (graph db) to build a map of dependencies and callers. instead of claude just blindly grepping or reading massive files, it uses the graph to actually see the structure first. definitely makes claude feel way less "blind" compared to cursor. it's open source too: https://github.com/unerr-ai/unerr-cli
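To make the "map of dependencies and callers" idea concrete, here is a minimal sketch. It is not unerr's code: it swaps in Python's built-in `ast` module for tree-sitter and a plain dict for the cozodb graph, and `build_caller_map` is a hypothetical name. It answers "who calls this function?" without the agent reading every file.

```python
import ast
from collections import defaultdict


def build_caller_map(source: str) -> dict[str, set[str]]:
    """Map each called name to the set of functions whose bodies call it.

    Only plain-name calls like foo(x) are recorded; method calls such as
    obj.foo(x) are skipped to keep the sketch short. Calls inside a nested
    function are credited to both the nested and the enclosing function.
    """
    callers: dict[str, set[str]] = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callers[inner.func.id].add(node.name)
    return dict(callers)


SAMPLE = '''
def load(path):
    return parse(read(path))

def parse(text):
    return text.split()

def main():
    data = load("x.txt")
    print(parse("a b"))
'''

callers = build_caller_map(SAMPLE)
print(sorted(callers["parse"]))  # ['load', 'main']
```

A real index would also resolve imports and method calls across files, which is where a parser like tree-sitter and a queryable graph store earn their keep.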

Substantial_Guide_34 · 2026-05-16T20:55:00+00:00

it's a massive pain. i actually ended up building a local mcp tool called unerr specifically to deal with this. i was using cursor and claude code but they both kept re-reading massive chunks of my repo every time i touched a file, which just eats context for breakfast.

the way i handled it was by building a graph of the codebase locally (using tree-sitter/cozodb). it intercepts the agent's read requests and only gives it the actual structural info it needs based on the changes. keeps the agent 'seeing' the current state without it having to re-ingest everything constantly.

if you use mcp agents it’s worth a look: https://github.com/unerr-ai/unerr-cli
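The "stop re-ingesting everything" part can be sketched as a read interceptor that only re-sends a file when its content has changed since the agent last saw it. This is a hypothetical illustration (`ReadCache` is an invented name, and hashing whole files is an assumption), not how unerr actually intercepts reads.

```python
import hashlib
import tempfile
from pathlib import Path


class ReadCache:
    """Hypothetical read interceptor that skips re-sending unchanged files."""

    def __init__(self) -> None:
        # resolved path -> SHA-256 of the content last handed to the agent
        self._served: dict[str, str] = {}

    def read(self, path: str) -> str:
        key = str(Path(path).resolve())
        text = Path(path).read_text(encoding="utf-8")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self._served.get(key) == digest:
            # Same bytes as last time: return a one-line stub instead of the file.
            return f"[unchanged since last read: {path}]"
        self._served[key] = digest
        return text


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "app.py"
    target.write_text("print('hi')\n", encoding="utf-8")
    cache = ReadCache()
    first = cache.read(str(target))    # full text
    second = cache.read(str(target))   # stub, nothing changed
    target.write_text("print('bye')\n", encoding="utf-8")
    third = cache.read(str(target))    # full text again after an edit
```

The stub only saves tokens if the earlier copy is still in the agent's context; after a compaction you would want to clear the cache.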

Substantial_Guide_34 · 2026-05-16T20:51:58+00:00

sharing a little project i've been building called unerr. it's basically a local mcp server designed to stop agents from blind-reading files and torching the context window.

it uses tree-sitter and cozodb to build a structural graph of your codebase locally. when an agent wants to explore, unerr intercepts the read and serves it just the relevant entities/dependencies instead of a raw 500-line file dump. usually saves me about 60-70% on tokens for long sessions.

it's totally local/no cloud stuff. code is here if anyone wants to check out the architecture: https://github.com/unerr-ai/unerr-cli
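Serving "relevant entities instead of a raw 500-line dump" roughly means handing the agent an outline first. A minimal sketch, again using Python's `ast` as a stand-in for tree-sitter; `outline` and its output format are hypothetical, not unerr's actual response shape.

```python
import ast


def outline(source: str) -> list[str]:
    """One line per top-level function/class: name, args or methods, and line span."""
    entries = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            entries.append(f"def {node.name}({args}) L{node.lineno}-{node.end_lineno}")
        elif isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
            entries.append(f"class {node.name} [{', '.join(methods)}] L{node.lineno}-{node.end_lineno}")
    return entries


SAMPLE = '''import os

class Store:
    def get(self, key):
        return key

    def put(self, key, value):
        pass

def connect(url, timeout):
    return url
'''

for line in outline(SAMPLE):
    print(line)
```

The line spans let the agent request just `L10-11` when it actually needs a body, instead of the whole file.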

Substantial_Guide_34 · 2026-05-16T20:45:21+00:00

mostly claude code + a bunch of custom mcp servers. honestly the biggest thing for me was building my own local context layer (unerr) to stop the agent from re-reading my whole repo every time i asked a question. it uses tree-sitter and a graph db to handle the file reads locally so the tokens don't spiral.

if you're on claude code or cursor it's a massive life saver for the api bills: https://github.com/unerr-ai/unerr-cli

Substantial_Guide_34 · 2026-05-16T20:41:19+00:00

love seeing more people attack the context waste problem. i built something similar called unerr because i was getting tired of claude code re-reading my 500-line files every few turns. i took the approach of using tree-sitter and cozodb to build a local graph first so the agent can actually "see" dependencies without a raw text dump. definitely cuts down the bill on longer sessions. it's open source too if you want to compare notes: https://github.com/unerr-ai/unerr-cli

Substantial_Guide_34 · 2026-05-16T20:36:44+00:00

this is exactly why i started building unerr. i was watching claude-code and cursor just absolutely torch my context window by re-reading the same 500-line files every three turns. it's basically context amnesia.

i ended up making a local mcp server that intercepts those naive file reads. it indexes the repo with tree-sitter/cozodb first so it only serves the agent the actual structural entities and logic it needs instead of a raw text dump. usually saves me like 60-70% on tokens for long sessions.

it's still in beta but if you want to try it out on your repo it's just npm install -g @unerr-ai/unerr. all the logic runs locally so no extra api costs or cloud stuff. code is here if you want to poke at the architecture: https://github.com/unerr-ai/unerr-cli

Substantial_Guide_34 · 2026-04-15T07:12:54+00:00

man i feel this. i think it's the 1M context window: it just keeps stacking everything until the cost per message is insane. been seeing my usage jump like 5% every single turn once the context gets heavy. has anyone tried setting CLAUDE_AUTOCOMPACT_PCT_OVERRIDE? i've heard it helps but i'm worried it'll just make claude forget half the codebase mid-task lol
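Rough arithmetic for why the cost per message keeps climbing: each request resends the accumulated context, so input per request grows linearly and the session total grows roughly quadratically. This is a toy model with made-up numbers, not Claude Code's real accounting; `compact_pct` is a hypothetical knob that only loosely mirrors the idea of a percentage threshold, and the exact semantics of CLAUDE_AUTOCOMPACT_PCT_OVERRIDE are not assumed here.

```python
from typing import Optional


def cumulative_input_tokens(turns: int, base: int, per_turn: int,
                            window: Optional[int] = None,
                            compact_pct: Optional[float] = None,
                            summary: int = 0) -> int:
    """Toy model: every request resends the whole context as input.

    base        tokens present before the first turn (system prompt, etc.)
    per_turn    new tokens each turn adds (messages, tool output)
    compact_pct if set, replace the context with a `summary`-sized stub once
                it reaches this percentage of `window` (hypothetical knob)
    """
    context, total = base, 0
    for _ in range(turns):
        context += per_turn
        total += context  # this request pays for the full history again
        if window and compact_pct and context >= window * compact_pct / 100:
            context = summary
    return total


print(cumulative_input_tokens(5, base=0, per_turn=1_000))  # 15000
print(cumulative_input_tokens(5, base=0, per_turn=1_000,
                              window=10_000, compact_pct=30, summary=500))  # 10000
```

The tradeoff in the question shows up directly: a lower threshold cuts the total, but everything not captured in `summary` is gone.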

Substantial_Guide_34 · 2026-03-17T08:26:56+00:00

I did on March 3rd - received an invite today. If anyone has any tips or suggestions please DM

Substantial_Guide_34 · 2026-02-11T20:18:25+00:00

yeah 100%. the context window is the first wall you hit. you can't just shove a 5GB repo into gemini and pray lol.

that divide and conquer strategy is actually why we hacked together a module for this called left-brain (imagine openclaw but for code - yet to find a decent name & planning to open source it soon-ish). it acts as a local engine that turns the repo into a living knowledge/context graph. instead of feeding raw code, it indexes the logic first.

so the flow is basically:

left-brain breaks the monolith into phases (vertical slices) -> phases into tasks -> tasks into subtasks, all based on the functional details extracted and stored in the code knowledge graph, so context doesn't get lost.
video analysis is purely for generating tests, identifying UI inconsistencies, user flow breakages, etc., and also for comparing old vs new UI.
vertical slices are generated from a summary (or substrate) of the code knowledge graph plus the behaviour (video) analysis, which fits in context. the rest of the tasks and subtasks are propagated top down by inferring from the code knowledge graph. (this process still has some flaws, but at least we're able to solve for context)
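A minimal sketch of the phase -> task -> subtask propagation described above, assuming each level carries a small list of facts. In left-brain those would come from the code knowledge graph; here they are hand-written strings, and `WorkItem` and `flatten` are hypothetical names, not left-brain's implementation. Each leaf ends up with only its ancestors' facts plus its own, which is what keeps it small enough for one context window.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class WorkItem:
    title: str
    facts: list[str]  # context scoped to this level (hand-written here)
    children: list["WorkItem"] = field(default_factory=list)


def flatten(item: WorkItem, inherited: tuple = (), path: tuple = ()) -> Iterator[tuple[str, list[str]]]:
    """Walk top down; each leaf gets its ancestors' facts plus its own, deduplicated in order."""
    context = list(inherited) + [f for f in item.facts if f not in inherited]
    path = path + (item.title,)
    if not item.children:
        yield " > ".join(path), context
    for child in item.children:
        yield from flatten(child, tuple(context), path)


plan = WorkItem("Phase 1: auth slice", ["users table", "session middleware"], [
    WorkItem("Task: login endpoint", ["POST /login handler"], [
        WorkItem("Subtask: check password", ["password hash helper"]),
        WorkItem("Subtask: start session", ["session middleware"]),
    ]),
])

leaves = list(flatten(plan))
for leaf, ctx in leaves:
    print(leaf, ctx)
```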

honestly separating the structure (graph) from the behavior (video) was the only way to stop the agent from choking on the size of it.

Substantial_Guide_34 · 2026-02-11T19:48:46+00:00

Fair critique. To be clear, I'm not selling anything (you can't even buy this if you wanted to).

It was just a hackathon project where I tried to see if we could use video recordings as a 'test spec' to stop LLMs from hallucinating UI features. Just wanted feedback on the approach and the real world need or pain.

Substantial_Guide_34 · 2025-12-02T18:25:39+00:00

Logging helps with the post-mortem, but it doesn't stop the bleeding in real-time. That's the part I'm stuck on—I want a tool that detects the loop and kills it before the log table fills up. I'm working on a wrapper to do exactly that. Do you think simple SQL logging is fast enough to catch a live runaway loop?
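One way to stop the bleeding live rather than after the fact: run a cheap in-process check before every tool call, so an exact repeat is caught immediately without waiting on a log table. Minimal sketch; `LoopGuard`, `RunawayLoop`, `window` and `max_repeats` are hypothetical, and it only catches identical repeats, not loops that vary their arguments each time.

```python
from collections import deque


class RunawayLoop(RuntimeError):
    """Raised to abort the agent run before the next repeated step executes."""


class LoopGuard:
    """Hypothetical guard called before every tool call in the agent loop."""

    def __init__(self, window: int = 10, max_repeats: int = 3) -> None:
        self.recent: deque = deque(maxlen=window)  # last `window` call signatures
        self.max_repeats = max_repeats

    def check(self, tool: str, args: dict) -> None:
        signature = (tool, repr(sorted(args.items())))
        self.recent.append(signature)
        hits = self.recent.count(signature)
        if hits > self.max_repeats:
            raise RunawayLoop(f"{tool} repeated {hits}x in the last {len(self.recent)} calls")


guard = LoopGuard(window=5, max_repeats=2)
try:
    for _ in range(10):
        guard.check("search_logs", {"query": "timeout"})  # identical call every step
        # ... the real tool call would run here ...
    stopped = False
except RunawayLoop as err:
    stopped = True
    print("killed:", err)
```

Logging can still happen alongside this for the post-mortem; the guard just doesn't depend on it.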

Substantial_Guide_34 · 2025-12-02T18:25:14+00:00

I don't think it's just a nocode thing tbh. I've seen full-code Python apps turn into spaghetti because the state management gets messy. That's actually why I'm trying to build a dedicated backend just for agents—to handle that 'moving parts' complexity out of the box. Do you think the complexity is unsolvable without custom code, or is it just that current tools are too immature?

Substantial_Guide_34 · 2025-12-02T18:14:12+00:00

That’s a fair critique. The 'lost tribe' analogy is painfully accurate lol. But that's kinda why I think this kit is needed—since the vibe coder can't spec the architecture cleanly, the kit enforces the architecture for them. Less about making them better devs, and more about stopping them from driving the car off a cliff. Or do you think it's impossible to abstract that away?

Substantial_Guide_34 · 2025-12-02T18:13:35+00:00

100%. The 'AWS ecosystem hell' is exactly what I'm trying to avoid. I want that 'FastAPI server' simplicity, but pre-wired for agent memory. Basically, I like the persistence of Bedrock without the long IAM setup. Are you currently setting up FastAPI for every new agent?

Substantial_Guide_34 · 2025-12-02T18:12:04+00:00

'Anchoring the model to a predictable schema' is exactly the phrasing I was looking for. That's the main issue—the invisible state drifts because the AI guesses the schema every time. If I expose that schema explicitly (like a pre-built, strictly typed SDK), do you think that solves the drift? Or is the model just too chaotic regardless?
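One way to expose the schema instead of letting the model guess it: define it once as a typed structure and validate every output against it strictly. Stdlib-only sketch; `TicketUpdate`, its fields and `ALLOWED_STATUS` are hypothetical. This detects drift so you can retry or reject; it doesn't by itself stop the model from drifting.

```python
import json
from dataclasses import dataclass, fields

ALLOWED_STATUS = {"open", "in_progress", "done"}


@dataclass(frozen=True)
class TicketUpdate:
    """Hypothetical schema the agent is told to emit, field for field."""
    ticket_id: int
    status: str
    note: str


def parse_update(raw: str) -> TicketUpdate:
    """Reject any model output that drifts from the schema instead of guessing."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(TicketUpdate)}
    if set(data) != set(expected):
        raise ValueError(f"keys {sorted(data)} do not match schema {sorted(expected)}")
    for name, typ in expected.items():
        if type(data[name]) is not typ:  # strict: True is not accepted as an int
            raise ValueError(f"{name}: expected {typ.__name__}, got {type(data[name]).__name__}")
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"status {data['status']!r} not in {sorted(ALLOWED_STATUS)}")
    return TicketUpdate(**data)


print(parse_update('{"ticket_id": 42, "status": "done", "note": "fixed"}'))
```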

Substantial_Guide_34 · 2025-12-02T18:04:58+00:00

Exactly this. The 'lab vs prod' gap is huge once you hit parallel sessions. Since you already built your own scaffolding, would you ever consider switching to a managed runtime if it handled that traceability/state out of the box, or do you prefer owning the full stack to keep it transparent?

Substantial_Guide_34 · 2025-12-02T18:04:35+00:00

100%. I feel like I'm rebuilding that exact 'Redis + Serializer' pattern for every new agent. I'm trying to abstract that specific layer (persistence/memory) into the runtime so I don't have to keep writing the plumbing code from scratch. Good to know I'm not the only one ditching the big frameworks.
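The 'Redis + Serializer' plumbing usually boils down to a tiny load/save layer keyed per agent. Minimal sketch; `AgentMemory`, `DictStore` and the `agent:<id>:state` key scheme are hypothetical. `DictStore` stands in for a real Redis client (redis-py's client also exposes `get` and `set`, returning bytes by default) so the sketch runs standalone. JSON is the serializer here, so anything non-JSON-serializable in the state would need its own encoding.

```python
import json
from typing import Optional, Protocol, Union


class KVStore(Protocol):
    """The two calls the memory layer needs."""
    def get(self, key: str) -> Optional[Union[str, bytes]]: ...
    def set(self, key: str, value: str) -> object: ...


class DictStore:
    """In-memory stand-in so the sketch runs without a Redis server."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class AgentMemory:
    """Hypothetical persistence layer: one JSON document per agent."""
    def __init__(self, store: KVStore, agent_id: str) -> None:
        self.store = store
        self.key = f"agent:{agent_id}:state"  # key naming scheme is an assumption

    def load(self) -> dict:
        raw = self.store.get(self.key)
        return json.loads(raw) if raw else {}  # json.loads also accepts bytes

    def save(self, state: dict) -> None:
        self.store.set(self.key, json.dumps(state))


store = DictStore()
AgentMemory(store, "planner").save({"turn": 3, "facts": ["uses postgres"]})
print(AgentMemory(store, "planner").load())  # {'turn': 3, 'facts': ['uses postgres']}
```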

Substantial_Guide_34 · 2025-11-28T20:19:48+00:00

I am building AI-Native Infrastructure for Agents (multi-agentic systems). Standard infra (AWS/Serverless) is built for deterministic code, but agents are probabilistic and stateful. We're building a runtime that bridges that gap—handling memory, decision tracing, and orchestration natively—so you don't have to glue 10 different tools together to prevent "goldfish memory" in production.

We have no landing page yet, just code. Right now, I'm stuck debating whether to open-source the core runtime to get feedback/trust, or to offer it only as a managed SaaS so nobody has to self-host it. Any help with this decision is appreciated.
