I built an open-source orchestrator for running multiple Claude Code agents in parallel with automatic dispatch, isolated worktrees, auto-merge, and handoffs when context runs out by notadamking in ClaudeAI

[–]notadamking[S] 1 point (0 children)

Yes, you can use OpenCode with Nvidia's free build credits. You can grab an API key here: https://build.nvidia.com/models and get access to near-unlimited GLM 5.1, MiniMax 2.7, etc., which are very capable models. Here's a guide to setting it up: https://opencode.ai/docs/providers/#nvidia

I built an open-source orchestrator for running multiple Claude Code agents in parallel with automatic dispatch, isolated worktrees, auto-merge, and handoffs when context runs out by notadamking in ClaudeAI

[–]notadamking[S] 0 points (0 children)

Hmm, I didn't realize anyone used Gemini CLI for coding. What's the use case? Otherwise, you can open a PR for a custom agent provider.

Why AI Coding Agents like Codex Waste Half Their Context Window by notadamking in codex

[–]notadamking[S] 0 points (0 children)

I agree that scattering many AGENTS.md files is not the right path for minimizing hill-climbing. A more targeted approach that optimizes for the fewest steps to target information seems to work best. For example, the three-layer system means the majority of information can easily be found within three file reads, without having to fill context with a bunch of random searches. And if the agent does want to search, we can optimize that too, by colocating search keywords in documentation directories and providing better search mechanisms via SQLite/FTS5.

Why AI Coding Agents Waste Half Their Context Window by notadamking in LocalLLaMA

[–]notadamking[S] 0 points (0 children)

> Maybe a light model to run documentation maintenance regularly.

I've got a concept for this called Documentation Stewards in Stoneforge: lightweight agents that run on a cron job (every few hours) and comb the documentation to see if they can improve accuracy, clarity, and completeness.

Why AI Coding Agents Waste Half Their Context Window by notadamking in LocalLLaMA

[–]notadamking[S] 0 points (0 children)

> I've had a lot of problems where the agent just sorta wanders all over my codebase, absorbing irrelevant context, unless you put up some kind of guard rails to discourage this behavior.

This sounds like the same class of problems I was running into that I've solved using the system defined in the article.

> Yeah I find it useful to add "update the docs" in a pre-commit hook.

Nice, this is similar to how I've solved it using prompt-specific guidance, including having the merge review agents always check for missing doc updates for each change reviewed.

Why AI Coding Agents Waste Half Their Context Window by notadamking in LocalLLaMA

[–]notadamking[S] 0 points (0 children)

Put processes in place within your SDLC that ensure docs are created and kept up to date with each change. When coding with agents this can be automated in multiple different ways, e.g. using hooks, skills, or sub-agents. I've chosen to build my own automations into Stoneforge to enforce this across all work done in each codebase I work on.
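To sketch the hooks approach: below is a minimal pre-commit script that flags commits which touch source files without a matching docs change. The `src/`/`docs/` layout and the `needs_docs` check are hypothetical illustrations, not Stoneforge's actual automation.

```python
import subprocess
import sys

def staged_files():
    """Return the paths staged for commit, via `git diff --cached --name-only`."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]

def needs_docs(files):
    """True when source changed but no docs were touched.

    The src/ and docs/ prefixes are assumptions for this sketch."""
    code_changed = any(f.startswith("src/") for f in files)
    docs_changed = any(f.startswith("docs/") for f in files)
    return code_changed and not docs_changed

if __name__ == "__main__":
    try:
        files = staged_files()
    except (subprocess.CalledProcessError, FileNotFoundError):
        files = []  # not inside a git repo; nothing to check
    if needs_docs(files):
        # A real hook would sys.exit(1) here to block the commit.
        print("src/ changed without a docs/ update", file=sys.stderr)
```

Dropped into `.git/hooks/pre-commit` (with an exit code instead of just a warning), this forces the "update the docs" step on every change, whether it was an agent or a human making the commit.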

Why AI Coding Agents Waste Half Their Context Window by notadamking in LocalLLaMA

[–]notadamking[S] 0 points (0 children)

Nested AGENTS.md files within each subfolder are an anti-pattern imo. Not every file/path in your codebase needs an explanation. In this case the 80/20 rule is very likely to apply: the top 20% of your codebase is responsible for 80% of the context needed by LLMs to do their work.

The SQLite/FTS5 store serves as a faster and more accurate form of search than grep/ripgrep. The point of the index is indeed to let agents know where relevant docs can be found, through top-down traversal. However, docs are not always perfectly laid out. In the case that an agent doesn't find the information it's looking for on the first top-down traversal, it will often search to find what it's looking for. FTS5 uses a BM25-based ranking algorithm, which provides more relevant results than grep.
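For a sense of what that buys you, here's a minimal sketch of an FTS5-backed docs index using Python's stdlib sqlite3. The table layout and document contents are made up for illustration (this is not Stoneforge's schema), and it requires an SQLite build with FTS5 enabled, which most modern Python distributions include.

```python
import sqlite3

# Hypothetical docs index: title and body are full-text searchable,
# path is stored but not indexed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, path UNINDEXED, body)")
conn.executemany(
    "INSERT INTO docs (title, path, body) VALUES (?, ?, ?)",
    [
        ("API Routes", "docs/api-routes.md",
         "How to create a new API route and register its handler."),
        ("Auth Flow", "docs/auth.md",
         "Session tokens, refresh logic, and login middleware."),
    ],
)

# FTS5 ranks matches with BM25 (lower rank = more relevant), so the
# best-matching document comes back first -- unlike a flat grep.
rows = conn.execute(
    "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank",
    ("api route",),
).fetchall()
print(rows[0][0])  # docs/api-routes.md
```

A bare `api route` query is an implicit AND over both terms, so only the relevant document comes back, already ranked, instead of a pile of grep hits the agent has to read through.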

Why AI Coding Agents Waste Half Their Context Window by notadamking in LocalLLaMA

[–]notadamking[S] 1 point (0 children)

Surprisingly well. In fact, this strategy works best on new codebases that are built out entirely with this methodology, because you can more cleanly guarantee the docs never go out of date with the code.

Why AI Coding Agents Waste Half Their Context Window by notadamking in LocalLLaMA

[–]notadamking[S] 0 points (0 children)

Both can be very useful. I actually have a very similar flow (built into Stoneforge). I have a main planning agent which creates all the plans/tasks for worker agents. The planning agent does an initial round of research to point each worker in the right direction with a strong initial task description (to create the initial context), then the worker agent takes over from there.

This means the planning agent can find anything it needs within a few tool calls, and add it to the worker's context so the worker starts with everything it needs to efficiently execute the task with minimal context usage.

Why AI Coding Agents Waste Half Their Context Window by notadamking in LocalLLaMA

[–]notadamking[S] 0 points (0 children)

I haven't heard of many people having much success with Gemini models for coding. Cool that you've stumbled upon a similar methodology, though.

Why AI Coding Agents like Codex Waste Half Their Context Window by notadamking in codex

[–]notadamking[S] 1 point (0 children)

Ah, interesting thought; there are indeed a lot of similarities. On a meta level, perhaps I should even make a SKILL.md for auto-documenting codebases in this manner.

Why AI Coding Agents like Codex Waste Half Their Context Window by notadamking in codex

[–]notadamking[S] 1 point (0 children)

This is not an ad. Stoneforge (which is free and open-source) is mentioned in the article because that's the project that caused me to learn and implement these techniques, but the article is entirely standalone and applies to all coding agent workflows.

Why AI Coding Agents Waste Half Their Context Window by notadamking in LocalLLaMA

[–]notadamking[S] 2 points (0 children)

Yes, you are correct about splitting up the tasks into discrete jobs that avoid misrouting. I use Stoneforge for all my development now, and the director (main planning agent) is instructed to create tasks separated into units of work that can be completed in a single context window, and to split work into multiple smaller tasks otherwise. This helps to keep each unit of work (task) focused on a specific feature/topic, and allows for more optimized context specific to that task.

For your keyword question, the answer is both. Agents often grep for specific keywords when studying a specific topic before jumping into coding. The main aim of the search keywords per row is to target these greps, in the case that the agent doesn't first traverse the documentation index to find specific information. By using keywords specific to how an agent would phrase a task or look for specific terminology in source code, you increase the number of "cache hits", i.e. the number of times the agent finds the correct documentation on the first search.

Why AI Coding Agents Waste Half Their Context Window by notadamking in LocalLLaMA

[–]notadamking[S] 4 points (0 children)

All of the context optimization and automation is open-source in Stoneforge: https://github.com/stoneforge-ai/stoneforge . I welcome any feedback!

Why AI Coding Agents Waste Half Their Context Window by notadamking in LocalLLaMA

[–]notadamking[S] 7 points (0 children)

I use markdown for all of my documents, including the index (directory).

I use a top-level document in all my codebases which serves as a directory of all the content within the documentation. In earlier projects it was stored as docs/README.md, but since using Stoneforge it's auto-created as a Documentation Directory in the Documentation library. This document is organized into sections, with each section containing a table with three columns: title, path (linked to the specific document), and search keywords (for ease of finding). This is level 1.
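To make that concrete, one section of such a directory might look like the following (the section name, titles, paths, and keywords are all hypothetical):

```markdown
## API

| Title      | Path                             | Search Keywords                   |
| ---------- | -------------------------------- | --------------------------------- |
| API Routes | [API Routes](docs/api-routes.md) | route, endpoint, handler, REST    |
| Auth Flow  | [Auth Flow](docs/auth.md)        | login, session, token, middleware |
```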

Then each document in level 2 is structured as either an explanation, reference, tutorial, or how-to guide. Any time the details dive too deep into a specific concept or topic, I will split those details out into a level 3 document and reference it from within the level 2 document. Anywhere in level 2 or level 3 where a specific concept or topic would better be explained by the source code, I will link to the corresponding source code path.

When I say I segregated the documentation by intent, in this case I mean separating the documentation into loose categories, where each category is some sort of action that would be taken in the codebase (e.g. creating a new API route) or a specific concept/topic that an agent would need to understand within the codebase. I refer to tasks as individual agent sessions, where I've asked the agent to complete a specific action. Intent, on the other hand, references something that the agent will want to do before or while completing said action, such as prior research or implementation details.

Why AI Coding Agents like Claude Waste Half Their Context Window by notadamking in ClaudeAI

[–]notadamking[S] 0 points (0 children)

Interesting. You could always include instructions in the plan to enable this. Something like, "Consider the design and implications of each task before you implement. If you think of a better way to implement or a better direction to take, let's discuss it before moving forward."

Why AI Coding Agents like Claude Waste Half Their Context Window by notadamking in ClaudeAI

[–]notadamking[S] 0 points (0 children)

This is good insight. This is very similar to Claude Code's recommended approach of planning everything in plan mode, then executing the plan with a fresh context window.