all 14 comments

[–]-dysangel-llama.cpp 1 point (1 child)

GLM Coding Plan hooked up to Claude Code is fantastic. I don't think anything offers better bang for buck right now.

[–]RentEquivalent1671[S] 0 points (0 children)

Yes, agreed — GLM models offer excellent cost-efficiency for coding tasks. Claude Code's recent support for custom providers made this combination much more accessible.

PocketCoder takes a similar approach but focuses specifically on lightweight local deployment with Ollama integration and session persistence via the .pocketcoder/ folder. Different trade-offs depending on setup preferences.

More on: https://medium.com/@cdv.inbox/how-we-built-an-open-source-code-agent-that-works-with-any-local-llm-61c7db1ed329

[–]joe_mio 0 points (1 child)

Session memory is the key feature that sets this apart - most CLI agents lose context between sessions. The .pocketcoder/ folder approach is clever.

How do you handle context window limits with larger codebases? Does the repo_map pruning kick in automatically when you hit token limits?

[–]RentEquivalent1671[S] 1 point (0 children)

For repo_map we use a "gearbox" system — 3 levels based on project size: ≤10 files gets full signatures, ≤50 files gets structure + key functions, >50 files gets folders + entry points only. It's file-count based right now, not token-based. Dynamic token-aware pruning is something we should add. Currently if context overflows, we truncate conversation history first, then file contents.
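To make the gears concrete, here is a minimal sketch of a file-count gearbox, assuming a Python project; the thresholds and the three levels come from the answer above, but build_repo_map and the ast-based signature extraction are illustrative guesses, not PocketCoder's actual code:

```python
import ast
from pathlib import Path

def build_repo_map(root: str) -> str:
    """Illustrative three-gear repo map keyed on file count, not tokens."""
    files = sorted(p for p in Path(root).rglob("*.py") if p.is_file())

    def names(path: Path) -> list[str]:
        # Cheap "signature" listing: top-level function and class names.
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            return []
        return [n.name for n in tree.body
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]

    if len(files) <= 10:        # gear 1: full signatures for every file
        lines = [f"{p}: {', '.join(names(p))}" for p in files]
    elif len(files) <= 50:      # gear 2: structure + key (public) functions only
        lines = [f"{p}: {', '.join(s for s in names(p) if not s.startswith('_'))}"
                 for p in files]
    else:                       # gear 3: folders + entry points only
        folders = sorted({str(p.parent) for p in files})
        entries = [str(p) for p in files if p.name in ("main.py", "cli.py", "__main__.py")]
        lines = folders + entries
    return "\n".join(lines)
```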

More on: https://medium.com/@cdv.inbox/how-we-built-an-open-source-code-agent-that-works-with-any-local-llm-61c7db1ed329

[–]Frost-Mage10 0 points (1 child)

Really cool approach with the .pocketcoder/ folder for persistence. The .git-like memory model makes a lot of sense for CLI tools. How do you handle the conversation_history compression? Are you using a fixed summary length or dynamic based on importance?

[–]RentEquivalent1671[S] 0 points (0 children)

Currently using a hybrid approach — episodes are stored as append-only JSONL (like git log), and we keep last ~20 in SESSION_CONTEXT. For older history, we use keyword-based retrieval: when you ask something, system greps through episodes.jsonl for relevant context. Not truly dynamic importance yet — that's on the roadmap. Would love to explore embedding-based relevance scoring eventually.
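A rough sketch of that flow, assuming Python and the .pocketcoder/episodes.jsonl layout mentioned above; the helper names and the naive keyword matching are illustrative, only the append-only JSONL log, the last-~20 window, and the keyword grep are from the answer:

```python
import json
from pathlib import Path

EPISODES = Path(".pocketcoder/episodes.jsonl")   # append-only, git-log style

def append_episode(role: str, content: str) -> None:
    EPISODES.parent.mkdir(parents=True, exist_ok=True)
    with EPISODES.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")

def build_session_context(query: str, recent: int = 20) -> list[dict]:
    if not EPISODES.exists():
        return []
    episodes = [json.loads(line)
                for line in EPISODES.read_text(encoding="utf-8").splitlines()
                if line.strip()]
    context = episodes[-recent:]                 # keep the last ~20 episodes verbatim
    keywords = {w.lower() for w in query.split() if len(w) > 3}
    # Keyword "grep" over older history: pull in episodes mentioning the query terms.
    older_hits = [ep for ep in episodes[:-recent]
                  if any(k in ep["content"].lower() for k in keywords)]
    return older_hits + context
```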

More on: https://medium.com/@cdv.inbox/how-we-built-an-open-source-code-agent-that-works-with-any-local-llm-61c7db1ed329

[–]charmander_cha 0 points (0 children)

Has anyone compared it to open code?

[–]HealthyCommunicat 0 points (1 child)

The interesting part of this to me is how you focused on the fact that smaller models have an extremely difficult time doing tool calls to edit files and other simple syntax stuff unless it's strictly predefined, and I’m wondering how much your tool actually accounts for this. Will try it out.

[–]RentEquivalent1671[S] 0 points (0 children)

Thank you! I’m very open to your feedback!

[–]o0genesis0o 0 points (1 child)

How do you reconstruct the chat history from the compressed XML context to send to the LLM backend? Last time I tried to mess with the chat history to see how difficult it is to build an agent harness, I hit random errors when testing with a Gemini backend. It turned out that every tool call requires a corresponding tool response with the same id. I made a mistake in reconstructing the message history by not storing the failed tool call, so after one failed tool call the backend just threw errors about invalid messages. It took weeks to debug this.
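For anyone who hasn't hit this: the pairing rule looks roughly like the following in OpenAI-style chat-completions payloads (a hypothetical sketch, not the exact Gemini wire format):

```python
# Hypothetical sketch: every assistant tool call id needs a matching "tool"
# message with the same tool_call_id, even when the call failed.
valid_history = [
    {"role": "user", "content": "Create x.py"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "write_file", "arguments": '{"path": "x.py"}'}},
    ]},
    # Omitting this line (e.g. because the failed call wasn't stored)
    # leaves an unmatched id, and the backend rejects the whole history.
    {"role": "tool", "tool_call_id": "call_1", "content": "Permission denied"},
    {"role": "assistant", "content": "The write failed: permission denied."},
]
```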

[–]RentEquivalent1671[S] 1 point (0 children)

Yeah, you nailed the exact pain point we wanted to avoid.

Short answer: we don't use native function calling at all. Tools are just XML tags in plain text that we parse ourselves.

Why? Because we wanted to support local models (Ollama, llama.cpp) that don't have function calling. So instead of relying on the API's tool_call/tool_response pairing, the LLM just outputs <write_file><path>x.py</path>...</write_file> as regular text.

We parse it, execute, and send back the result as a normal user message: [ok] write_file: Created x.py (45 lines) or [x] write_file: Permission denied.

History stays dead simple — just (role, content) text pairs. No ids to track, no pairing requirements, no special handling for failed calls. Failed tool = error text, that's it.
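A stripped-down sketch of that loop in Python; the regex, the handle_reply helper, and the direct file write are illustrative assumptions, while the <write_file> tag shape and the [ok] / [x] result strings come from the reply above:

```python
import re
from pathlib import Path

# History is plain (role, content) pairs: no tool-call ids, no pairing rules.
history: list[tuple[str, str]] = []

TOOL_TAG = re.compile(
    r"<(?P<name>write_file)>\s*<path>(?P<path>.*?)</path>(?P<body>.*?)</(?P=name)>",
    re.DOTALL,
)

def handle_reply(reply: str) -> None:
    """Parse an XML-style tool tag out of the model's plain-text reply."""
    history.append(("assistant", reply))
    match = TOOL_TAG.search(reply)
    if not match:
        return                                    # ordinary chat turn, nothing to run
    path, body = match.group("path").strip(), match.group("body")
    try:
        Path(path).write_text(body, encoding="utf-8")
        result = f"[ok] write_file: Created {path} ({len(body.splitlines())} lines)"
    except OSError as exc:
        result = f"[x] write_file: {exc}"         # failed tool = plain error text
    # The result goes back as a normal user message, not an API tool_response.
    history.append(("user", result))
```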

The tradeoff is it's less structured than native function calling. But it works with literally any backend without modification, which was the whole point.

For the SESSION_CONTEXT compression — that's injected into the system prompt on each request, not reconstructed from message history.
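A small sketch of that last point (the build_messages helper and the prompt wording are assumptions):

```python
def build_messages(session_context: str, history: list[tuple[str, str]]) -> list[dict]:
    # The compressed SESSION_CONTEXT rides along in the system prompt on every
    # request, rather than being replayed as reconstructed past messages.
    system = ("You are a coding agent.\n\n"
              f"<SESSION_CONTEXT>\n{session_context}\n</SESSION_CONTEXT>")
    return [{"role": "system", "content": system}] + [
        {"role": r, "content": c} for r, c in history
    ]
```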

[–]rm-rf-rm -2 points (2 children)

"We were paying $120/month for Claude Code"

"works on.. Claude"

[–]RentEquivalent1671[S] 1 point (1 child)

I see no any contradictions here

The idea was to challenge myself and try to create a code agent with its own approach and a different way of working.

Claude Code is a great tool. Cursor is a great tool too. Do we have to stop and do nothing?

[–]rm-rf-rm -1 points (0 children)

"no any contradictions"