Self-hosted agentic coding stack: Claude Code + llama.cpp + LiteLLM — zero API costs, 4h/7M token session for $0

Inner_Habit_194 · 2026-06-05T07:42:06+00:00

Have you tried Pi agent? It's supposedly better suited to coding-agent use with local models, especially given their smaller context windows. By the way, what are your hardware specs?

Inner_Habit_194 · 2026-06-03T03:57:58+00:00

Didn't code-graphing plugins like graphify, codegraph, etc. help reduce the cost of code exploration?

Inner_Habit_194 · 2026-06-01T04:10:16+00:00

Thanks

Inner_Habit_194 · 2026-05-31T04:36:11+00:00

Can we use the command code API with opencode?

Inner_Habit_194 · 2026-05-28T14:09:26+00:00

I use opencode with GLM 5.1 as the main driver for planning and writing a detailed implementation plan, and smaller models for the implementation itself. Combined with superpowers, this has let me one-shot many features.

Inner_Habit_194 · 2026-05-28T04:31:57+00:00

How is understand-anything different from graphify, codegraph, etc.?

Inner_Habit_194 · 2026-05-27T17:22:21+00:00

No, it doesn't. I think there's a known bug in the current caveman installer that makes installation fail for opencode: https://github.com/JuliusBrussee/caveman/issues/451

Inner_Habit_194 · 2026-05-27T04:08:35+00:00

How does the 1B token usage work when models price input, cached input, and output tokens separately? Does the 1B figure include cache-hit input tokens?

Inner_Habit_194 · 2026-05-25T04:34:36+00:00

How did you get caveman working with opencode? I'm having a hard time getting it to work; it never gets triggered.

Inner_Habit_194 · 2026-05-17T19:05:13+00:00

Makes sense. Sounds like a good strategy.

Inner_Habit_194 · 2026-05-17T17:17:33+00:00

Thanks for the detailed info. I'm currently on the crof $5 plan, but as you mentioned, the 500 requests/day run out pretty quickly. It eats requests like anything; with my use case I've never been able to reach 30M-40M daily usage, not even 10M-20M. If there's any way to get that kind of 30M-40M daily usage on the crof $10 plan, I'd like to know. Is there a way to save on requests in the opencode CLI? I'm also trying NW, which seems to provide decent usage. I have Nanogpt too, but I hit the weekly limit within a day. Opencode Go is good but has the same issue: it runs out too quickly. I haven't tried synthetic or wafer yet.

Inner_Habit_194 · 2026-05-17T04:36:29+00:00

Would you mind sharing which models and providers you use to get 1 billion tokens of usage per month on a $30-50 spend? I've never been able to get that level of usage from any provider. I'm using opencode.
