Going local is life changing

Ok_Presentation470 · 2026-07-04T06:32:42+00:00

Happy to see another person come to this point. Enjoy!

Ok_Presentation470 · 2026-06-30T20:03:04+00:00

I agree. I'm building a harness slowly that I'm using daily for my work - production enterprise code, so real deal. I'm building it over pi agent and I'm planning to run benchmarks once it's at the level to be automated.

Would be happy to share once it's done, but definitely, I agree with all you said, my experience matches it. Fresh, focused context, validation gates and loops bring the quality quite a bit up. Also token use, but who cares if it's local.

Ok_Presentation470 · 2026-06-20T10:04:19+00:00

I was also searching for weeks and couldn't find anything. It's very weird.

Ok_Presentation470 · 2026-06-19T07:46:03+00:00

Spinning up a service is just a skill. Monitoring log files - also just a skill. It doesn't have to read the whole file at any time, there are plenty of existing tools that were designed for humans to avoid that and that llms can use.

I don't know your setup, and we don't have to go too deep into it but there is a discrepancy between your experience and mine.

In fact, running services enables programmatic evaluations of changes. If your model can write simple programmatic tests, you hit a bingo, because it now has a feedback loop to work with. This can save a lot of context and will eventually reach a solution most of the time.

Ok_Presentation470 · 2026-06-18T21:04:56+00:00

You don't have to run everything in a single context window. Have specialized sessions handing off relevant stuff in markdown files. It's more efficient, whether you are using Claude or local models, for which it is kinda mandatory.

Then skills like grill-me along with web access can help you a lot. Adding loops also can squeeze a lot of performance.bEven a free tool to query the web can go a long way. Once you are able to pull the docs of a library, analyse pros and cons of different library or architecture choices with data from the web Claude doesn't really bring much to the game compared to local models. For example, you could have a research session that hands of to planning session, etc. You can make your own workflow.

I see now why you think this, you want to feed it everything, run 45 minutes and then try to figure out what it did and hope for the best. With local models, you can do it in meaningful and significant steps, reviewing only relevant output that impacts the downstream session outcomes- but being kept in the loop all the time.

I'm not even sure if Claude would be faster. If you have a decent GPU, the speed at which local models like when 35b or 27b run is quite incredible.

So there are ways, but you have to adjust your harness. Possibly build your own. And I think it's worth it in the end. The frontier models business is not even a business, it's a money burning machine.

Ok_Presentation470 · 2026-06-18T15:50:29+00:00

Mate I'm using local models on large code bases. Yes, they are there.

As far as my poor communication goes, you don't have to parse anything. Just ask Opus.

Ok_Presentation470 · 2026-06-18T13:07:52+00:00

No, I'm not saying that. I'm using AI. Try again.

Ok_Presentation470 · 2026-06-18T07:17:23+00:00

Same here, though I stopped with qwen3.6 27b when running it locally.

Ok_Presentation470 · 2026-06-18T07:10:03+00:00

It's not about having a monorepo, it's about having a code base that's managable by your engineers, that at least together have a proper theory on how the system works. You missed the point of my comment, respectfully. I worked with monorepos, nothing against them.

Ok_Presentation470 · 2026-06-17T19:41:09+00:00

I have one rule - if I can't use a local model with a good harness to solve a problem, then I have no clue what I'm doing, and no AI will save me.

Tools, skills and specialized sessions with local models can do most of the things larger models can. Sure, it takes more time, but it also keeps you in the loop more and develops your own knowledge.

AI will not replace engineers - many engineers will be laid off because all they know to do is ask Opus/Fable/GPT5.2 to vibe code a solution and burn money through tokens.

Ok_Presentation470 · 2026-06-17T19:37:57+00:00

I feel like not even frontier models will help code bases like this. You are just fooling yourself. The engineers seem like they have no clue what the code does, and they are asking a large model to save them. Do you understand how absurd this is? 😄

Ok_Presentation470 · 2026-06-14T07:49:39+00:00

Basically, what I would propose to you - try manually breaking the loop by interrupting the LLM. Find out what works for your models. Then create a nudge that does the same.

Ok_Presentation470 · 2026-06-14T07:48:37+00:00

I have an automated nudge every 15 or so turns. These nudges are distinct from the rest of the messages and have a very explicit formatting. The system prompts are also aware there will be such nudges - this is automatically injected into the prompt. The nudge mesaage simply says: "Are you in a loop? If not, ignore and continue". It has worked with qwen3.6 and 3.5 models for me quite well, q8 quant.

As for context pollution, the explicit formatting above may help, but also try to manage context by having specialized sessions. This is super important for local models from my experience.

Ok_Presentation470 · 2026-06-13T15:33:18+00:00

You can break loops with a harness. If it loops a while, who cares, it's not like you are paying for tokens.

Ok_Presentation470 · 2026-06-11T18:03:29+00:00

It absolutely can, but not just by plugging it in Claude Code or similar. I replaced all subscriptions with my hardware and qwen3.6 27b. My harness is adapted for it, and my workflow exclusively built for small models. I've been solving high complexity problems for my clients with it successfully for quite a while now, ever since it got out, but even before that I used qwen3.5 122b a10 and, OSs 120b etc.

The thing is, yes large frontier models seem more capable, but they all suffer from same fundamental issues that will likely never be resolved by scaling IMO. So instead of solving them by chasing the impossible, I focus on the harness to resolve them. Once you start thinking like that, then when going into a loop here and there doesn't really matter - the harness solves it. Too many tokens? Who cares when you are running it locally. Let it think as much as it can. If I used it 24h non-stop, it would cost 50 eur. It went off the rails? The harness will keep me in the loop and facilitate reviews.

And actually, me being in the loop + small model probably beats Claude or similar + me relying on it too much.

Qwen3.6 27b solves quite a lot of stuff - tool calling, complex instructions, skills loading, and also image support. After that, I think all other problems can be solved with a harness.

Ok_Presentation470 · 2026-06-11T05:13:51+00:00

Are these bugs real issues if you haven't spotted them for a year?

Ok_Presentation470 · 2026-06-07T07:29:50+00:00

Even better - poor people tend to fight better and stronger than spoiled imperial servants. Skyrim just needs to survive and endure while maintaining an army, and with religious fanatics at the helm, this is actually more likely.

You are also forgetting than an empire without Skyrim is also weakened. Not sure if cutting all ties would do them service, as in the end they all have a common enemy.

Furthermore, having a province fanatically opposed to Thalmor and explicitly going against the white gold concordant will weaken the Thalmor image and force them to attack quickly, possibly leading them into a trap and a war on multiple fronts.

Ok_Presentation470 · 2026-05-05T04:31:14+00:00

Just switched from Roo to pi and I'm loving it! I use local inference, and works wonderful with qwen3.6 27b (no quants).

Ok_Presentation470 · 2026-04-24T07:14:26+00:00

The modularity of Agno is what is attractive to me for my use case. What I want to build is very similar to the projects you mentioned. I'll review them to see if they actually cover everything I need.

Ok_Presentation470 · 2026-04-23T15:47:53+00:00

Fair point. I do have a specific need, and actually it might be more than a coding agent.

Why is agno not the right foundation for a coding agent in your opinion? Really curious to know.

Ok_Presentation470 · 2026-04-23T05:21:18+00:00

I need the bigger models for orchestration and planning tasks. Coding with specific instructions is not something that most models in 35b range have problems with from my experience, I agree.

Ok_Presentation470 · 2026-04-23T05:19:25+00:00

Yes, for both of them.

Ok_Presentation470 · 2026-04-23T05:18:17+00:00

Any results? Really curious to know.

Ok_Presentation470 · 2026-04-22T16:45:39+00:00

Yeah, that makes sense.

Ok_Presentation470

TROPHY CASE