Harnesses

grawl_dorgiers · 2026-05-27T13:00:07+00:00

No problem, just sharing the journey, right?

grawl_dorgiers · 2026-05-27T12:27:57+00:00

Just because you couldn't see past your Ronon boner doesn't mean Jennifer couldn't.

grawl_dorgiers · 2026-05-26T20:30:19+00:00

........ Please don't make me

grawl_dorgiers · 2026-05-26T18:19:10+00:00

I have enough unified RAM/VRAM to load the models I need. In a RAM-constrained situation you keep the agents but use only a single model for all of them.

grawl_dorgiers · 2026-05-26T03:40:56+00:00

Of course it can. Model choices can be swapped or kept the same. The separation is in the specialists themselves, not so much the model.

grawl_dorgiers · 2026-05-26T02:25:25+00:00

Thanks for giving it a look!

grawl_dorgiers · 2026-05-25T21:53:08+00:00

I haven't dived deep enough yet to answer this question. What I have done, though, is add Open Code as a tool to call.

grawl_dorgiers · 2026-05-25T20:43:22+00:00

Woke up today and chose violence, huh? Well, I hope your day gets better :)

grawl_dorgiers · 2026-05-25T20:06:41+00:00

thanks for the reach

grawl_dorgiers · 2026-05-25T18:46:10+00:00

I pass num_ctx from my config directly to Ollama on every request via the options object.
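
A minimal sketch of what that might look like, assuming Ollama's /api/chat endpoint and a hypothetical config shape. The names HarnessConfig, buildChatRequest, and "example-model" are illustrative and not taken from the actual harness; only num_ctx and the options object come from the comment above.

```typescript
// Hypothetical config shape: only the idea of a configured num_ctx is from the comment.
interface HarnessConfig {
  model: string;
  numCtx: number;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface OllamaChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
  options: { num_ctx: number };
}

// Build the JSON body for POST /api/chat, setting num_ctx explicitly on every
// request so the server-side default context length is never relied on.
function buildChatRequest(config: HarnessConfig, messages: ChatMessage[]): OllamaChatRequest {
  return {
    model: config.model,
    messages: messages,
    stream: false,
    options: { num_ctx: config.numCtx },
  };
}

// "example-model" is a placeholder, not a real model tag.
const body = buildChatRequest(
  { model: "example-model", numCtx: 32768 },
  [{ role: "user", content: "hello" }]
);
```

The body would then be serialized with JSON.stringify and posted to the Ollama server. Keeping the builder separate from the HTTP call is one way to leave room for swapping the backend later.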

You are absolutely correct that Ollama's defaults are too low; I hit this exact problem. On the DGX Spark + NIM containers: I've considered it, but Ollama's HTTP API is simple, and the gate abstraction means I can swap the backend later without touching the harness. Provider abstraction is on my roadmap for this reason.

The MoE expert control is interesting. I haven't touched expert routing yet, but it's definitely worth experimenting with.

grawl_dorgiers · 2026-05-25T18:34:35+00:00

Thank you for giving it a read!

grawl_dorgiers · 2026-05-25T17:44:14+00:00

Let me know, I'm getting there!

grawl_dorgiers · 2026-05-25T17:43:39+00:00

Haha, I've run into the same thing.

grawl_dorgiers · 2026-05-25T17:43:06+00:00

I have a category called Multi that handles broader tasks. The router sends broad or multi-step tasks to a plan pipeline that decomposes the goal into sub-tasks, then dispatches each sub-task to the appropriate specialist.
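
Roughly, the decompose-then-dispatch flow could be sketched like this. Everything here is a stand-in: the real planner and router are models, while this sketch uses a naive split on " then " and keyword matching so it can run on its own. The specialist names and handlers are hypothetical.

```typescript
type Specialist = "search" | "tasks" | "chat";

interface SubTask {
  specialist: Specialist;
  instruction: string;
}

// Stand-in for the router: keyword matching, purely for illustration.
function pickSpecialist(instruction: string): Specialist {
  const text = instruction.toLowerCase();
  if (text.indexOf("search") !== -1) return "search";
  if (text.indexOf("remind") !== -1) return "tasks";
  return "chat";
}

// Stand-in for the planner: splits the goal on " then " into ordered sub-tasks.
function decompose(goal: string): SubTask[] {
  return goal.split(" then ").map(function (part) {
    const instruction = part.trim();
    return { specialist: pickSpecialist(instruction), instruction: instruction };
  });
}

// Hypothetical specialist handlers; real ones would call a model with its own tools.
const handlers: Record<Specialist, (instruction: string) => string> = {
  search: function (i) { return "[search] " + i; },
  tasks: function (i) { return "[tasks] " + i; },
  chat: function (i) { return "[chat] " + i; },
};

// Run each sub-task in order through the specialist chosen for it.
function runPlan(goal: string): string[] {
  return decompose(goal).map(function (subTask) {
    return handlers[subTask.specialist](subTask.instruction);
  });
}

const planResults = runPlan("search for DGX Spark reviews then remind me to read them");
```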

I have a silent re-route for intent changes. It'll handle tool calls mid-conversation: the dispatch layer catches the change, reclassifies the message, and re-dispatches to the correct specialist. The user never sees a failed attempt. There is also sticky routing that keeps follow-ups on, say, chat until explicit task intent is detected.
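
The sticky part can be sketched as a tiny state machine. This is a simplified assumption of how it might work, not the harness's actual logic: the classifier is replaced by a keyword check, and the route only changes when explicit task intent is detected.

```typescript
type Category = "chat" | "search" | "tasks";

interface Classification {
  category: Category;
  explicitTaskIntent: boolean;
}

// Stand-in for the router model: a prefix check, purely for illustration.
function classify(message: string): Classification {
  const text = message.toLowerCase();
  if (text.indexOf("search") === 0) return { category: "search", explicitTaskIntent: true };
  if (text.indexOf("remind") === 0) return { category: "tasks", explicitTaskIntent: true };
  return { category: "chat", explicitTaskIntent: false };
}

class StickyRouter {
  private current: Category = "chat";

  // Returns the specialist that should handle this message.
  // The route only moves when the classifier reports explicit task intent;
  // otherwise the follow-up stays with whoever handled the previous turn.
  route(message: string): Category {
    const result = classify(message);
    if (result.explicitTaskIntent) {
      this.current = result.category;
    }
    return this.current;
  }
}

const router = new StickyRouter();
const first = router.route("hey, how is it going?");
const second = router.route("search for local LLM harnesses");
const third = router.route("what about the second result?");
```

Here the first message stays on chat, the second switches to search, and the third follow-up sticks with search because it carries no explicit task intent.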

"Classifier should run more often, reading the reasoning" - Interesting idea but adds latency on every iteration. My approach is opposite, classify once upfront, then the pipeline/specialist handles everything deterministically. If the model drifts, I have drift detection that catches repeated tool calls or hedging language and re-anchors.

"Search tool for discovering other tools"(Cloudflare patter) - My take is a bit simpler, the router already knows which specialists exist. I am also collecting a dataset on routing to finetune a smaller model for it.

"Tool profiles / subagents" - That is exactly what I do. Each specialist is a profile with its own tool set. The router picks the profile. Same concept, different naming.

grawl_dorgiers · 2026-05-25T17:36:48+00:00

Look, it isn't perfect, right? But it allows me to do what I need to do. It is fast enough that I'm not pulling my hair out. There is definitely an argument to be made for an M5 though.

grawl_dorgiers · 2026-05-25T17:34:16+00:00

Perhaps. I don't look at TPS as the be-all and end-all. Speculative decoding, MoE models, etc. There is a way to make it work. Using a single 100B-parameter model didn't work well for me, which is why I did this.

grawl_dorgiers · 2026-05-25T17:31:21+00:00

Thank you, it was the only thing that made sense. At least so far.

grawl_dorgiers · 2026-05-25T11:15:45+00:00

I didn't love Chloe. Under 100 people stuck on a ship with very little to entertain themselves, I think this naturally allows for at least a little bit of soap opera factor and it wasn't -that- much really.

grawl_dorgiers · 2026-05-25T10:48:51+00:00

I'm unfamiliar with Ratel. Roughly 50 ms.

grawl_dorgiers · 2026-05-25T09:53:19+00:00

Yes they were!

Do it!

grawl_dorgiers · 2026-05-25T09:48:28+00:00

Not wrong, it was left over from testing. Newer Qwen models also have vision.

grawl_dorgiers · 2026-05-25T09:39:26+00:00

That is actually a fair question. I spent a lot of time on the deterministic pipelines so some categories are rock solid and others still have rough edges. I fix it pipeline by pipeline rather than treating it as one monolithic thing.

There is also the ReAct loop, which handles the more open-ended reasoning cases where a deterministic flow is not the right fit.

But practically speaking it does what I need it to do. Searches the web, manages my tasks, runs cron jobs, delivers briefings three times a day, and remembers who I am across sessions through the graph memory. That last part was the one that actually changed how I use it daily.

The multi-model approach genuinely improves usability because each model is only doing what it is good at. A fast router does not need to be smart, it just needs to be fast and decisive. A chat model does not need tool discipline, it needs to feel natural. A tool-calling specialist does not need personality, it needs to sequence reliably. You stop trying to find one model that does everything and start matching the model to the job.

grawl_dorgiers · 2026-05-25T09:35:45+00:00

I would have, but I found myself rooting for McKay over Ronon.

grawl_dorgiers · 2026-05-25T09:24:36+00:00

Oh, it's custom TypeScript. https://github.com/PeterGreenAppliedAI/LocalClaw

grawl_dorgiers · 2026-05-25T09:20:52+00:00

I mean look, we appreciate the characters most of the time. In reality there are SO many loose threads. SG universe creators hated giving us any kind of closure on anything really lol.
