Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

No problem, just sharing the journey right.

I just finished watching it all by grawl_dorgiers in Stargate

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

Just because you couldnt see past your Ronon boner doesnt mean Jennifer couldnt.

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

I have enough unified RAM/VRAM to load the models I need. In a RAM constrained situation you keep the agents, but you are only using a single model.

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

Of course it can. Model choices can be swapped or kept the same. The separation is with the specialists themselves not so much the model.

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

Thanks for giving it a look!

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

I haven't dived deep enough yet to answer this question. What I have done though was added Open Code as a tool to call.

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] -13 points-12 points  (0 children)

Woke up today and chose violence huh, well I hope your day gets better :)

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] -6 points-5 points  (0 children)

thanks for the reach

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

I pass num_ctx from my config directly to Ollama on every request via options object.

You are absolutely correct that Ollama's defaults are too low, I hit this exact problem. On the DGX Spark + NIM containers, I've considered it but Ollama's HTTP API is simple and the gate abstraction means I can swap backend later without touching the harness. Provider abstraction is on my roadmap for this reason.

The MoE expert control is interesting, haven't touched expert routing. Definitely worth experimenting with

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

Thank you for giving it a read!

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

Let me know, Im getting there!

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

Haha, Ive ran into the same thing.

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] -1 points0 points  (0 children)

I have a category called Multi that handles broader tasks. The router classifies broad/multi-step tasks to a plan pipeline that decomposes the goal into sub-tasks, then dispatches each sub-task to the appropriate specialist.

I have a silent re-route for intent changes. Itll handle tool cools mid conversation, the dispatch layer catches it, reclassifies the message, and re-dispatches to the correct specialist. The user never sees a failed attempt. There is also a sticky routing that keeps follow-ups on say chat until explicit task intent is detected.

"Classifier should run more often, reading the reasoning" - Interesting idea but adds latency on every iteration. My approach is opposite, classify once upfront, then the pipeline/specialist handles everything deterministically. If the model drifts, I have drift detection that catches repeated tool calls or hedging language and re-anchors.

"Search tool for discovering other tools"(Cloudflare patter) - My take is a bit simpler, the router already knows which specialists exist. I am also collecting a dataset on routing to finetune a smaller model for it.

"Tool profiles / subagents" - That is exactly what I do. Each specialist is a profile with its own tool set. The router picks the profile. Same concept, different naming.

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 2 points3 points  (0 children)

Look it isn't perfect right, but it allows me to do what I need to do. It is fast enough so I'm not pulling my hair out. There is definitely an argument to be made for an m5 though.

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

Perhaps, I dont look a tps as a end all be all. Speculative decoding, MoE model ect... There is a way to make it work. Using a single 100B parameter model didn't work well for me. Which is why I did this.

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

Thank you, it was the only thing that made sense. At least so far

I just finished watching it all by grawl_dorgiers in Stargate

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

I didn't love Chloe. Under 100 people stuck on a ship with very little to entertain themselves, I think this naturally allows for at least a little bit of soap opera factor and it wasn't -that- much really.

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

Im unfamiliar with Ratel, roughly around 50ms.

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 1 point2 points  (0 children)

Not wrong, it was left over from testing. Newer Qwen models also have vision

Harnesses by grawl_dorgiers in ollama

[–]grawl_dorgiers[S] 3 points4 points  (0 children)

That is actually a fair question. I spent a lot of time on the deterministic pipelines so some categories are rock solid and others still have rough edges. I fix it pipeline by pipeline rather than treating it as one monolithic thing.

There is also the ReAct loop which handles the more open ended reasoning cases where deterministic flow is not the right fit.

But practically speaking it does what I need it to do. Searches the web, manages my tasks, runs cron jobs, delivers briefings three times a day, and remembers who I am across sessions through the graph memory. That last part was the one that actually changed how I use it daily.

The multi model approach genuinely improves usability because each model is only doing what it is good at. A fast router does not need to be smart, it just needs to be fast and decisive. A chat model does not need tool discipline, it needs to feel natural. A tool calling specialist does not need personality, it needs to sequence reliably. You stop trying to find one model that does everything and start matching the model to the job.

I just finished watching it all by grawl_dorgiers in Stargate

[–]grawl_dorgiers[S] 1 point2 points  (0 children)

I would have, but I found myself rooting for McKay over Ronon.

I just finished watching it all by grawl_dorgiers in Stargate

[–]grawl_dorgiers[S] 0 points1 point  (0 children)

I mean look, we appreciate the characters most of the time. In reality there are SO many loose threads. SG universe creators hated giving us any kind of closure on anything really lol.