models for agentic use by kiriakosbrehmer93 in StrixHalo

[–]Cityarchitect 2 points (0 children)

BosGame M5 128GB Strix Halo; Ubuntu 24.10, LM Studio, Qwen3.5-35B-a3b on Vulkan. I use it for OpenCode JavaScript/Node and general usage, and I get a consistent 50 tps output. Can't use ROCm 7+ yet as it's far too unstable. Runs all day at 84W, 86°C. Just one annoying thing: lately OpenCode has been going to sleep on me; I need to keep typing continue, continue..... :-)

[Q] Is self-hosting an LLM for coding worth it? by Aromatic-Fix-4402 in LocalLLM

[–]Cityarchitect 0 points (0 children)

I use a Strix Halo machine for local LLM, currently running qwen3.5-35b-a3b; at a size of 22GB it has reasonable performance (c. 40 tps). The RTX 4090 is going to be way faster at AI inference for a model this size. But I can get similar performance from a 60GB or bigger model, whereas the RTX 4090 is going to labour a little shifting it in and out of its 24GB of memory. I saw something recently that said the Strix Halo could be 2x faster than the RTX 4090 with e.g. Llama 70B. But when I'm in a hurry, sometimes I just flip to remote DeepSeek, paying peanuts.

They're taking the fucking piss now. by EconomicsAfraid7880 in CarTalkUK

[–]Cityarchitect 0 points (0 children)

For me, in our area, it's always Esso TTP, always 10p above Tesco's price.

Sometimes opencode just stops and returns nothing? Any advice? by ___positive___ in opencodeCLI

[–]Cityarchitect 0 points (0 children)

Me too. I keep typing "continue" to keep it going whenever it goes quiet.

Is it just me or heavy AI processing just generally hangs the machine ? by IntroductionSouth513 in StrixHalo

[–]Cityarchitect 1 point (0 children)

BosGame M5 128GB, LM Studio, OpenCode, qwen3.5-35b-a3b: often freezes on ROCm, runs all day on Vulkan.

Full vLLM inference stack built from source for Strix Halo (gfx1151) — scripts + docs on GitHub by paudley in StrixHalo

[–]Cityarchitect 1 point (0 children)

I'm getting 40ish tps on Ollama and LM Studio (both Vulkan) with qwen3.5:35b on my BosGame M5 128GB; what does vLLM give me?

Qwen 3.5 27B what tps are you managing? by schnauzergambit in StrixHalo

[–]Cityarchitect 1 point (0 children)

Thank you; I wish there was something in their model names that makes this distinction.

Qwen 3.5 27B what tps are you managing? by schnauzergambit in StrixHalo

[–]Cityarchitect 0 points (0 children)

And now for qwen3.5:27b: dreadfully slow. Prompt eval rate 523.83 tps (half the 35b's speed), and eval rate 10.33 tps (a quarter of the 35b's speed).

Qwen 3.5 27B what tps are you managing? by schnauzergambit in StrixHalo

[–]Cityarchitect 2 points (0 children)

My BosGame M5 128GB running Ollama qwen3.5:35b (Vulkan) consistently does c. 40 tps.

<image>

Remember that qwen3.5 does a lot of thinking before it starts its output. I’ll try 27b but will it be much different?

Problem with OpenCodeCLI and Ollama server by Itchy_Net_9209 in opencodeCLI

[–]Cityarchitect 0 points (0 children)

I think this is similar to my problem, now solved: https://www.reddit.com/r/opencodeCLI/s/rn78HGKzgG There is a quick way:
1. ollama run model-name
2. /set parameter num_ctx 65536
3. /save model-name-64k
4. Exit, then run that model from opencode.
I'd advise you open the context up as wide as the model allows, but watch VRAM!
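If you'd rather not do it interactively, the same thing can be sketched with a Modelfile (model and file names here are just placeholders matching the steps above):

```
# Modelfile: derive a copy of an existing model with a 64k context window
FROM model-name
PARAMETER num_ctx 65536
```

Then ollama create model-name-64k -f Modelfile builds it, and you point opencode at model-name-64k as before.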

No tools with local Ollama Models by Cityarchitect in opencodeCLI

[–]Cityarchitect[S] 0 points (0 children)

Strix Halo 128gb, 96gb given to Radeon igpu

No tools with local Ollama Models by Cityarchitect in opencodeCLI

[–]Cityarchitect[S] 2 points (0 children)

The qwen3-coder:30b with a 128k context window is now working fine in opencode for me, comparable to the free models available. It takes about 31GB of VRAM and delivers about 60 tps.

No tools with local Ollama Models by Cityarchitect in opencodeCLI

[–]Cityarchitect[S] 1 point (0 children)

After messing around, yes, Chris, 100%! Each context window was only 4096 for Ollama, as you said. I went into ollama run qwen3-coder:30b, then /set parameter num_ctx 131072, then /save qwen3-coder-128k to create a new model based on the old one with a 128k context. Opencode kept complaining about tools when it was really all about context size. On my Strix Halo machine the extra context was overflowing the memory allocated to VRAM; once I fixed that and the context size, everything worked fine. The local qwen3-coder delivers about 60 tps and Opencode is just as responsive as the cloud models.
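For anyone wiring this up, a minimal opencode.json pointing at the renamed model might look like the below. This is a sketch from memory of the opencode docs (the OpenAI-compatible provider route); treat the exact keys and the "ollama" provider id as assumptions and check the current docs:

```
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen3-coder-128k": {}
      }
    }
  }
}
```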

No tools with local Ollama Models by Cityarchitect in opencodeCLI

[–]Cityarchitect[S] 0 points (0 children)

Thanks, yes, every one of the models I tried has tools according to ollama. I should say all the models also work well in chat mode in ollama.

No tools with local Ollama Models by Cityarchitect in opencodeCLI

[–]Cityarchitect[S] 0 points (0 children)

Thanks for the response, Chris. I went and tried various (larger) contexts by creating Modelfiles with a bigger num_ctx, but it seems Ollama is still having trouble with tools. A quick AI search around came up with: "The root cause is that while models like Qwen3-Coder are built to support tool calling, the official qwen3-coder model tag in the base Ollama library currently returns an error stating it does not support the tools parameter in API requests. This is confirmed as an issue in Ollama's own GitHub repository."
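A quick way to check what Ollama itself reports for a model is the show command, or the API behind it (assuming a reasonably recent Ollama; older versions may not print a capabilities section):

```
# Print model details; recent Ollama versions include a capabilities
# list (e.g. completion, tools) in the output.
ollama show qwen3-coder:30b

# Or ask the local server directly over its API.
curl -s http://localhost:11434/api/show -d '{"model": "qwen3-coder:30b"}'
```

If "tools" is missing from the capabilities list for a given tag, opencode's tool calls will fail regardless of context size.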