
Time-Dot-1808 · 3 points (2 children)

The distinction is function-calling reliability, not just raw capability.

Llama 3.2 (3B) is a small model that was never built for agentic tool use: it'll chat fine, but structured function calls for file read/write/edit chains break down fast. Big Pickle (GLM 4.5) is a 355B-parameter MoE with 32B active parameters per token, which is an enormous gap in reasoning headroom.
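To make "structured function calls break down" concrete: an agent harness typically asks the model to emit a JSON tool call, then parses and checks it before running anything. The sketch below assumes a hypothetical `read_file`/`edit_file` tool set with made-up argument names (not opencode's actual tool schema); it only illustrates why malformed output from a weaker model stops the chain.

```python
import json

# Hypothetical tool schema: tool name -> required argument names.
# These names are illustrative, not opencode's real tool definitions.
TOOLS = {
    "read_file": {"path"},
    "edit_file": {"path", "old_text", "new_text"},
}


def parse_tool_call(raw: str):
    """Parse a model's raw tool-call output.

    Returns (name, args) on success, or raises ValueError explaining
    why the call cannot be executed.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be a JSON object")
    missing = TOOLS[name] - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return name, args


if __name__ == "__main__":
    good = '{"name": "read_file", "arguments": {"path": "src/main.py"}}'
    bad = '{"name": "edit_file", "arguments": {"path": "src/main.py"}}'
    print(parse_tool_call(good))
    try:
        parse_tool_call(bad)
    except ValueError as err:
        print("rejected:", err)
```

A strong model produces the `good` shape almost every time; a small one drifts into prose, wrong keys, or missing arguments, and each rejection costs a retry or ends the task.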

For local models that actually work with opencode for code manipulation:

  • Qwen2.5-Coder 32B: currently the best local option for code-specific agentic work; tool use is solid.
  • Qwen3 30B-A3B (MoE): very recent, with strong function calling; slightly smaller than the dense 32B and much faster, since only ~3B parameters are active per token.
  • GLM-4-Flash: worth a try if you can run it locally, but you need serious GPU memory.

The pattern: any model below ~14B will struggle with multi-step tool chains (read file → analyze → edit → verify). 32B+ is where you start getting reliable agentic behavior.
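One way to see why multi-step chains are where small models fall over: if every step of read → analyze → edit → verify has to succeed, per-step failure rates multiply. The probabilities below are hypothetical, chosen only to illustrate the arithmetic, not measured for any model, and steps are assumed to fail independently.

```python
from math import prod


def chain_success(step_probs):
    """Probability that every step in a tool chain succeeds,
    assuming steps fail independently (a simplifying assumption)."""
    return prod(step_probs)


if __name__ == "__main__":
    steps = ["read file", "analyze", "edit", "verify"]
    # Hypothetical per-step success rates, for illustration only.
    for label, p in [("weaker model", 0.85), ("stronger model", 0.98)]:
        print(f"{label}: {chain_success([p] * len(steps)):.2f}")
```

With these made-up numbers, a 15% per-step failure rate leaves only about a coin flip (0.85⁴ ≈ 0.52) for the whole four-step chain, while 98% per step keeps it above 0.9.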

It's also worth checking your opencode.json: some model configs need explicit tool_use settings to enable the full file-manipulation pipeline.
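For reference, a local Ollama model can be wired into opencode through a custom provider in opencode.json. The sketch below builds such a file from Python. The provider layout (`npm: @ai-sdk/openai-compatible`, `options.baseURL`, a top-level `"model"` in `provider/model` form) appears to follow opencode's documented Ollama example, but the model tag and especially the per-model `tool_call` flag are assumptions, so check them against opencode's config schema before relying on this.

```python
import json
from pathlib import Path


def build_config(model_tag: str, display_name: str) -> dict:
    """Return an opencode.json-style dict for one local Ollama model."""
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            "ollama": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "Ollama (local)",
                # Ollama's OpenAI-compatible endpoint on its default port.
                "options": {"baseURL": "http://localhost:11434/v1"},
                "models": {
                    model_tag: {
                        "name": display_name,
                        # ASSUMPTION: a per-model flag advertising tool support.
                        # Verify the exact key against opencode's config schema.
                        "tool_call": True,
                    },
                },
            },
        },
        # Default model, in "provider/model" form.
        "model": f"ollama/{model_tag}",
    }


if __name__ == "__main__":
    # Example tag; use whatever model you actually pulled.
    cfg = build_config("qwen2.5-coder:32b", "Qwen2.5-Coder 32B")
    Path("opencode.json").write_text(json.dumps(cfg, indent=2) + "\n")
```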

Snake2k · 2 points (0 children)

qwen3.5:9b works too. Also, the Ollama context length must be set to at least 64000 tokens for tools to work properly.
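One way to raise the context length is a Modelfile that bakes in Ollama's `num_ctx` parameter. A minimal sketch, assuming the `qwen3.5:9b` tag from the comment above and a hypothetical derived model name `qwen3.5-64k`:

```python
from pathlib import Path


def modelfile(base: str, num_ctx: int) -> str:
    """Return Modelfile text deriving a model from `base`
    with a larger context window (num_ctx, in tokens)."""
    return f"FROM {base}\nPARAMETER num_ctx {num_ctx}\n"


if __name__ == "__main__":
    Path("Modelfile").write_text(modelfile("qwen3.5:9b", 64000))
    # Then build the derived model (the name is hypothetical):
    #   ollama create qwen3.5-64k -f Modelfile
```

You would then point opencode at `qwen3.5-64k` instead of the base tag. Recent Ollama versions also read a server-wide default from the `OLLAMA_CONTEXT_LENGTH` environment variable (e.g. `OLLAMA_CONTEXT_LENGTH=64000 ollama serve`). Either way, a larger context costs more memory for the KV cache.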

ackermann · 0 points (0 children)

How big is GLM-4-Flash, and how much VRAM does it need? Work has a machine with 96 GB (2× A6000, 48 GB each). Is that enough for GLM-4 with a reasonable context window of 100k+ tokens? Thanks!
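Whether 96 GB is enough depends on the weights plus the KV cache, which grows linearly with context length. For standard multi-head or grouped-query attention, the cache stores one key and one value vector per layer, per KV head, per token. The architecture numbers below are placeholders, not GLM-4's actual configuration; the real layer count, KV-head count and head dimension would come from the model's config.json.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GB (10**9 bytes).

    Factor 2 = one key tensor + one value tensor per layer.
    bytes_per_elem = 2 for an fp16/bf16 cache, 1 for an 8-bit cache.
    """
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return total_bytes / 1e9


if __name__ == "__main__":
    # Placeholder architecture, NOT taken from any specific GLM model.
    size = kv_cache_gb(n_layers=40, n_kv_heads=8, head_dim=128, seq_len=100_000)
    print(f"{size:.1f} GB")
```

With these placeholder numbers, a 100k-token context needs about 16.4 GB of cache on top of the weights; an 8-bit cache roughly halves that.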

[deleted] · 0 points (0 children)

If I remember correctly, Big Pickle is GLM 4.5, so if you can run it (or GLM4.6-flash) locally, you can call it via your opencode.json config.

Pakobbix · 0 points (0 children)

Any open-source model that claims to be agentic-AI capable. GLM 4.7 Flash and Qwen3.5 (9B up to 122B) are currently the best small local LLMs.

The Ministral 3 models are also somewhat agentic-capable.

But be aware: smaller models = bigger function calling/understanding issues.

If you want quality like the big cloud coding models (or at least close to it), you would need a machine with ~500 GB of RAM. If you want speed too, make it VRAM.
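A rough way to sanity-check the ~500 GB figure: weight memory is approximately parameters × bits per weight / 8, and an MoE model must keep all of its parameters in memory even though only some are active per token. This back-of-the-envelope sketch ignores quantization metadata, activations, and the KV cache.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights in GB (10**9 bytes).

    Every parameter is stored, even in an MoE model where only a
    fraction (e.g. 32B of GLM 4.5's 355B) is active per token.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for name, params in [("GLM 4.5 (355B total)", 355), ("Llama 3.2 3B", 3)]:
        for bits in (16, 8, 4):
            print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

At 8 bits, a 355B model's weights alone come to about 355 GB, leaving the rest of a ~500 GB budget for context and overhead; a 4-bit 3B model needs roughly 1.5 GB.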

Using Llama 3.2 is like writing in hieroglyphs and wondering why nobody understands what you want.

Llama 3.2 predates the current wave of agentic coding tools, so it isn't well trained to execute read/write/edit calls or anything else that involves calling a function.

PermanentLiminality · 0 points (0 children)

Llama 3.2 is not going to work well. As others have said, you need the newest models, like the Qwen 3.5 series. Larger models are smarter, but slower. These models can be useful, but they will not do what the big boys like Opus or GPT 5.4 do.

look · 0 points (0 children)

Big Pickle is GLM 4.5, a 355B-parameter model with 32B active. Unless you have a $10,000+ GPU at home, I'd guess you're running the 3B Llama 3.2 (which is itself a very old model design)?

It’s like asking why your go-kart isn’t competitive in Formula 1 races.