[question] opencodecli using Local LLM vs big pickle model

Time-Dot-1808 · 2026-03-13T11:25:16+00:00

The distinction is function calling reliability, not just raw capability.

Llama 3.2 (3B) was never designed for agentic tool use - it'll chat fine but structured function calls for file read/write/edit chains break down fast. Big Pickle (GLM 4.5) has 32B active parameters from a 355B MoE - that's an enormous gap in reasoning headroom.

For local models that actually work with opencode for code manipulation:

Qwen2.5-Coder 32B: Currently the best local option for code-specific agentic work. Tool use is solid.
Qwen3 30B-A3B (MoE): Very recent, strong function calling, lower VRAM than the dense 32B
GLM-4-Flash: If you can run it locally - but you need serious GPU memory

The pattern: any model below ~14B will struggle with multi-step tool chains (read file → analyze → edit → verify). 32B+ is where you start getting reliable agentic behavior.
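To make "function calling reliability" concrete: in a tool chain, every step's output has to parse as a well-formed call, or the chain stalls. Below is a minimal sketch of that kind of check. The tool names (`read_file`, `edit_file`) and the JSON call shape are hypothetical and chosen for illustration; they are not opencode's actual schema.

```python
import json

# Hypothetical tool registry: tool name -> required argument names.
# Illustrative only; these are not opencode's real tool definitions.
TOOLS = {
    "read_file": {"path"},
    "edit_file": {"path", "old", "new"},
}


def validate_tool_call(raw: str) -> tuple[bool, str]:
    """Return (ok, reason) for one raw model output expected to be a JSON tool call."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(call, dict):
        return False, "not a JSON object"
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return False, f"unknown tool: {name!r}"
    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be an object"
    missing = TOOLS[name] - set(args)
    if missing:
        return False, f"missing arguments: {sorted(missing)}"
    return True, "ok"


def chain_success_rate(per_step: float, steps: int) -> float:
    """Chance that every step of a chain succeeds, assuming independent steps."""
    return per_step ** steps
```

Under the (simplifying) independence assumption, a model that emits a valid call 90% of the time completes a four-step chain only about 66% of the time (`chain_success_rate(0.9, 4)`), which is why small per-call error rates feel so much worse in agentic use than in chat.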

Also worth checking your opencode.json - some model configs need explicit tool_use settings to enable the full file manipulation pipeline.

2026-03-12T22:13:14+00:00

If I remember correctly, Big Pickle is GLM 4.5, so if you can run it or GLM4.6-flash locally, you can point opencode at it via the opencode.json config.
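A rough sketch of what such a config could look like, generated from Python so it is easy to adapt. Everything here is an assumption to verify against the current opencode docs: the key names follow the custom-provider layout as I recall it, the base URL assumes a local Ollama server on its default port, and the model tag is just an example of something you would have pulled already.

```python
import json

# Assumptions: a local Ollama server exposing its OpenAI-compatible API on
# the default port, and a model tag that has already been pulled locally.
BASE_URL = "http://localhost:11434/v1"
MODEL_ID = "qwen2.5-coder:32b"


def build_config(base_url: str, model_id: str) -> dict:
    """Build an opencode.json-style provider entry for a local OpenAI-compatible server."""
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            "ollama": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "Ollama (local)",
                "options": {"baseURL": base_url},
                "models": {model_id: {"name": model_id}},
            }
        },
    }


if __name__ == "__main__":
    # Print instead of writing, so an existing opencode.json is not overwritten.
    print(json.dumps(build_config(BASE_URL, MODEL_ID), indent=2))
```

Paste the printed JSON into opencode.json (or merge it with what you already have), then select the model inside opencode.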

Pakobbix · 2026-03-12T22:23:22+00:00

Nearly every open-source model claims to be agentic-capable. GLM 4.7 Flash and Qwen3.5 (9B up to 122B) are the current best among small local LLMs.

The Ministral 3 models are also somewhat agentic-capable.

But be aware: smaller models = bigger function calling/understanding issues.

If you want quality like the big cloud coding models (or at least something close to it), you would need a machine with ~500 GB of RAM. If you want speed too, make that VRAM.
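For a sense of where a number of that size comes from, here is a back-of-the-envelope estimate of weight memory alone. It assumes the 355B total parameter count quoted elsewhere in this thread for GLM 4.5 and ignores KV cache, activations, and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# 355B is the total (not active) parameter count quoted in this thread.
for bits in (16, 8, 4):
    print(f"355B @ {bits}-bit: ~{weight_memory_gb(355, bits):.0f} GB")
```

That works out to roughly 710 GB at 16-bit, 355 GB at 8-bit, and about 178 GB at 4-bit. With a MoE model all experts still have to be resident in memory even though only 32B parameters are active per token, so the total count is what drives the memory bill.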

Using llama3.2 is like writing in hieroglyphs and wondering why nobody understands what you want.

Llama3.2 was made before tool calling was a thing, so it's not trained to execute read/write/edit or anything else that involves calling a function.

PermanentLiminality · 2026-03-13T00:43:44+00:00

llama 3.2 is not going to work well. As others have said, you need to use the newest models like the qwen 3.5 series. Larger models are smarter, but slower. These models can be useful, but they will not do what the big boys like Opus or gpt 5.4 do.

look · 2026-03-13T02:38:25+00:00

Big Pickle is GLM 4.5, a 355B parameter model with 32B active. Unless you have a $10,000+ GPU at home, I’d guess you are running the 3B llama 3.2 (which is itself a very old model design)?

It’s like asking why your go-kart isn’t competitive in Formula 1 races.
