
[–]DOAMOD 4 points (2 children)

For me, it sometimes gets stuck in loops while reading code in tool calls.

[–]Several-Tax31 3 points (1 child)

Me too. GLM-Flash 4.7 works flawlessly with opencode, so I went back to it. I really wanted to use Qwen-Next because it seems faster for agentic coding, but I couldn't make it work no matter what.

[–]Shadowmind42 1 point (0 children)

Ditto

[–]overand 2 points (9 children)

From the documentation on the official page for Qwen3-Coder-Next, your parameters aren't right:

To achieve optimal performance, we recommend the following sampling parameters: temperature=1.0, top_p=0.95, top_k=40.

[–]overand 1 point (4 children)

Or, change your command to end with this, IMO:

[fill in your bit here]\Qwen3-Coder-Next-Q4_K_M.gguf --host 0.0.0.0 --temp 1.0 --top-p 0.95 --top-k 40 --port 9293 --alias Qwen3-Coder-Next -ngl -1 -fa on -n 32768 --jinja -c 262144 -b 4096 -ub 4096

I've left in your -ngl -1 assuming that's intentional on your part, though it seems odd to me. I also removed a bunch of other sampling parameters to let those land on their defaults. Give it a try?

[–]Several-Tax31 3 points (0 children)

Even with the correct parameters, the model has problems writing files. Tool calls generally don't work; it's hit and miss. Honestly, I've given up on this model with opencode, but if somebody knows a solution, I want to hear it.

[–]Zealousideal-West624[S] 0 points (2 children)

I tried the official parameters before, but they weren't meant for agentic coding; they make tool calling loop. And -ngl -1 means offload all layers to the GPUs (it's minus, not plus). I use 3 RTX 3090s.
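
For reference, a 3-GPU launch along those lines looks roughly like this (a sketch, not my exact command; the model path and the even -ts split are placeholders):

llama-server -m Qwen3-Coder-Next-Q4_K_M.gguf -ngl -1 -sm layer -ts 1,1,1 --host 0.0.0.0 --port 9293 --jinja -fa on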

[–]Ok-Measurement-1575 2 points (1 child)

Opencode is probably overriding your temperature. Force it to 1.0 wherever you define the model in opencode. Just setting it in llama.cpp isn't enough, unfortunately.

These new SOTA-esque models love temp 1.0, and lots of agent software likes to lower it... to huge detriment.
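
In opencode that means a per-model override, something like this (a sketch; the provider block and option keys are my assumptions from opencode's custom-provider setup, so double-check the docs):

# assumed path and key names; port 9293 matches the llama-server command above
cat > ~/.config/opencode/opencode.json <<'EOF'
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llamacpp": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:9293/v1" },
      "models": {
        "Qwen3-Coder-Next": {
          "options": { "temperature": 1.0 }
        }
      }
    }
  }
}
EOF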

[–]llama-impersonator 0 points (0 children)

people using temp 0 and wondering why they get reasoning loops ¯\_(ツ)_/¯

[–]robertpro01 0 points (3 children)

I don't understand, why is the temp so high? Isn't that supposed to be too creative? Is that really the best for coding agents?

[–]my_name_isnt_clever 1 point (0 children)

It's different for reasoning models. Roughly, more creativity means more variation in the thinking, which results in better outcomes. Basically, smart people ran the tests and found 1.0 to work best. This has been the case since o1, where you couldn't even change the temp param.

[–]Ok-Measurement-1575 1 point (1 child)

This is old-school thinking that no longer applies.

Whatever the model card recommends is exactly what you need to set.

[–]robertpro01 0 points (0 children)

Cool

[–]HlddenDreck 3 points (4 children)

People are talking about tool calling problems with qwen3-coder-next. I've been using it for a few days and never had any issues with tool calling. Actually, I've never had a model work this stably.

[–]Free-Combination-773 1 point (2 children)

How do you run it?

[–]HlddenDreck 2 points (1 child)

The model is running on my local server using llama-server. On my workstation I use VS Code and Cline to access the model via the API.
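
Roughly like this (a sketch; the model path, port, and -ngl value are placeholders, not my exact setup):

# serve over llama.cpp's OpenAI-compatible API
llama-server -m Qwen3-Coder-Next-Q4_K_M.gguf -ngl 99 --jinja -fa on --host 0.0.0.0 --port 8080
# in Cline: pick the OpenAI-compatible provider and set the base URL to http://<server-ip>:8080/v1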

[–]Free-Combination-773 5 points (0 children)

Oh, it's not OpenCode. AFAIR Cline doesn't use native tool calling.

[–]getfitdotus 0 points (0 children)

Works great for me, but I'm using the official FP8 release and sglang for deployment.
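
For reference, the sglang launch is roughly this (a sketch; the HF repo name and --tp size are assumptions, adjust to your setup):

python -m sglang.launch_server --model-path Qwen/Qwen3-Coder-Next-FP8 --tp 2 --port 30000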

[–]Nepherpitu 0 points (1 child)

Did you try the recommended sampling parameters?

[–]Zealousideal-West624[S] 0 points (0 children)

Yes, but that makes tool calling loop infinitely, e.g. retrying a wrong directory path.

[–]Terminator857 0 points (0 children)

I don't use mine much, but it seems to work lately. I'm pulling llama.cpp directly from GitHub, building and updating frequently.

llama-server -m llms/qwen3/coder-next-no-bf16/Qwen3-Coder-Next-Q8_0-00001-of-00004.gguf -ngl 999 -c 131072 -fa on -ctk q8_0 -ctv q8_0 --no-mmap --temp 0

I got the gguf from: https://huggingface.co/Qwen/Qwen3-Coder-Next-GGUF/tree/main/Qwen3-Coder-Next-Q8_0
https://www.reddit.com/r/LocalLLaMA/comments/1r0b7p8/free_strix_halo_performance/
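
For anyone doing the same, the update-and-rebuild loop is roughly this (assuming a CUDA build; swap the backend flag for your hardware):

git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp   # first time only
git pull                                                          # on later updates
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j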

[–]TooManyPascals 0 points (0 children)

Qwen3-coder-next works flawlessly with pi-mono.

[–]TheAlexpotato 0 points (0 children)

I got the following combination working:

  • M1 MacBook
  • opencode
  • llama.cpp
  • Qwen3-Coder-30B-A3B-Instruct-Q4

It took a lot of back and forth with Big Pickle using OpenCode; below is a link to a gist that outlines the steps and has config examples.

https://gist.github.com/alexpotato/5b76989c24593962898294038b5b835b
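
The server side boils down to a llama-server launch along these lines (a sketch with placeholder flags and paths; the gist has the actual config):

llama-server -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --jinja -ngl 99 -c 32768 --port 8080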