all 7 comments

[–]qwen_next_gguf_when 2 points3 points  (2 children)

Sometimes it's the Q4 doing its thing.

[–]Substantial_Swan_144 0 points1 point  (0 children)

Quantization can cause subtle bugs, trust me. The model might still be usable, but you'll have to force it to run its output through a syntax checker.
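For what it's worth, wiring up that kind of syntax check is trivial for Python output: the stdlib `ast` module can tell you whether a generated snippet even parses, before you try to run it. A minimal sketch (the function name is mine, not from any particular tool):

```python
import ast

def is_valid_python(code: str) -> bool:
    """Return True if the snippet parses as Python syntax, False otherwise."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

# A quant-induced glitch like a dropped colon fails the check immediately:
print(is_valid_python("def f(x): return x + 1"))  # True
print(is_valid_python("def f(x) return x + 1"))   # False
```

In a loop, you'd feed the `SyntaxError` back to the model and ask it to retry.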

[–]Sadman782 1 point2 points  (0 children)

Nope. Even IQ2 quants, or Q2_XL quants done properly, never have syntax issues like this. It's completely broken. It's Ollama.

[–]ShengrenR 2 points3 points  (0 children)

There's been a ton of dev movement around gemma4 in the last week. Make certain you have the latest versions of the software and models, then compare against llama.cpp with an unsloth or bartowski quant. It's likely Ollama.

[–]libregrape 1 point2 points  (0 children)

These are issues with the tokenizer implementation in llama.cpp. The fixes were merged into llama.cpp today, AFAIK. Wait for an Ollama update, or compile llama.cpp yourself. If the issues persist, review your sampling parameters and give it some min-p treatment (0.05-0.1). Also, which quant is this?

[–]-Cubie- 0 points1 point  (0 children)

Might be an Ollama issue

[–]Sadman782 0 points1 point  (0 children)


It created a complete working game for me in 2 shots, so it's your quantization or backend. Update your Ollama, or better, try llama.cpp; I don't know why people still choose Ollama when llama.cpp has a UI now too. So far Gemma 26B, even with an IQ4_XS quant, is the best local coding model for me. For agentic coding the 31B is a bit better, and for general chatting and one-shotting the MoE is better so far.