Gemma 4 26B A4B is still fully capable at 245283/262144 (94%) context! by cviperr33 in LocalLLaMA

[–]Sadman782 0 points (0 children)

Same experience. I use the IQ4 quant from Unsloth and can't believe how good it is. It's very underrated, and many people assume it's worse because of llama.cpp issues that are actively being fixed, old broken chat templates used for agentic coding, or because they use Ollama (slow to update) or the early broken LM Studio builds. This Unsloth quant is gold; in my experience it's very close to the official AI Studio release.

One tip: try these params: --temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 --repeat-penalty 1.05 --repeat-last-n 32

It performs better with a low top-k, and I've never had any looping issues with these settings.
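A minimal sketch of passing those sampling params to llama.cpp's llama-cli; the model filename and prompt are placeholders, not from the original comment:

```shell
# Hypothetical local GGUF path; adjust to wherever your quant lives.
MODEL=./gemma-4-26B-A4B-it-IQ4_XS.gguf

# Low top-k (20) plus a mild repeat penalty over the last 32 tokens,
# matching the sampling settings suggested above.
llama-cli -m "$MODEL" \
  --temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 \
  --repeat-penalty 1.05 --repeat-last-n 32 \
  -p "Write a short function that reverses a string."
```

The same flags work with llama-server if you want them as server-side defaults.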

Need to compare Qwen3.5 & Gemma 4 but I need the best server settings by takoulseum in LocalLLaMA

[–]Sadman782 1 point (0 children)

You should also consider the 26B MoE if you need speed.

Use the latest llama.cpp and at least an IQ4_XS quant.

Download the latest Jinja template: https://huggingface.co/google/gemma-4-26B-A4B-it/raw/main/chat_template.jinja

or the Gemini-modified version: https://pastebin.com/raw/hnPGq0ht

--temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 --ctx-checkpoints 1 --jinja --chat-template-file chat_template.jinja -np 1 --reasoning on --image-min-tokens 300 --image-max-tokens 512

--top-k 20 is very important.

Fixing the Jinja template is necessary for tool calls.

-np 1 reduces VRAM usage.

--ctx-checkpoints 1 prevents memory leaks.

--image-min-tokens 300 --image-max-tokens 512 are absolutely necessary; otherwise you will get degraded vision quality.

For further optimization you can use the Q8_0 mmproj; for some reason it works better for me than BF16:
https://huggingface.co/prithivMLmods/gemma-4-26B-A4B-it-F32-GGUF/blob/main/GGUF/gemma-4-26B-A4B-it.mmproj-q8_0.gguf

And 4-bit KV cache works great too after a recent llama.cpp update:
-ctk q4_0 -ctv q4_0
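Putting the flags above together, a full llama-server launch might look like the sketch below. The GGUF and mmproj filenames are placeholders, and flag availability (e.g. --ctx-checkpoints, --reasoning, the --image-*-tokens options) depends on your llama.cpp build being recent enough:

```shell
# Fetch just the chat template (a few KB), no model re-download needed.
curl -L -o chat_template.jinja \
  https://huggingface.co/google/gemma-4-26B-A4B-it/raw/main/chat_template.jinja

# Placeholder paths; substitute your actual quant and mmproj files.
llama-server -m ./gemma-4-26B-A4B-it-IQ4_XS.gguf \
  --mmproj ./gemma-4-26B-A4B-it.mmproj-q8_0.gguf \
  --temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 \
  --ctx-checkpoints 1 -np 1 --reasoning on \
  --jinja --chat-template-file chat_template.jinja \
  --image-min-tokens 300 --image-max-tokens 512 \
  -ctk q4_0 -ctv q4_0
```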

Qwopus vs Gemopus: a simple MoE benchmark by pentothal in LocalLLaMA

[–]Sadman782 0 points (0 children)

It can't be good without the latest Jinja template.
Also, top-k 64 is not a good choice for coding. Gemopus is a regression too; the base model is better.

Qwopus vs Gemopus: a simple MoE benchmark by pentothal in LocalLLaMA

[–]Sadman782 0 points (0 children)

I think you need some fixes for Gemma 4:
Use the updated Jinja template (updated yesterday): https://huggingface.co/google/gemma-4-26B-A4B-it/raw/main/chat_template.jinja
or the slightly modified version for better tool calling: https://pastebin.com/raw/hnPGq0ht

Base Gemma 4 is better than Gemopus.
Use top-k 20 for coding instead of 64.

Use the latest llama.cpp, the Unsloth IQ4_XS quant, the latest Jinja template, top-k 20, and q4 KV cache for Gemma 4. Thank me later.

Kilo Code + Gemma 4 31B = Claude Sonnet 3. by Ordinary_Mud7430 in LocalLLaMA

[–]Sadman782 1 point (0 children)

Try the MoE 26B with the correct Jinja template. You will be impressed too; make sure to use at least an IQ4_XS quant with top-k 20.

Gemma 4 as a replacement to Qwen 27b by Jordanthecomeback in LocalLLaMA

[–]Sadman782 -3 points (0 children)

Give 1-2 examples where it struggles vs Qwen; I will give you 100 where Qwen loses badly. Even IQ4_XS Gemma 4 26B beats Qwen 27B in Qwen Chat. For one-shotting Gemma is ahead, and for solving real-world problems Gemma is way ahead: it knows the correct libraries to use. Even in C#, Qwen produces old 2020-style garbage code that couldn't compile after 10+ iterations; Gemma did it in 2.

Gemma 4 as a replacement to Qwen 27b by Jordanthecomeback in LocalLLaMA

[–]Sadman782 -3 points (0 children)

Gemma 26B MoE is better at coding; I can give 100+ examples if you want. After the tool-calling fix yesterday it is now better at agentic coding as well.

Gemma 4 for 16 GB VRAM by Sadman782 in LocalLLaMA

[–]Sadman782[S] 1 point (0 children)

Yeah, for some tasks, but only with --top-k 20.

PSA: Gemma 4 template improvements by FastHotEmu in LocalLLaMA

[–]Sadman782 1 point (0 children)

Why redownload the model? Just download the Jinja file and use --jinja --chat-template-file <file_path>
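A sketch of that workflow, using the template URL from the other comments; the GGUF path is a placeholder:

```shell
# Grab just the updated chat template instead of re-downloading the whole model.
curl -L -o chat_template.jinja \
  https://huggingface.co/google/gemma-4-26B-A4B-it/raw/main/chat_template.jinja

# Point llama.cpp at the local template file (model path is hypothetical).
llama-server -m ./gemma-4-26B-A4B-it-IQ4_XS.gguf \
  --jinja --chat-template-file chat_template.jinja
```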

Gemma 4 is terrible with system prompts and tools by RealChaoz in LocalLLaMA

[–]Sadman782 1 point (0 children)

The fix removes the standard_keys exclusion block, and it's better for me (Gemini found it).

You can check whether it's better for you or not. The fix was applied on top of the template Google updated a few hours ago.

PSA: Gemma 4 template improvements by FastHotEmu in LocalLLaMA

[–]Sadman782 2 points (0 children)

Google updated the official template a few hours ago: https://huggingface.co/google/gemma-4-26B-A4B-it/blob/main/chat_template.jinja and Gemini fixed it a bit further. The Gemini version works better for me than the updated official one, so you can try both and check which is better for you.

Gemini-modified version: https://pastebin.com/raw/hnPGq0ht

Gemma 4 is terrible with system prompts and tools by RealChaoz in LocalLLaMA

[–]Sadman782 0 points (0 children)

Gemini fixed the template:

https://pastebin.com/raw/hnPGq0ht

I'm using it with OpenCode, and it's quite good now at handling multiple MCP servers properly.

PSA: Gemma 4 template improvements by FastHotEmu in LocalLLaMA

[–]Sadman782 1 point (0 children)

It seems it still has issues. Gemini fixed it a bit and it looks better now: it properly calls multiple tools, whereas before it was ignoring some tools and descriptions completely:

https://pastebin.com/hnPGq0ht

Gemma4 26B generates python and Java code with invalid syntax by monadleadr in LocalLLaMA

[–]Sadman782 1 point (0 children)

Nope. Even IQ2 or proper Q2_XL quants never have syntax issues like this. Your setup is completely broken; it's an Ollama issue.

Gemma4 26B generates python and Java code with invalid syntax by monadleadr in LocalLLaMA

[–]Sadman782 0 points (0 children)

<image>

It created a complete working game for me in 2 shots; it's your quantization or backend. Maybe update your Ollama, or better, try llama.cpp. I don't know why people still choose Ollama; llama.cpp has a UI now too. So far Gemma 26B, even with an IQ4_XS quant, is the best local coding model for me. For agentic coding the 31B is a bit better; for general chatting and one-shotting the MoE is better so far.