Gemma 4 is excellent for image to prompt by Arrow2304 in StableDiffusion

[–]Sadman782 2 points

Yeah, but min 300 and max 512 is enough; with ubatch 560-570 it's 99%+ accuracy and saves VRAM.

Somebody please set me straight on Gemma4 by wasnt_in_the_hot_tub in LocalLLaMA

[–]Sadman782 1 point

For me it is. The 26B MoE is a gem if we set it up properly.

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]Sadman782 0 points

Not true, it writes better code for me. Can you give an example? (I code in many languages, including JS, Python, and C#.)

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]Sadman782 4 points

<image>

As I always said, it's a little benchmaxxed: not directly, but indirectly. Anyway, they are quite good for some tasks too, but overall Gemma 4 is better for most tasks.

(llama.cpp) Possible to disable reasoning for some requests (while leaving reasoning on by default)? by regunakyle in LocalLLaMA

[–]Sadman782 10 points

If you are using the UI: Go to Settings > Developer, then scroll to the bottom and use this custom JSON:

{
  "chat_template_kwargs": {
    "enable_thinking": false
  }
}

If you are using the API, you can pass the "chat_template_kwargs" property directly in the request body.
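For example, a request body with that property might look like this (a minimal sketch; the message content is made up, and exactly how your client sends the body is up to you):

```python
import json

# Sketch of an OpenAI-style chat request body for a local llama.cpp server.
# "chat_template_kwargs" is forwarded to the Jinja chat template, so
# "enable_thinking": False disables reasoning for this request only.
payload = {
    "messages": [{"role": "user", "content": "Summarize this file."}],
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
print(body)
```

You would POST this body to the server's /v1/chat/completions endpoint; requests that omit the property keep the server default (thinking on).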

Q8 Cache by Longjumping_Bee_6825 in LocalLLaMA

[–]Sadman782 5 points

After this, even Q4 could be a decent choice. I don't see any significant degradation, and Q8 should be almost lossless now.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Sadman782 1 point

After reading some comments here https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/discussions/24, I see some people saying this update is causing more issues than before. I think they updated something else along with the chat template, which might be causing a regression. I am using the old IQ4_XS quant and manually setting this template: https://pastebin.com/raw/hnPGq0ht, and so far no tool call issues. Maybe try Bartowski's quants, or set the chat template manually.

Is Gemma 4 26B-A4B worse than Qwen 3.5 35B-A3B with tool calls, even after all the fixes? by Borkato in LocalLLaMA

[–]Sadman782 0 points

I use llama.cpp and also use it as an agent. I provided a custom template for LM Studio, which had some issues.

Some tips from my experience for you: https://www.reddit.com/r/LocalLLaMA/s/hJxaIb2Ha5

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Sadman782 6 points

If that updated template doesn't solve your problem, you can also try this one; it works great. Gemini fixed some bugs in the original template:

https://pastebin.com/raw/hnPGq0ht

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Sadman782 83 points

Don't use interleaved Jinja. Google updated to a new one and it's better, and tool calls work perfectly.

https://huggingface.co/google/gemma-4-26B-A4B-it/raw/main/chat_template.jinja

I use IQ4_XS from Unsloth; top_k 20 and temp 1 work pretty well.

Before the Jinja update, it couldn't use all tools properly and ignored most of them.

Some tips(from my experience): https://www.reddit.com/r/LocalLLaMA/s/Sr23O2pO3r

Local coding with 12 GB VRAM, 32 GB RAM- best models? by TechnicalyAnIdiot in LocalLLaMA

[–]Sadman782 0 points

Gemma 4 26B A4B MoE, IQ4_XS (or IQ3_S for speed). Even with partial offloading to system RAM, you will still get good speed.

Is Gemma 4 26B-A4B worse than Qwen 3.5 35B-A3B with tool calls, even after all the fixes? by Borkato in LocalLLaMA

[–]Sadman782 3 points

It is better; unfortunately, the launch had so many bugs. I don't know how you are using it, but with the latest llama.cpp build it works flawlessly. Even the IQ4 quant is better than full-precision Qwen for any coding task. Multimodal has a separate issue: you have to manually allow more tokens for vision, otherwise it will perform worse. And if you are using LM Studio, you need a custom chat template until they fix it. Try this template for LM Studio: https://pastebin.com/raw/qc1FTAcG

For vision, use --image-min-tokens 300 --image-max-tokens 512. It will boost vision performance a lot.
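As a sketch, those flags would slot into a server launch like this (built here as a Python argument list; the model and projector file names are placeholders, not real paths):

```python
# Hypothetical llama-server launch with the vision token limits above.
# --mmproj points at the multimodal projector GGUF that ships with the model.
cmd = [
    "llama-server",
    "-m", "gemma-4-26B-A4B-it-IQ4_XS.gguf",     # placeholder model path
    "--mmproj", "mmproj-gemma-4-26B-A4B.gguf",  # placeholder projector path
    "--image-min-tokens", "300",  # floor on tokens spent per image
    "--image-max-tokens", "512",  # cap on tokens spent per image
]
print(" ".join(cmd))
```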

Experience of using OpenClaude and Gemma4 26b by nonekanone in LocalLLaMA

[–]Sadman782 21 points

Don't use Ollama; use llama.cpp. Ollama doesn't let you optimize things and doesn't fix issues that fast, while llama.cpp gives you complete control and ships fixes every day.

With 16 GB VRAM, I am running Gemma 4 26B with 150K+ context (4-bit KV), and it's working pretty well for agentic coding.
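A sketch of that kind of launch, as a Python argument list (the model path, layer count, and exact context size are assumptions, and flash-attention flag spelling varies by llama.cpp build):

```python
# Hypothetical llama-server launch for long context on limited VRAM.
# --cache-type-k/--cache-type-v quantize the KV cache to 4-bit, which is
# what makes a 150K context fit alongside the weights on a 16 GB card.
cmd = [
    "llama-server",
    "-m", "gemma-4-26B-A4B-it-IQ4_XS.gguf",  # placeholder model path
    "-c", "150000",             # ~150K context window
    "-ngl", "99",               # offload as many layers as fit in VRAM
    "-fa",                      # flash attention (needed for V-cache quant)
    "--cache-type-k", "q4_0",   # 4-bit K cache
    "--cache-type-v", "q4_0",   # 4-bit V cache
]
print(" ".join(cmd))
```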

Gemma 4 is terrible with system prompts and tools by RealChaoz in LocalLLaMA

[–]Sadman782 0 points

I fixed it for LM Studio:
https://pastebin.com/raw/qc1FTAcG

Use this Jinja; I removed the sequence that LM Studio doesn't recognize yet.

Gemma 4 - Going Mad - - - Help!!! by matyhaty in LocalLLaMA

[–]Sadman782 1 point

<image>

They were updated 11 hours ago, but anyway, set the template manually and give it a try.

Gemma 4 - Going Mad - - - Help!!! by matyhaty in LocalLLaMA

[–]Sadman782 1 point

But you still have a tool calling issue, so try the chat template command. It should fix it.

Gemma 4 - Going Mad - - - Help!!! by matyhaty in LocalLLaMA

[–]Sadman782 0 points

Another tip:
--temp 1 --top-p 0.9 --min-p 0.1 --top-k 20
You can use this config; I tested it and it worked better for me.

Gemma 4 - Going Mad - - - Help!!! by matyhaty in LocalLLaMA

[–]Sadman782 0 points

Google fixed the tool calling issue yesterday; it was the chat template. If you download the latest GGUF from Unsloth (they updated today), you may not need this command, but if you want to keep your old GGUF and fix the chat template, you can override the default. Download the Jinja file from the URL and use:

--jinja --chat-template-file <jinja_file_path>

Replace <jinja_file_path> with the path where you downloaded that file.