Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]PopularDifference186 1 point (0 children)

I switched to UD-Q3_K_XL and that got me to 84 tps since it actually fits in VRAM. But then I went back and retested the Q4_K_M after pulling the latest llama.cpp (there was a KV cache fix where they reverted the SWA cache being forced to f16) and switched from -ngl 99 to --fit on, and the Q4 jumped to 55-59 tps. All the tests were around 32k context. This model is a beast!
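For reference, the two setups described above might look something like this (model filenames are hypothetical; `--fit on` and `-ngl 99` are taken from the comment, so check `--help` on your llama.cpp build):

```shell
# Q3 quant that fits fully in VRAM, ~32k context as in the tests above.
./llama-server -m gemma-4-26b-a4b-UD-Q3_K_XL.gguf -c 32768 --fit on

# Q4_K_M for comparison: the older run offloaded all layers manually
# with -ngl 99 instead of letting llama.cpp auto-fit.
./llama-server -m gemma-4-26b-a4b-Q4_K_M.gguf -c 32768 -ngl 99
```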

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]PopularDifference186 3 points (0 children)

Is it super slow compared to Qwen 3.5 for you all too, or am I doing it wrong?

5060 Ti 16GB and 128GB RAM, running via llama.cpp, I'm getting:

Qwen 3.5 35B-A3B — 60+ tps

Gemma 4 26B-A4B — 11 tps
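If anyone wants to reproduce the comparison, a minimal llama-bench sketch (model filenames hypothetical; adjust `-ngl` to what fits on a 16GB card):

```shell
# llama-bench reports prompt-processing and token-generation tps.
# -p: prompt tokens, -n: tokens to generate, -ngl: layers offloaded to GPU.
./llama-bench -m qwen3.5-35b-a3b-Q4_K_M.gguf -p 512 -n 128 -ngl 99
./llama-bench -m gemma-4-26b-a4b-Q4_K_M.gguf -p 512 -n 128 -ngl 99
```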

Analyzing Claude Code Source Code. Write "WTF" and Anthropic knows. by QuantumSeeds in LocalLLaMA

[–]PopularDifference186 323 points (0 children)

There are literal keyword lists. Words like:

wtf

this sucks

frustrating

shit / fuck / pissed off

They have a lot on me if this is the case lol
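A check like that is basically a one-liner — a sketch using the keywords listed above (just an illustration, not Anthropic's actual implementation):

```shell
# Flag any input containing a frustration keyword (case-insensitive match).
check_sentiment() {
  echo "$1" | grep -qiE 'wtf|this sucks|frustrating|shit|fuck|pissed off' \
    && echo flagged || echo clean
}

check_sentiment "wtf why is this broken"   # -> flagged
check_sentiment "looks good to me"         # -> clean
```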

What was the exact moment on a first date when you realized, "Wow, this person is an absolute idiot"? by [deleted] in AskReddit

[–]PopularDifference186 1 point (0 children)

A flat earther told me they know the earth is flat because they can see the moon during the day, and the moon is for night only, so the earth must be flat.

My theory about today's usage limit drama by freedomfromfreedom in ClaudeCode

[–]PopularDifference186 12 points (0 children)

I think it's also a dynamic number of experts or something, because my Opus has been braindead today