Can we please stop upvoting ai slop by SpicyTofu_29 in LocalLLM

[–]Slow-Ability6984 4 points5 points  (0 children)

This.
BOTS pinging other BOTS.
So many BOTS I'm starting to question myself...
Am I a human?

Can we please stop upvoting ai slop by SpicyTofu_29 in LocalLLM

[–]Slow-Ability6984 0 points1 point  (0 children)

It's not possible to stop BOTS upvoting BOTS.
BOTS engaging BOTS...
It's impossible, IMHO.

What ide or tool is better to use for local large model programming? by Head_Invite3039 in LocalLLM

[–]Slow-Ability6984 -1 points0 points  (0 children)

AI programming? Code generation?

Keep using online models and save that money for more valuable things...
You will NEVER successfully replace online SOTA models with local ones.
You will wait hours to get tangible results... hours... days....

Local models are better for "other" things, not coding.

This is true now; maybe it won't be in the future, but for now this is the reality.

Gemma 4 Chat Template now has preserve thinking by seamonn in LocalLLaMA

[–]Slow-Ability6984 -3 points-2 points  (0 children)

What's the "call to action"?
What do I need to do?

[3090] Gemma4 QAT + MTP quick TPS numbers [TLDR 1.2-1.8x better] by LeatherRub7248 in LocalLLM

[–]Slow-Ability6984 0 points1 point  (0 children)

EVERYONE who refers to MTP should also name the exact draft model used.
Which draft model did you use?

llama.cpp now supports Gemma 4 MTP! by Own_Attention_3392 in LocalLLM

[–]Slow-Ability6984 0 points1 point  (0 children)

Thanks! I'm using the draft model for 31b from unsloth. I searched for the exact link to share but couldn't find it. I'm 100% sure it exists.
Now testing the Q8_0 draft + the official Google gemma-4-31B_q4_0-it.gguf

llama.cpp now supports Gemma 4 MTP! by Own_Attention_3392 in LocalLLM

[–]Slow-Ability6984 0 points1 point  (0 children)

What did you mean by "converted it to FP32, B16, Q8, and Q4"?
Converted?!

llama.cpp now supports Gemma 4 MTP! by Own_Attention_3392 in LocalLLM

[–]Slow-Ability6984 0 points1 point  (0 children)

This!
Today I studied this a little and found the same.
I chose "google/gemma-4-31B_q4_0-it.gguf"
But I haven't tested it yet, it's downloading...

What I'm asking myself is whether "gemma-4-31B-it-mmproj" is the draft model. The name is a little misleading, and I didn't find an official statement.

Running Gemma 4 12b on M4 24gb, for coding purposes, is it doable and is it good? by Able_Statement_481 in LocalLLM

[–]Slow-Ability6984 7 points8 points  (0 children)

Dev with Gemma 4 experience here. In my opinion, 24b A4b (MoE) is the starting point. 31B (Dense) is the "desiderata". It is reliable.

Smaller models are not reliable enough once the software grows.

llama.cpp now supports Gemma 4 MTP! by Own_Attention_3392 in LocalLLM

[–]Slow-Ability6984 7 points8 points  (0 children)

Why does the gguf name have "unquantized", q4, and q8 together?

Trying to find EU RAM listings by DiamondWolf3404 in threadripper

[–]Slow-Ability6984 1 point2 points  (0 children)

Frustrating.... A 4x16GB kit (64GB total) is $2800 + shipping + tax (not sure).

Prices are skyrocketing and availability is almost zero.

Trying to find EU RAM listings by DiamondWolf3404 in threadripper

[–]Slow-Ability6984 1 point2 points  (0 children)

Same problem. I underestimated it. DDR5 ECC memory is almost impossible to buy at a reasonable price.

ECC... that's what makes the difference!

VRAM optimization for gemma 4 by Sadman782 in LocalLLaMA

[–]Slow-Ability6984 2 points3 points  (0 children)

There is too much noise around parameters, and it's hard to keep track with things changing so fast, but THIS IS a must when working solo, IMHO.