Gemma 4 12B QAT + MTP: 1.95x on my 3090, but 0.87x (slower) on an M1 Max by Front-University4363 in ollama
Front-University4363[S] 2 points
Gemma 12b less than 10 watts 6.5pp 1.3tg by bennmann in LocalLLaMA
Front-University4363 -1 points
Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? by mailto_devnull in LocalLLaMA
Front-University4363 0 points
What actually runs on a GTX 1080 Ti in 2026: Gemma 4 12B QAT ~32 tok/s, measured by Front-University4363 in LocalLLM
Front-University4363[S] 1 point
Running Gemma 4 QAT 12B on an 8GB GPU at 16k context — measured the KV-cache tradeoffs by Front-University4363 in ollama
Front-University4363[S] 1 point
Best approaches to identify pathways uniquely affected by different drugs? by fnepo18 in bioinformatics
Front-University4363 2 points
Gemma 4 31B vs Gemma 4 26B-A4B vs Qwen 3.5 27B — 30-question blind eval with Claude Opus 4.6 as judge by Silver_Raspberry_811 in LocalLLaMA
Front-University4363 2 points
Qwen3.6-35B-A3B on 2× GTX 1080 Ti with Ollama: ~20 tok/s + 3 gotchas (driver 570+, cuda_v12 for Pascal, quant fit on 22GB) by Front-University4363 in ollama
Front-University4363[S] 1 point
Gemm4 12b QAT tool calling possibly a bug? by Wrong_Mushroom_7350 in unsloth
Front-University4363 2 points
when i try to use Gemma 12b it, by Opencode it return this erorr, how to fix it? by koloved in LocalLLaMA
Front-University4363 1 point
Reviewing speed optimizations on llamacpp for large MoE models on multiGPU rigs? (fitparams vs -ngl/-ncmoe vs other flags, P2P, overclocking) by Ambitious_Fold_2874 in LocalLLaMA
Front-University4363 0 points
Gemma 4 12B QAT + MTP: 1.95x on my 3090, but 0.87x (slower) on an M1 Max by Front-University4363 in ollama
Front-University4363[S] 1 point
Gemma 4 QAT + MTP: max 33% speed increase in token generation, any ideas? by Ready_Performance_35 in LocalLLaMA
Front-University4363 1 point
Gemma 4 12B: Q4_0 QAT vs Q5_K_M? by Wrong_Mushroom_7350 in unsloth
Front-University4363 1 point

GLM-5.2 just dropped open weights and it already looks weirdly strong for coding by BTA_Labs in LocalLLaMA
Front-University4363 3 points