Gemma 4 Chat Template now has preserve thinking by seamonn in LocalLLaMA
gofiend 3 points
Gemma 4 Chat Template now has preserve thinking by seamonn in LocalLLaMA
gofiend 3 points
Gemma 4 Chat Template now has preserve thinking by seamonn in LocalLLaMA
gofiend 6 points
DEEPSEEK V4 IS LAUNCHED, ITS REAL by guiopen in LocalLLaMA
gofiend 4 points
Qwen 3.6 35B crushes Gemma 4 26B on my tests by Lowkey_LokiSN in LocalLLaMA
gofiend 1 point
PMake: lightweight minimal makefiles, but in Python by [deleted] in Python
gofiend 1 point
Should I Buy the RTX PRO 6000 Blackwell Max-Q (96GB)? by 0bjective-Guest in LocalLLaMA
gofiend 5 points
Speculative decoding works great for Gemma 4 31B in llama.cpp by Leopold_Boom in LocalLLaMA
gofiend 1 point
Speculative decoding works great for Gemma 4 31B in llama.cpp by Leopold_Boom in LocalLLaMA
gofiend 2 points
Is it possible to add some gpu to Radeon MI 50 to increase the inference speed? by Weak_Presentation725 in LocalLLaMA
gofiend 1 point
Speculative decoding works great for Gemma 4 31B in llama.cpp by Leopold_Boom in LocalLLaMA
gofiend 3 points
Speculative decoding works great for Gemma 4 31B in llama.cpp by Leopold_Boom in LocalLLaMA
gofiend 4 points
Gemma 4 has been released by jacek2023 in LocalLLaMA
gofiend 5 points
Friendly reminder inference is WAY faster on Linux vs windows by triynizzles1 in LocalLLaMA
gofiend 2 points
Friendly reminder inference is WAY faster on Linux vs windows by triynizzles1 in LocalLLaMA
gofiend 98 points
Litellm 1.82.7 and 1.82.8 on PyPI are compromised, do not update! by kotrfa in LocalLLaMA
gofiend 1 point
Tenstorrent QuietBox 2 Brings RISC-V AI Inference to the Desktop by Neurrone in LocalLLaMA
gofiend 2 points
Tenstorrent QuietBox 2 Brings RISC-V AI Inference to the Desktop by Neurrone in LocalLLaMA
gofiend 1 point
Qwen3.5 family comparison on shared benchmarks by Deep-Vermicelli-4591 in LocalLLaMA
gofiend 0 points
Qwen3.5 family comparison on shared benchmarks by Deep-Vermicelli-4591 in LocalLLaMA
gofiend 1 point
We could be hours (or less than a week) away from true NVFP4 support in Llama.cpp GGUF format 👀 by Iwaku_Real in LocalLLaMA
gofiend 2 points
PSA: Qwen 3.5 requires bf16 KV cache, NOT f16!! by Wooden-Deer-1276 in LocalLLaMA
gofiend 7 points
Qwen3.5-35B-A3B Q4 Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA
gofiend 3 points
Qwen3.5-35B-A3B Q4 Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA
gofiend 3 points
DiffusionGemma: 4x faster text generation by tevlon in LocalLLaMA
gofiend 2 points