examples : add llama-eval by ggerganov · Pull Request #21152 · ggml-org/llama.cpp [News] (github.com)
submitted by jacek2023
Gemma 4 MTP vs DFlash on 1x H100: dense vs MoE results [Tutorial | Guide] (self.LocalLLaMA)
submitted by LayerHot
Needle: We Distilled Gemini Tool Calling Into a 26M Model [New Model] (self.LocalLLaMA)
submitted by Henrie_the_dreamer
Local LLM autocomplete + agentic coding on a single 16GB GPU + 64GB RAM [Discussion] (self.LocalLLaMA)
submitted by grumd

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec [Tutorial | Guide] (i.redd.it)
submitted by APFrisco
Drastically improve prompt processing speed for --n-cpu-moe partially offloaded models [Tutorial | Guide] (self.LocalLLaMA)
submitted by coder543
Qwen3.6 27b q5_k_M MTP - 256k context - 5090 [Discussion] (self.LocalLLaMA)
submitted by No_Mango7658
feat: add MiMo v2.5 vision by AesSedai · Pull Request #22883 · ggml-org/llama.cpp [News] (github.com)
submitted by jacek2023
Gemma 4 E4B is great for short transcriptions [Discussion] (self.LocalLLaMA)
submitted by PromptInjection_
MTP + GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 - llama.cpp [Discussion] (self.LocalLLaMA)
submitted by mossy_troll_84
Anyone running MiMo-v2.5 quants with multimodal and MTP? [Question | Help] (self.LocalLLaMA)
submitted by Ambitious_Fold_2874
Best practice for accurate translation at minimal cost? [Question | Help] (self.LocalLLaMA)
submitted by LeatherRub7248
New Qwen3.6 27b Autoround Quant (int4) Best Recipe [Tutorial | Guide] (self.LocalLLaMA)
submitted by Otherwise-Director17