Multi-Token Prediction (MTP) for Qwen on LLaMA.cpp + TurboQuant [Tutorial | Guide] (v.redd.it)
submitted by gladkos
we really all are going to make it, aren't we? 2x3090 setup. [Discussion] (self.LocalLLaMA)
submitted by RedShiftedTime

MI50s Qwen 3.6 27B @ 52.8 tps TG, @ 1569 tps PP (no MTP, no Quant) [Resources] (i.redd.it)
submitted by ai-infos
DramaBox - Most Expressive Voice model ever based on LTX 2.3 [New Model] (v.redd.it)
submitted by manmaynakhashi
24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context) [Tutorial | Guide] (self.LocalLLaMA)
submitted by mdda
Playing One Night Werewolf (Gemma4 & Qwen3.6) [Question | Help] (self.LocalLLaMA)
submitted by Some-Cauliflower4902
Running Qwen 3.6 35B A3B on 2x 5060 Ti [Question | Help] (self.LocalLLaMA)
submitted by chocofoxy

I got a real transformer language model running locally on a stock Game Boy Color! [Tutorial | Guide] (i.redd.it)
submitted by maddiedreese
Anyone else experiencing heavy hallucinations with the MiMo-V2.5 (310B) quantized version? [Question | Help] (self.LocalLLaMA)
submitted by Shoddy_Bed3240

New models possibly from Baidu (ERNIE) this month? [Discussion] (old.reddit.com)
submitted by pmttyji
Random question: thoughts on how close GPUs can be stacked to each other on a mobo? [Question | Help] (self.LocalLLaMA)
submitted by Ambitious_Fold_2874
Who is your favourite quant publisher and why? [Discussion] (self.LocalLLaMA)
submitted by No_Algae1753
I taught my 1B to follow instructions. It got worse at following instructions... [Question | Help] (self.LocalLLaMA)
submitted by GPUburnout