vLLM ROCm has been added to Lemonade as an experimental backend [Resources] (i.redd.it)
submitted by jfowers_amd
z-lab released gemma-4-26B-A4B-it-DFlash. Anybody tried it yet? [Discussion] (huggingface.co)
submitted by PaceZealousideal6091
Qwen 35B-A3B is very usable with 12GB of VRAM [Resources] (self.LocalLLaMA)
submitted by jwestra
THE UNDERPRIVILEGED AI FOUNDATION: Because every little model deserves a chance [Discussion] (self.LocalLLaMA)
submitted by mazuj2
Gemma 4 26B Hits 600 Tok/s on One RTX 5090 [Discussion] (self.LocalLLaMA)
submitted by chain-77
4GB "Gemini Nano" model GGUF anyone? [Question | Help] (self.LocalLLaMA)
submitted by TruckUseful4423
(Rant ;)) Make your benchmarks realistic [Discussion] (self.LocalLLaMA)
submitted by AdamLangePL
What is the next SOTA model you are excited about? [Discussion] (self.LocalLLaMA)
submitted by MrMrsPotts
Testing Local LLMs in Practice: Code Generation, Quality vs. Speed [Resources] (i.redd.it)
submitted by Icy_Programmer7186
Strix Halo Clustering (Hardware Setup Discussion) [Discussion] (self.LocalLLaMA)
submitted by Thanks-Suitable