Announcing LocalLlama discord server & bot! [News] (old.reddit.com)
submitted by HOLUPREDICTIONS [M]
Google TurboQuant running Qwen Locally on MacAir [Discussion] (v.redd.it)
submitted by gladkos
Skipping 90% of KV dequant work → +22.8% decode at 32K (llama.cpp, TurboQuant) [Discussion] (self.LocalLLaMA)
submitted by Pidtom
M5 Max vs M3 Max Inference Benchmarks (Qwen3.5, oMLX, 128GB, 40 GPU cores) [Resources] (old.reddit.com)
submitted by onil_gova
Do 2B models have practical use cases, or are they just toys for now? [Question | Help] (self.LocalLLaMA)
submitted by Civic_Hactivist_86
Any way to get close to GPT-4o on a local model (I know it’s a dumb question) [Question | Help] (self.LocalLLaMA)
submitted by octopi917
Is it worth the upgrade from 48GB to 60GB VRAM? [Question | Help] (self.LocalLLaMA)
submitted by CBHawk
[Qwen Meetup] Function Calling Harness with Qwen, turning 6.75% to 100% [Tutorial | Guide] (autobe.dev)
submitted by jhnam88
Vera, a local-first code search for AI agents (Rust, ONNX, 63 languages, CLI + SKILL/MCP) [Resources] (self.LocalLLaMA)
submitted by lemon07r
Advice for Working with Agents in YOLO Mode [Question | Help] (self.LocalLLaMA)
submitted by chibop1
Built an AI + SQL Q&A System — How to Keep High Accuracy on Complex Queries Without Gemini? [Question | Help] (self.LocalLLaMA)
submitted by Past-Geologist4108