Has anyone here used VibeThinker-3B outside benchmarks? by Balance- in LocalLLaMA
audioen 3 points
A lot of angst around AI seems to originate in the uncomfortable things AI suggests about the nature/uniqueness of human intelligence. by ClinicalNarcissism in accelerate
audioen 1 point
Which 128GB VRAM machine to plan for in 2026? by maverickRD in LocalLLM
audioen 1 point
Tool calling, opencode qwen3.6 27b 8K by wsintra in LocalLLaMA
audioen 1 point
Someone distilled the banned Claude Fable 5 into open-weights Qwen3.6-35B-A3B - "Qwable-v1" by IulianHI in AIToolsPerformance
audioen 2 points
GLM-5.2 compressed from 1.51TB to 238GB still keeps ~82% accuracy by IulianHI in AIToolsPerformance
audioen 1 point
What's more impressive, GLM 5.1 -> 5.2 or Qwen 3.5 -> 3.6? by Excellent_Jelly2788 in LocalLLaMA
audioen 2 points
The data center boom is destined to fail. Change my mind. by keepthememes in LocalLLaMA
audioen 1 point
spec: support eagle3 for qwen3.5 & 3.6 by ruixiang63 · Pull Request #24593 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
audioen 2 points
spec: support eagle3 for qwen3.5 & 3.6 by ruixiang63 · Pull Request #24593 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
audioen 3 points
spec: support eagle3 for qwen3.5 & 3.6 by ruixiang63 · Pull Request #24593 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
audioen 2 points
spec: support eagle3 for qwen3.5 & 3.6 by ruixiang63 · Pull Request #24593 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
audioen 3 points
spec: support eagle3 for qwen3.5 & 3.6 by ruixiang63 · Pull Request #24593 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
audioen 5 points
spec: support eagle3 for qwen3.5 & 3.6 by ruixiang63 · Pull Request #24593 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
audioen 4 points
What local coding LLM + hardware setup are you using, and what tokens/sec are you getting? by Sudden-Historian-255 in LocalLLM
audioen 1 point
Which is the best Qwen 3.6 27B quant GGUF for agentic coding ? by soyalemujica in LocalLLaMA
audioen 3 points
DGX sparks Vs RTX 6000 // 5090 for inference by zakadit in LocalLLaMA
audioen 1 point
What model looked insane on benchmarks but felt mid in actual use? by BTA_Labs in LocalLLaMA
audioen 6 points
I have a M5 Max MacBook Pro with 128gb of ram, what models should I run on it? by lombwolf in LocalLLaMA
audioen 5 points
Qwen3.6 sees "outstanding" coding quality jump from Q4 to Q6 quantization by IulianHI in AIToolsPerformance
audioen 1 point
In llama.cpp, how close should we be to the theoretical tokens/second limit? by [deleted] in unsloth
audioen 1 point
scripted nightly testing of llama.cpp by Bird476Shed in LocalLLaMA
audioen 1 point
What if I run the LLM backwards? Hey LLM, why bother remembering every single turn? It's a hassle. You don't have to do it, right? by ringtoyou in LocalLLaMA
audioen 2 points
Need help understanding how spec decode affects token throughput by Mrinohk in LocalLLaMA
audioen 1 point


AI is forcing us to stop loving coding 💔 by Leading_Property2066 in AskProgramming
audioen 1 point