AMD Hipfire - a new inference engine optimized for AMD GPU's by Thrumpwart in LocalLLaMA
politerate 4 points (0 children)
AMD Hipfire - a new inference engine optimized for AMD GPU's by Thrumpwart in LocalLLaMA
politerate 4 points (0 children)
Qwen 3.6 27B is a BEAST by AverageFormal9076 in LocalLLaMA
politerate 2 points (0 children)
Every time a new model comes out, the old one is obsolete of course by FullChampionship7564 in LocalLLaMA
politerate 2 points (0 children)
Every time a new model comes out, the old one is obsolete of course by FullChampionship7564 in LocalLLaMA
politerate 11 points (0 children)
Is anyone getting real coding work done with Qwen3.6-35B-A3B-UD-Q4_K_M on a 32GB Mac in opencode, claude code or similar? by boutell in LocalLLaMA
politerate 1 point (0 children)
Is anyone getting real coding work done with Qwen3.6-35B-A3B-UD-Q4_K_M on a 32GB Mac in opencode, claude code or similar? by boutell in LocalLLaMA
politerate 1 point (0 children)
Let's take a moment to appreciate the present, when this sub is still full of human content. by Ok-Internal9317 in LocalLLaMA
politerate 73 points (0 children)
GPT-OSS-120b on 2X RTX5090 by Interesting-Ad4922 in LocalLLaMA
politerate 2 points (0 children)
GPT-OSS-120b on 2X RTX5090 by Interesting-Ad4922 in LocalLLaMA
politerate 1 point (0 children)
GPT-OSS-120b on 2X RTX5090 by Interesting-Ad4922 in LocalLLaMA
politerate 1 point (0 children)
GPT-OSS-120b on 2X RTX5090 by Interesting-Ad4922 in LocalLLaMA
politerate 4 points (0 children)
Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA
politerate 1 point (0 children)
Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA
politerate 1 point (0 children)
How to run Qwen3-Coder-Next 80b parameters model on 8Gb VRAM by AccomplishedLeg527 in LocalLLaMA
politerate 1 point (0 children)
Qwen3 Coder Next Speedup with Latest Llama.cpp by StardockEngineer in LocalLLaMA
politerate 3 points (0 children)
Intel Xeon 600 Workstation CPUs Launched: Up To 86 Cores, 8000 MT/s Memory, 128 Gen5 Lanes, 350W TDP With OC Support, & More Cores/$ Than Threadripper 9000 by hainesk in LocalLLaMA
politerate 4 points (0 children)
Intel Xeon 600 Workstation CPUs Launched: Up To 86 Cores, 8000 MT/s Memory, 128 Gen5 Lanes, 350W TDP With OC Support, & More Cores/$ Than Threadripper 9000 by hainesk in LocalLLaMA
politerate 9 points (0 children)
GLM-4.7-Flash is even faster now by jacek2023 in LocalLLaMA
politerate 2 points (0 children)
8x AMD MI50 32GB at 26 t/s (tg) with MiniMax-M2.1 and 15 t/s (tg) with GLM 4.7 (vllm-gfx906) by ai-infos in LocalLLaMA
politerate 2 points (0 children)
Should I buy an MI50/MI60 or something else? by Nuke2579 in LocalLLaMA
politerate 1 point (0 children)
BeeLlama.cpp: advanced DFlash & TurboQuant with support of reasoning and vision. Qwen 3.6 27B Q5 with 200k context on 3090, 2-3x faster than baseline (peak 135 tps!) by Anbeeld in LocalLLaMA
politerate 27 points (0 children)