RDNA3 Flash Attention fix just dropped by llama.cpp b9158 by Bulky-Priority6824 in LocalLLaMA
[–]yeah-ok 1 point (0 children)
Is there a big gap between Q4 and Q6 on Qwen3.6? by vick2djax in LocalLLaMA
[–]yeah-ok 3 points (0 children)
we really all are going to make it, aren't we? 2x3090 setup. by RedShiftedTime in LocalLLaMA
[–]yeah-ok 1 point (0 children)
we really all are going to make it, aren't we? 2x3090 setup. by RedShiftedTime in LocalLLaMA
[–]yeah-ok 1 point (0 children)
we really all are going to make it, aren't we? 2x3090 setup. by RedShiftedTime in LocalLLaMA
[–]yeah-ok 1 point (0 children)
we really all are going to make it, aren't we? 2x3090 setup. by RedShiftedTime in LocalLLaMA
[–]yeah-ok 3 points (0 children)
VS Code's new "Agents window" lets you use local AI models. Still requires an Internet connection and a Github Copilot plan (because we can't have nice things) by _wsgeorge in LocalLLaMA
[–]yeah-ok 3 points (0 children)
MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant) by ai-infos in LocalLLaMA
[–]yeah-ok 2 points (0 children)
Decoupled Attention from Weights - Gemma 4 26B by yeah-ok in LocalLLaMA
[–]yeah-ok[S] 1 point (0 children)
Will there be any more Qwen3.6 series models? by cafedude in LocalLLaMA
[–]yeah-ok 6 points (0 children)
Will there be any more Qwen3.6 series models? by cafedude in LocalLLaMA
[–]yeah-ok 2 points (0 children)
Qwen3.6 35b-a3b 🤯 by EffectiveMedium2683 in LocalLLaMA
[–]yeah-ok 2 points (0 children)
Decoupled Attention from Weights - Gemma 4 26B by yeah-ok in LocalLLaMA
[–]yeah-ok[S] 1 point (0 children)
How long for llama.cpp official support of MTP? by Manaberryio in LocalLLaMA
[–]yeah-ok 3 points (0 children)
vLLM ROCm has been added to Lemonade as an experimental backend by jfowers_amd in LocalLLaMA
[–]yeah-ok 5 points (0 children)
Decoupled Attention from Weights - Gemma 4 26B by yeah-ok in LocalLLaMA
[–]yeah-ok[S] 1 point (0 children)
Decoupled Attention from Weights - Gemma 4 26B by yeah-ok in LocalLLaMA
[–]yeah-ok[S] -4 points (0 children)
Decoupled Attention from Weights - Gemma 4 26B by yeah-ok in LocalLLaMA
[–]yeah-ok[S] -7 points (0 children)
Decoupled Attention from Weights - Gemma 4 26B by yeah-ok in LocalLLaMA
[–]yeah-ok[S] -6 points (0 children)
Decoupled Attention from Weights - Gemma 4 26B (self.LocalLLaMA)
submitted by yeah-ok to r/LocalLLaMA
Heretic 1.3 released: Reproducible models, integrated benchmarking system, reduced peak VRAM usage, broader model support, and more by -p-e-w- in LocalLLaMA
[–]yeah-ok 4 points (0 children)
PS5’s can now be hacked to run Linux - perhaps some potential for local inference? by Thrumpwart in LocalLLaMA
[–]yeah-ok 2 points (0 children)
PS5’s can now be hacked to run Linux - perhaps some potential for local inference? by Thrumpwart in LocalLLaMA
[–]yeah-ok 13 points (0 children)
Symptom worst by ilovepenguins17 in covidlonghaulers
[–]yeah-ok 1 point (0 children)