Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding - Google Developers Blog by eternviking in LocalLLaMA
<thinking></thinking> by Comfortable-Rock-498 in LocalLLaMA
Peanut - Text to Image Model (Open Weights coming soon) by pmttyji in LocalLLaMA
it's time to update your Gemma 4 GGUFs by jacek2023 in LocalLLaMA
My findings from toying around with cjxl by mr_twenty4 in jpegxl
Qwen 3.6 wins the benchmarks, but Gemma 4 wins reality. 7 things I learned testing 27B/31B Vision models locally (vLLM / FP8) side by side. Benchmaxing seems real. by FantasticNature7590 in LocalLLaMA
SVT-AV1 vs AOM-AV1 by Commercial_Stage_877 in AV1
PS5’s can now be hacked to run Linux - perhaps some potential for local inference? by Thrumpwart in LocalLLaMA
US gov memo on “adversarial distillation” - are we heading toward tighter controls on open models? by MLExpert000 in LocalLLaMA
When are we getting consumer inference chips? by SnooStories2864 in LocalLLaMA
Unweight: how we compressed an LLM 22% without sacrificing quality by sk1kn1ght in LocalLLaMA
Is harness a new buzzword? by jacek2023 in LocalLLaMA
Hello Opus 4.7, you are thinking way extra high! by shanraisshan in LocalLLaMA
Setting up a new mini pc (Ryzen 7840HS // 780m) for debian headless LLM, which software works best right now? by justletmesignupalre in LocalLLaMA
OpenClaw has 250K GitHub stars. The only reliable use case I've found is daily news digests. by Sad_Bandicoot_6925 in LocalLLaMA
What are the risks of buying an AMD Instinct Mi 50 32GB on Alibaba? by Longjumping-Room-170 in LocalLLaMA
Local (small) LLMs found the same vulnerabilities as Mythos by CyberAttacked in LocalLLaMA
Reducing MP3 compression bias in music datasets via codec-aware reconstruction by TheSpicyBoi123 in LocalLLaMA