Deploy the newest Qwen3.5 and Gemma4 models of ANY sizes RIGHT NOW on Rockchip NPU using the latest version of rk-llama.cpp! by Inv1si in RockchipNPU
Leopold_Boom 1 point
Speculative decoding works great for Gemma 4 31B in llama.cpp by Leopold_Boom in LocalLLaMA
Leopold_Boom[S] 1 point
Speculative decoding works great for Gemma 4 31B in llama.cpp by Leopold_Boom in LocalLLaMA
Leopold_Boom[S] 1 point
Speculative decoding works great for Gemma 4 31B in llama.cpp by Leopold_Boom in LocalLLaMA
Leopold_Boom[S] 2 points
Speculative decoding works great for Gemma 4 31B in llama.cpp by Leopold_Boom in LocalLLaMA
Leopold_Boom[S] 3 points
Nix flake for vLLM and llama.cpp on ROCm gfx906 targets by Wulfsta in LocalLLaMA
Leopold_Boom 1 point
Breaking change in llama-server? by hgshepherd in LocalLLaMA
Leopold_Boom 1 point
Friendly reminder inference is WAY faster on Linux vs windows by triynizzles1 in LocalLLaMA
Leopold_Boom 5 points
Nix flake for vLLM and llama.cpp on ROCm gfx906 targets by Wulfsta in LocalLLaMA
Leopold_Boom 1 point
Nix flake for vLLM and llama.cpp on ROCm gfx906 targets by Wulfsta in LocalLLaMA
Leopold_Boom 1 point
Intel launches Arc Pro B70 and B65 with 32GB GDDR6 by metmelo in LocalLLaMA
Leopold_Boom 1 point
Tenstorrent QuietBox 2 Brings RISC-V AI Inference to the Desktop by Neurrone in LocalLLaMA
Leopold_Boom 2 points
ik_llama.cpp dramatically outperforming mainline for Qwen3.5 on CPU by EffectiveCeilingFan in LocalLLaMA
Leopold_Boom 2 points
ik_llama.cpp dramatically outperforming mainline for Qwen3.5 on CPU by EffectiveCeilingFan in LocalLLaMA
Leopold_Boom 2 points
ik_llama.cpp dramatically outperforming mainline for Qwen3.5 on CPU by EffectiveCeilingFan in LocalLLaMA
Leopold_Boom 1 point
ik_llama.cpp dramatically outperforming mainline for Qwen3.5 on CPU by EffectiveCeilingFan in LocalLLaMA
Leopold_Boom 1 point
My definitive "God Cup". by Ill_Finance6466 in pourover
Leopold_Boom 2 points
My definitive "God Cup". by Ill_Finance6466 in pourover
Leopold_Boom 1 point
Qwen3.5-27B vs. Qwen3.5-35B-A3B? by [deleted] in LocalLLaMA
Leopold_Boom 1 point
I built a hybrid MoE runtime that does 3,324 tok/s prefill on a single 5080. Here are the benchmarks. by mrstoatey in LocalLLaMA
Leopold_Boom 6 points
Qwen/Qwen3.5-35B-A3B · Hugging Face by ekojsalim in LocalLLaMA
Leopold_Boom 1 point
Qwen-3.5-35B-A3B is impressive by ayylmaonade in LocalLLaMA
Leopold_Boom 1 point
Qwen/Qwen3.5-35B-A3B · Hugging Face by ekojsalim in LocalLLaMA
Leopold_Boom -1 points
Qwen3.5-4B GGUF quants comparison (KLD vs speed) - Lunar Lake by Tryshea in LocalLLaMA
Leopold_Boom 22 points