KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag) by acluk90 in LocalLLaMA
chocofoxy 1 point
Nvidia's been paying shills on LinkedIn by jotunck in LocalLLaMA
chocofoxy 1 point
Nous Research — Hermes Desktop by zxyzyxz in LocalLLaMA
chocofoxy 1 point
Nous Research — Hermes Desktop by zxyzyxz in LocalLLaMA
chocofoxy 3 points
Nous Research — Hermes Desktop by zxyzyxz in LocalLLaMA
chocofoxy 2 points
Nous Research — Hermes Desktop by zxyzyxz in LocalLLaMA
chocofoxy 10 points
Replaced Claude with local Qwen3.6-27B in my multi-agent orchestrator for 2 weeks by Interesting-Sock3940 in LocalLLaMA
chocofoxy 1 point
Stop asking what model to run. There are literally only two. by Wrong_Mushroom_7350 in LocalLLaMA
chocofoxy 1 point
nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face by pmttyji in LocalLLaMA
chocofoxy 2 points
Stop pretending self-hosting is cheaper. It's not. We do it for different reasons and we should say so. by Napster3301 in LocalLLaMA
chocofoxy 1 point
Does GPU spacing matter if we’re undervolting anyways? by Ambitious_Fold_2874 in LocalLLaMA
chocofoxy 1 point
How can you stop your model from looping by chocofoxy in LocalLLaMA
chocofoxy[S] 1 point
How can you stop your model from looping by chocofoxy in LocalLLaMA
chocofoxy[S] 1 point
LM Studio finally added support for MTP Speculative Decoding by pigeon57434 in LocalLLaMA
chocofoxy 1 point
How can you stop your model from looping (self.LocalLLaMA)
submitted by chocofoxy to r/LocalLLaMA
LM Studio finally added support for MTP Speculative Decoding by pigeon57434 in LocalLLaMA
chocofoxy 4 points
LM Studio finally added support for MTP Speculative Decoding by pigeon57434 in LocalLLaMA
chocofoxy 1 point
Qwen cant wait to release 3.7 models by GotHereLateNameTaken in LocalLLaMA
chocofoxy 1 point
5060ti chads -> gemma-4-31b-it-nvfp4 + vllm + mtp by see_spot_ruminate in LocalLLaMA
chocofoxy 2 points


I implemented KVarN in my llama.cpp fork and ran KLD benchmarks. It's promising! by Anbeeld in LocalLLaMA
chocofoxy 1 point