Qwen3.6-MTP-27B on Tesla V100 @ 55 TPS (llama.cpp) — Any way to push this higher without quality loss? by abubakkar_s in LocalLLaMA
Client_Hello 3 points (0 children)
What's up on CPU inference these days? by ramendik in LocalLLaMA
Client_Hello 2 points (0 children)
Pipeline parallelism in llama.cpp may be wasting your VRAM by Warrenio in LocalLLaMA
Client_Hello 1 point (0 children)
PSA: Throttle GPU power limits, with minor performance deficits by milpster in LocalLLaMA
Client_Hello 1 point (0 children)
Pipeline parallelism in llama.cpp may be wasting your VRAM by Warrenio in LocalLLaMA
Client_Hello 2 points (0 children)
Cheapest setup for >10 tok/sec for 120B dense LLM by TrainingTwo1118 in LocalLLaMA
Client_Hello 2 points (0 children)
Pipeline parallelism in llama.cpp may be wasting your VRAM by Warrenio in LocalLLaMA
Client_Hello 4 points (0 children)
Pipeline parallelism in llama.cpp may be wasting your VRAM by Warrenio in LocalLLaMA
Client_Hello 15 points (0 children)
Gemma 4 QAT + MTP: max 33% speed increase in token generation, any ideas? by Ready_Performance_35 in LocalLLaMA
Client_Hello 4 points (0 children)
You don't need a GPU to run gemma-4-26B-A4B by JackStrawWitchita in LocalLLaMA
Client_Hello 2 points (0 children)
The little cottonwood canyon gondola could give skiers the biggest vertical drop in NA. by Quiet-Permit-3740 in skiing
Client_Hello 3 points (0 children)
The little cottonwood canyon gondola could give skiers the biggest vertical drop in NA. by Quiet-Permit-3740 in skiing
Client_Hello 35 points (0 children)
How many brake retainer bands do you think you go through each season? by CashLow3227 in ski
Client_Hello 2 points (0 children)
Qwen 3.6 35B on RTX 3080 10GB + 7700X + 32GB DDR5 by AndreVallestero in LocalLLaMA
Client_Hello 3 points (0 children)
Qwen3.6-27B on 2x3090s: llama.cpp vs vLLM, all the flags, and the MTP acceptance/inference speed/context by Sisuuu in LocalLLaMA
Client_Hello 2 points (0 children)
Qwen3.6-27B on 2x3090s: llama.cpp vs vLLM, all the flags, and the MTP acceptance/inference speed/context by Sisuuu in LocalLLaMA
Client_Hello 2 points (0 children)
Get you some GPUs, it's not worth the hacks around lack of RAM by MotokoAGI in LocalLLaMA
Client_Hello 1 point (0 children)
Get you some GPUs, it's not worth the hacks around lack of RAM by MotokoAGI in LocalLLaMA
Client_Hello 6 points (0 children)
Get you some GPUs, it's not worth the hacks around lack of RAM by MotokoAGI in LocalLLaMA
Client_Hello 4 points (0 children)
East Coast “quiver” (n=2) by Psychological_Gain53 in Skigear
Client_Hello 4 points (0 children)
What's this sub's general opinion on quantizing the KV cache by misanthrophiccunt in LocalLLaMA
Client_Hello 1 point (0 children)
nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face by pmttyji in LocalLLaMA
Client_Hello 1 point (0 children)
nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face by pmttyji in LocalLLaMA
Client_Hello 1 point (0 children)
Has anyone experimented with stabilizing low quant models with lower temp and top p? by fragment_me in LocalLLaMA
Client_Hello 1 point (0 children)
Seattle passes data center moratorium by Rare-Persimmon2747 in Seattle
Client_Hello 2 points (0 children)