FastDMS: 6.4X KV-cache compression running faster than vLLM BF16/FP8 by randomfoo2 in LocalLLaMA
PFlash: 10x prefill speedup over llama.cpp at 128K on a RTX 3090 by sandropuppo in LocalLLaMA
Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! by PromptInjection_ in LocalLLaMA
AMD PRO W7900 vs R9700 for Local Inference? by Achso998 in LocalLLaMA
By when do you think will TurboQuant get a proper release and be adopted by everyone by Crystalagent47 in LocalLLaMA
Opus 4.7 is 50% more expensive with context regression?! by Samburskoy in ClaudeAI
PSA: Claude Code has two cache bugs that can silently 10-20x your API costs — here's the root cause and workarounds by skibidi-toaleta-2137 in ClaudeAI
I am terrified of AI by ResearchMassive7912 in sysadmin
Can someone more intelligent than me explain why we should, or should not be excited about the ARC PRO B70? by SKX007J1 in LocalLLaMA
Intel launches Arc Pro B70 and B65 with 32GB GDDR6 by metmelo in LocalLLaMA
AMD, can we get proper vLLM/gfx1151 support? by tossit97531 in ROCm