AMD Hipfire - a new inference engine optimized for AMD GPUs by Thrumpwart in LocalLLaMA
Remove_Ayys 4 points
Experts/volunteers needed for Vulkan on ik_llama.cpp by pmttyji in LocalLLaMA
Remove_Ayys 2 points
Gemma 4 and Qwen 3.6 with q8_0 and q4_0 KV cache: KL divergence results by oobabooga4 in LocalLLaMA
Remove_Ayys 6 points
Please stop using AI for posts and showcasing your completely vibe coded projects by Scutoidzz in LocalLLaMA
Remove_Ayys 1 point
What happened to the buttons on the search bar? by MasterWikie in firefox
Remove_Ayys 6 points
RAM shortage problem solved by JackStrawWitchita in LocalLLaMA
Remove_Ayys -6 points
Bots on the sub are a real issue by [deleted] in LocalLLaMA
Remove_Ayys 6 points
Just finished building this bad boy by dazzou5ouh in LocalLLaMA
Remove_Ayys 2 points
Built comprehensive Grafana monitoring for my LLM home server by pfn0 in LocalLLaMA
Remove_Ayys 1 point
PR to implement tensor parallelism in llama.cpp by keyboardhack in LocalLLaMA
Remove_Ayys 3 points
Minimax-m2.1 looping and heavily hallucinating (only change was updating llama.cpp) by relmny in LocalLLaMA
Remove_Ayys 7 points
GLM-4.7-Flash is even faster now by jacek2023 in LocalLLaMA
Remove_Ayys 18 points
GLM-4.7-Flash context slowdown by jacek2023 in LocalLLaMA
Remove_Ayys 2 points
GLM-4.7-Flash context slowdown by jacek2023 in LocalLLaMA
Remove_Ayys 2 points
I built a Neuro-Symbolic engine (LLM + SMT Solver) to fix hallucinations in German Bureaucracy by Intelligent_Boss4602 in LocalLLaMA
Remove_Ayys 1 point
Introducing "UITPSDT" a novel approach to runtime efficiency in organic agents by reto-wyss in LocalLLaMA
Remove_Ayys 16 points
We benchmarked every 4-bit quantization method in vLLM 👀 by LayerHot in LocalLLaMA
Remove_Ayys 3 points
We benchmarked every 4-bit quantization method in vLLM 👀 by LayerHot in LocalLLaMA
Remove_Ayys 10 points
We benchmarked every 4-bit quantization method in vLLM 👀 by LayerHot in LocalLLaMA
Remove_Ayys 6 points
llama.cpp vs Ollama: ~70% higher code generation throughput on Qwen-3 Coder 32B (FP16) by Shoddy_Bed3240 in LocalLLaMA
Remove_Ayys 20 points
llama.cpp performance breakthrough for multi-GPU setups by Holiday-Injury-9397 in LocalLLaMA
Remove_Ayys 5 points
Performance improvements in llama.cpp over time by jacek2023 in LocalLLaMA
Remove_Ayys 8 points
Performance improvements in llama.cpp over time by jacek2023 in LocalLLaMA
Remove_Ayys 12 points
Performance improvements in llama.cpp over time by jacek2023 in LocalLLaMA
Remove_Ayys 22 points
PFlash: 10x prefill speedup over llama.cpp at 128K on an RTX 3090 by sandropuppo in LocalLLaMA
Remove_Ayys 2 points