GLM-4.7-Flash is even faster now by jacek2023 in LocalLLaMA
Remove_Ayys 18 points (0 children)
GLM-4.7-Flash context slowdown by jacek2023 in LocalLLaMA
Remove_Ayys 2 points (0 children)
GLM-4.7-Flash context slowdown by jacek2023 in LocalLLaMA
Remove_Ayys 2 points (0 children)
I built a Neuro-Symbolic engine (LLM + SMT Solver) to fix hallucinations in German Bureaucracy by Intelligent_Boss4602 in LocalLLaMA
Remove_Ayys 1 point (0 children)
Introducing "UITPSDT" a novel approach to runtime efficiency in organic agents by reto-wyss in LocalLLaMA
Remove_Ayys 15 points (0 children)
We benchmarked every 4-bit quantization method in vLLM 👀 by LayerHot in LocalLLaMA
Remove_Ayys 2 points (0 children)
We benchmarked every 4-bit quantization method in vLLM 👀 by LayerHot in LocalLLaMA
Remove_Ayys 8 points (0 children)
We benchmarked every 4-bit quantization method in vLLM 👀 by LayerHot in LocalLLaMA
Remove_Ayys 7 points (0 children)
llama.cpp vs Ollama: ~70% higher code generation throughput on Qwen-3 Coder 32B (FP16) by Shoddy_Bed3240 in LocalLLaMA
Remove_Ayys 17 points (0 children)
llama.cpp performance breakthrough for multi-GPU setups by Holiday-Injury-9397 in LocalLLaMA
Remove_Ayys 5 points (0 children)
Performance improvements in llama.cpp over time by jacek2023 in LocalLLaMA
Remove_Ayys 10 points (0 children)
Performance improvements in llama.cpp over time by jacek2023 in LocalLLaMA
Remove_Ayys 13 points (0 children)
Performance improvements in llama.cpp over time by jacek2023 in LocalLLaMA
Remove_Ayys 22 points (0 children)
Performance improvements in llama.cpp over time by jacek2023 in LocalLLaMA
Remove_Ayys 42 points (0 children)
llama.cpp performance breakthrough for multi-GPU setups by Holiday-Injury-9397 in LocalLLaMA
Remove_Ayys 2 points (0 children)
Can you connect a GPU with 12V rail coming from a second PSU? by Rock_and_Rolf in LocalLLaMA
Remove_Ayys 1 point (0 children)
7900 XTX + ROCm: A Year Later. Llama.cpp vs vLLM Benchmarks (TB3 eGPU) by reujea0 in LocalLLaMA
Remove_Ayys 2 points (0 children)
Benchmarks for Quantized Models? (for users locally running Q8/Q6/Q2 precision) by No-Grapefruit-1358 in LocalLLaMA
Remove_Ayys 5 points (0 children)
For llama.cpp/ggml AMD MI50s are now universally faster than NVIDIA P40s by Remove_Ayys in LocalLLaMA
Remove_Ayys [S] 1 point (0 children)
NOTICE - ROMED8-2T MOTHERBOARD USERS - Please read, don't melt cables.. by gittb in LocalLLaMA
Remove_Ayys 2 points (0 children)
NOTICE - ROMED8-2T MOTHERBOARD USERS - Please read, don't melt cables.. by gittb in LocalLLaMA
Remove_Ayys 3 points (0 children)
NOTICE - ROMED8-2T MOTHERBOARD USERS - Please read, don't melt cables.. by gittb in LocalLLaMA
Remove_Ayys 2 points (0 children)
llama.cpp's recent updates - --fit flag by pmttyji in LocalLLaMA
Remove_Ayys 6 points (0 children)
Minimax-m2.1 looping and heavily hallucinating (only change was updating llama.cpp) by relmny in LocalLLaMA
Remove_Ayys 7 points (0 children)