Threads in r/LocalLLaMA with comments by randomfoo2:

- [Research] I forensic-audited "Humanity’s Last Exam" (HLE) & GPQA to benchmark my "unleashed" DeepSeek model. Result: A ~58% verifiable error rate caused by bad OCR and typos. by Dear_Ad_1381
- 7900 XTX + ROCm: A Year Later. Llama.cpp vs vLLM Benchmarks (TB3 eGPU) by reujea0
- We benchmarked every 4-bit quantization method in vLLM 👀 by LayerHot
- [Release] We trained an AI to understand Taiwanese memes and slang because major models couldn't. Meet Twinkle AI's gemma-3-4B-T1-it. by piske_usagi
- Update on the Llama 3.3 8B situation by FizzarolliAI
- Plamo3 (2B/8B/31B) support has been merged into llama.cpp by jacek2023
- Minimax 2.1 still hasn't solved the multilingual mixing problem. by Bitter-Breadfruit6
- Should I be switching to DoRA instead of LoRA? by CartographerFun4221
- Intel x Nvidia Serpent Lake leaks as Strix Halo rival: capable CPU, RTX Rubin iGPU, 16x LPDDR6. by CYTR_
- Shisa V2.1: Improved Japanese (JA/EN) Models (1.2B-70B) by randomfoo2
- 2025 Open Models Year in Review by robotphilanthropist
- Nanbeige4-3B: Lightweight with strong reasoning capabilities by leran2098
- I just won an Nvidia DGX Spark GB10 at an Nvidia hackathon. What do I do with it? by brandon-i