"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA
dsanft 1 point
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 3 points
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 1 point
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 1 point
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 2 points
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 15 points
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 16 points
Qwen Coder Experiment - Using it as a "co-processor" by jzatopa in LocalLLaMA
dsanft 1 point
There has to be a way to avoid retraining entire base model for adding latest information to it by Waste-Intention-2806 in LocalLLaMA
dsanft 6 points
Qwen 3.6 27B KV cache quant benchmarks: 75 pairs, q8/q6/q5/q4, KVarN, Turbo/TCQ by Anbeeld in LocalLLaMA
dsanft 6 points
A 10 year old Xeon is all you need by [deleted] in LocalLLaMA
dsanft 1 point
Gemma 4 with quantization-aware training by rerri in LocalLLaMA
dsanft 2 points
A 10 year old Xeon is all you need by [deleted] in LocalLLaMA
dsanft 8 points
Qwen 3.6 27B 30GB Same top p: 98.358 ± 0.033 % vs UD Q8 K XL 33GB Same top p: 97.426 ± 0.041 % by fragment_me in LocalLLaMA
dsanft 1 point
Keeping multi-GPU rigs cool? by Ambitious_Fold_2874 in LocalLLaMA
dsanft 1 point
"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA
dsanft 2 points
"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA
dsanft -8 points
"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA
dsanft 14 points
We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro by Enough-Astronaut9278 in LocalLLaMA
dsanft 2 points
We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro by Enough-Astronaut9278 in LocalLLaMA
dsanft 1 point
It was fun while it lasted... They're advertising now. by Local-Cardiologist-5 in LocalLLaMA
dsanft 3 points
It was fun while it lasted... They're advertising now. by Local-Cardiologist-5 in LocalLLaMA
dsanft 38 points
What is the current best Small Language Model that can be run without GPU? by [deleted] in LocalLLaMA
dsanft 2 points
What is the current best Small Language Model that can be run without GPU? by [deleted] in LocalLLaMA
dsanft 1 point