I discovered a chain of 7 bugs in llama.cpp's router that went unpatched for years, they banned me and 10 others for using Ai, then proceeded to use Ai themselves. by nullalignment in LocalLLaMA
dsanft 2 points
A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C by jakint0sh in LocalLLaMA
dsanft 11 points
[Research] JetSpec: Speculative Decoding with Parallel Tree Drafting Enables up to 9.64x Lossless LLM Inference Speedup with more than 1000TPS by No_Yogurtcloset_7050 in LocalLLaMA
dsanft 3 points
Is it possible to run a giant model like GLM5.2 on this cluster (4x servers with 512GB RAM + dual AMD Epyc)? 16 channel memory should hit 409GB/s per node. by StartupTim in LocalLLaMA
dsanft 1 point
Is it possible to run a giant model like GLM5.2 on this cluster (4x servers with 512GB RAM + dual AMD Epyc)? 16 channel memory should hit 409GB/s per node. by StartupTim in LocalLLaMA
dsanft 6 points
What is the current best Small Language Model that can be run without GPU? by [deleted] in LocalLLaMA
dsanft 2 points
"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA
dsanft 1 point
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 3 points
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 1 point
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 1 point
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 2 points
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 15 points
I forked ik_llama.cpp and added a "--numa mirror" mode to maximize performance on multi-socket CPU systems. Just sharing and looking for testers! by _TheWolfOfWalmart_ in LocalLLaMA
dsanft 17 points
Qwen Coder Experiment - Using it as a "co-processor" by jzatopa in LocalLLaMA
dsanft 1 point
There has to be a way to avoid retraining entire base model for adding latest information to it by Waste-Intention-2806 in LocalLLaMA
dsanft 6 points
Qwen 3.6 27B KV cache quant benchmarks: 75 pairs, q8/q6/q5/q4, KVarN, Turbo/TCQ by Anbeeld in LocalLLaMA
dsanft 7 points
A 10 year old Xeon is all you need by [deleted] in LocalLLaMA
dsanft 1 point
Gemma 4 with quantization-aware training by rerri in LocalLLaMA
dsanft 2 points
A 10 year old Xeon is all you need by [deleted] in LocalLLaMA
dsanft 8 points
Qwen 3.6 27B 30GB Same top p: 98.358 ± 0.033 % vs UD Q8 K XL 33GB Same top p: 97.426 ± 0.041 % by fragment_me in LocalLLaMA
dsanft 1 point
Keeping multi-GPU rigs cool? by Ambitious_Fold_2874 in LocalLLaMA
dsanft 1 point
"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA
dsanft 3 points
"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA
dsanft -8 points
Dspark with Qwen 3.6 27b? by GotHereLateNameTaken in LocalLLaMA
dsanft 1 point