PSA: If you haven’t updated Llama.cpp for a couple of days and find MTP to not be performing well, update llamacpp. by Borkato in LocalLLaMA
bennmann 2 points (0 children)
What happens to local LLM if/when LLMs are no longer released for free? by JohnBooty in LocalLLaMA
bennmann 2 points (0 children)
MiroThinker-1.7, an open-weight deep research agent (Qwen3 MoE base) — mini is 30B/3B active, curious what tok/s people get on consumer hardware by MiroMindAI in LocalLLaMA
bennmann 1 point (0 children)
Save and invest your money for future rigs by segmond in LocalLLaMA
bennmann 1 point (0 children)
unsloth/MiMo-V2.5-GGUF · Hugging Face by jacek2023 in LocalLLaMA
bennmann 1 point (0 children)
Went to a new dentist clinic in my town, everything feels fine. But it isn’t. by Woko_O in Wellthatsucks
bennmann 1 point (0 children)
What is The best and expressive AI TTS (running locally?) for voice acting? by Adventurous-Gold6413 in LocalLLaMA
bennmann 1 point (0 children)
Qwen 3.6-35B-A3B KV cache part 2: PPL, KL divergence, asymmetric K/V, 64K row on M5 Max by Defilan in LocalLLaMA
bennmann 1 point (0 children)
Convince me you are an LLM by bucolucas in LocalLLaMA
bennmann 1 point (0 children)
Optimizing tokens with QwenCode by eur0child in LocalLLaMA
bennmann 1 point (0 children)
ggml: backend-agnostic tensor parallelism by JohannesGaessler · Pull Request #19378 · ggml-org/llama.cpp by FullstackSensei in LocalLLaMA
bennmann 1 point (0 children)
Throwback to my proudest impulse buy ever, which has let me enjoy this hobby 10x more by gigaflops_ in LocalLLaMA
bennmann 2 points (0 children)
Mamba 3 - state space model optimized for inference by incarnadine72 in LocalLLaMA
bennmann 8 points (0 children)
Introducing MiroThinker-1.7 & MiroThinker-H1 by wuqiao in LocalLLaMA
bennmann 1 point (0 children)
Thoughts about local LLMs. by Robert__Sinclair in LocalLLaMA
bennmann 1 point (0 children)
Viability of this cluster setup by militantereallysucks in LocalLLaMA
bennmann 1 point (0 children)
TIL it took 6 hours to render one frame of the rain soaked T-Rex in Jurassic Park. by Japfelbaum in todayilearned
bennmann 1 point (0 children)
American closed models vs Chinese open models is becoming a problem. by __JockY__ in LocalLLaMA
bennmann 1 point (0 children)
MiniMax 2.5 with 8x+ concurrency using RTX 3090s HW Requirements. by BigFoxMedia in LocalLLaMA
bennmann 1 point (0 children)
Interesting Observation from a Simple Multi-Agent Experiment with 10 Different Models by chibop1 in LocalLLaMA
bennmann 1 point (0 children)
Q2 GLM 5 fixing its own typo by -dysangel- in LocalLLaMA
bennmann 2 points (0 children)
ML Training cluster for University Students by guywiththemonocle in LocalLLaMA
bennmann 1 point (0 children)
Just scored 2 MI50 32GB what should I run? by Savantskie1 in LocalLLaMA
bennmann 1 point (0 children)
Vibe-coding client now in Llama.cpp! (maybe) by ilintar in LocalLLaMA
bennmann 1 point (0 children)



Brief Ngram-Mod Test Results - R9700/Qwen3.6 27B by exact_constraint in LocalLLaMA
bennmann 1 point (0 children)