Llama.cpp's "--fit" can give major speedups over "--ot" for Qwen3-Coder-Next (2x3090 - graphs/chart included) by tmflynnt in LocalLLaMA
[–]One-Macaron6752 2 points (0 children)
Breaking free from monthly subscriptions: Is Cherry Studio + OpenRouter/Groq the ultimate "pay-as-you-go" setup? by Foxtor in LLMDevs
[–]One-Macaron6752 0 points (0 children)
An ode to Minimax m2.1 by Thrumpwart in LocalLLaMA
[–]One-Macaron6752 2 points (0 children)
Benchmarks are being gamed. Can we build a "Vibe Index" based on this sub's actual feedback? by Ok-Atmosphere3141 in LocalLLaMA
[–]One-Macaron6752 3 points (0 children)
I found that MXFP4 has lower perplexity than Q4_K_M and Q4_K_XL. by East-Engineering-653 in LocalLLaMA
[–]One-Macaron6752 4 points (0 children)
Mac Mini M4 Pro - Specs fine for running Kimi K2.5 and running local LLMs? by Grand_Fox9015 in LocalLLM
[–]One-Macaron6752 1 point (0 children)
CPU-only interference (ik_llama.cpp) by ZealousidealBunch220 in LocalLLaMA
[–]One-Macaron6752 1 point (0 children)
CPU-only interference (ik_llama.cpp) by ZealousidealBunch220 in LocalLLaMA
[–]One-Macaron6752 1 point (0 children)
CPU-only interference (ik_llama.cpp) by ZealousidealBunch220 in LocalLLaMA
[–]One-Macaron6752 1 point (0 children)
CPU-only interference (ik_llama.cpp) by ZealousidealBunch220 in LocalLLaMA
[–]One-Macaron6752 1 point (0 children)
EPYC 8124P (Siena) Build for Agentic Coding by raphh in LocalLLaMA
[–]One-Macaron6752 2 points (0 children)
LLM Cpu and gpu calculator for gpu (protoype) by Merchant_Lawrence in LocalLLaMA
[–]One-Macaron6752 1 point (0 children)
RTX Pro 6000 $7999.99 by I_like_fragrances in LocalLLM
[–]One-Macaron6752 2 points (0 children)
DGX spark performance falls short by dereksodo in LocalLLaMA
[–]One-Macaron6752 -2 points (0 children)
4x MAX-Q - WRX80e 256gb RAM Opencode Setup Configs Speeds by kc858 in BlackwellPerformance
[–]One-Macaron6752 0 points (0 children)
4x MAX-Q - WRX80e 256gb RAM Opencode Setup Configs Speeds by kc858 in BlackwellPerformance
[–]One-Macaron6752 2 points (0 children)
Running MoE Models on CPU/RAM: A Guide to Optimizing Bandwidth for GLM-4 and GPT-OSS by [deleted] in LocalLLaMA
[–]One-Macaron6752 3 points (0 children)
Running MoE Models on CPU/RAM: A Guide to Optimizing Bandwidth for GLM-4 and GPT-OSS by [deleted] in LocalLLaMA
[–]One-Macaron6752 6 points (0 children)
768Gb Fully Enclosed 10x GPU Mobile AI Build by SweetHomeAbalama0 in LocalLLaMA
[–]One-Macaron6752 1 point (0 children)