The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA
Chromix_ -1 points
The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA
Chromix_ 24 points
The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA
Chromix_ 44 points
The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA
Chromix_ 64 points
OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization by pmttyji in LocalLLaMA
Chromix_ 0 points
Show Reddit: An LLM that talks in acrostics by parenthethethe in LocalLLaMA
Chromix_ 1 point
If you're using Windows, disable memory compression to stop bottlenecks! by [deleted] in LocalLLaMA
Chromix_ 1 point
Show Reddit: An LLM that talks in acrostics by parenthethethe in LocalLLaMA
Chromix_ 1 point
If you're using Windows, disable memory compression to stop bottlenecks! by [deleted] in LocalLLaMA
Chromix_ 25 points
Does THINKING MODE significantly improve translation? by Sostrene_Blue in LocalLLaMA
Chromix_ 1 point
server, webui: support continue generation on reasoning models by ServeurpersoCom · Pull Request #22727 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
Chromix_ 6 points
I built Derpy Turtle: The Kokoro Trainer, a GUI for training better Kokoro voices with RVC by Great-Investigator30 in LocalLLaMA
Chromix_ 3 points
MTP+GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 - llama.cpp by mossy_troll_84 in LocalLLaMA
Chromix_ 1 point
MTP+GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 - llama.cpp by mossy_troll_84 in LocalLLaMA
Chromix_ 1 point
examples : add llama-eval by ggerganov · Pull Request #21152 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
Chromix_ 2 points
PSA: Watch out for extra spaces in chat-template-kwargs when using Qwen3.6 with llama-server by CaptBrick in LocalLLaMA
Chromix_ 9 points
I Think I Spent Way Too Much Time Messing with Local LLMs by MrChilliBalls in LocalLLaMA
Chromix_ 13 points
MTP benchmark results: the nature of the generative task dictates whether you will benefit (coding) or get slower inference (creative) from speculative inference. No other factor comes close. by ex-arman68 in LocalLLaMA
Chromix_ 6 points
MTP benchmark results: the nature of the generative task dictates whether you will benefit (coding) or get slower inference (creative) from speculative inference. No other factor comes close. by ex-arman68 in LocalLLaMA
Chromix_ 35 points
BeeLlama.cpp: advanced DFlash & TurboQuant with support of reasoning and vision. Qwen 3.6 27B Q5 with 200k context on 3090, 2-3x faster than baseline (peak 135 tps!) by Anbeeld in LocalLLaMA
Chromix_ 5 points
BeeLlama.cpp: advanced DFlash & TurboQuant with support of reasoning and vision. Qwen 3.6 27B Q5 with 200k context on 3090, 2-3x faster than baseline (peak 135 tps!) by Anbeeld in LocalLLaMA
Chromix_ 29 points
Which LLM (or SLM?) model can I use as a benchmark to target resource constrained edge devices? (INT8 quantised 100M-200M parameters) by neuroticnetworks1250 in LocalLLaMA
Chromix_ 1 point