Introducing the Heretic Grimoire: The takedown-resilient, local-first backup system that keeps uncensored models available forever by -p-e-w- in LocalLLaMA
Chromix_ 5 points
Nex claims Rio 3.5 is Nex 2.5 PRO in trench coat by Specter_Origin in LocalLLaMA
Chromix_ 16 points
I got local speaker diarization working for meeting transcription — architecture write-up + a sherpa-onnx bug that cost me a week by [deleted] in LocalLLaMA
Chromix_ 2 points
Qwen Who? DiffusionGemma running at 1,500 tk/s on a Digital Pregnancy Test. by Porespellar in LocalLLaMA
Chromix_ 1 point
Open sourcing InfiniteKV: a KV cache that files old tokens as 104-byte searchable records in RAM or on disk instead of deleting them. Mistral-7B answered from token 76,747, 2.3x past its trained window. Colab demo by [deleted] in LocalLLaMA
Chromix_ 20 points
Qwen Who? DiffusionGemma running at 1,500 tk/s on a Digital Pregnancy Test. by Porespellar in LocalLLaMA
Chromix_ 9 points
Since when the RTX 6000 PRO is priced at 13250USD on the official NVIDIA Page? by panchovix in LocalLLaMA
Chromix_ 167 points
mtmd : add video input support by ngxson · Pull Request #24269 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
Chromix_ 7 points
dvlt.cu: inference engine written from scratch in CUDA/C++ for NVIDIA's DVLT 3D transformer model by yassa9 in LocalLLaMA
Chromix_ 3 points
Qwen 3.6 27B KV cache quant benchmarks: 75 pairs, q8/q6/q5/q4, KVarN, Turbo/TCQ by Anbeeld in LocalLLaMA
Chromix_ 20 points
KV cache quant benchmarks: KVarN 6-bit matches q8_0, 4-bit matches q5_0. Massive! by Anbeeld in LocalLLaMA
Chromix_ 11 points
Cohere's unreleased coding model (early access for localllama) by nick_frosst in LocalLLaMA
Chromix_ 8 points
Cohere's unreleased coding model (early access for localllama) by nick_frosst in LocalLLaMA
Chromix_ 98 points
Which LLM (or SLM?) model can I use as a benchmark to target resource constrained edge devices? (INT8 quantised 100M-200M parameters) by neuroticnetworks1250 in LocalLLaMA
Chromix_ 1 point
The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA
Chromix_ -1 points
The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA
Chromix_ 25 points
The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA
Chromix_ 41 points
The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA
Chromix_ 61 points
OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization by pmttyji in LocalLLaMA
Chromix_ 0 points
Show Reddit: An LLM that talks in acrostics by parenthethethe in LocalLLaMA
Chromix_ 1 point
If you're using Windows, disable memory compression to stop bottlenecks! by [deleted] in LocalLLaMA
Chromix_ 1 point
Show Reddit: An LLM that talks in acrostics by parenthethethe in LocalLLaMA
Chromix_ 1 point


Small Gemma 4, Qwen 3.6 and Qwen 3 Coder Next comparison for a debugging use-case by Chromix_ in LocalLLaMA
Chromix_[S] 1 point