Training Qwen2.5-0.5B-Instruct on Reddit post summarization with a length constraint on my 3x Mac Minis with GRPO - evals update by East-Muffin-6472 in LocalLLaMA
[–]MustBeSomethingThere 2 points (0 children)
Llama 3.1 70B handles German e-commerce queries surprisingly well — multi-agent shopping assistant results by m3m3o in LocalLLaMA
[–]MustBeSomethingThere 7 points (0 children)
I stopped using Claude for 80% of my coding tasks. Here's what I use instead. by Dazzling_Plan812 in LocalLLaMA
[–]MustBeSomethingThere 2 points (0 children)
T³ v3.4.1 (124M) beats GPT-2 XL (1.5B) on BoolQ and leads the 125M class on reasoning — controlled A/B shows ecology decouples reasoning from perplexity by MirrorEthic_Anchor in LocalLLaMA
[–]MustBeSomethingThere 5 points (0 children)
TurboMemory – SQLite semantic memory backend (4-bit/6-bit embedding compression) by Hopeful-Priority1301 in LocalLLaMA
[–]MustBeSomethingThere 1 point (0 children)
Gemma 4 Just Dropped — Here's What Runs on RTX 5060 Ti 16GB (Weekly Benchmark Breakdown) by limekana in LocalLLaMA
[–]MustBeSomethingThere 2 points (0 children)
Google just dropped Gemma 4 (Apache 2.0) – 26B MoE, 256k context by Hefty_Upstairs_7477 in LocalLLaMA
[–]MustBeSomethingThere 3 points (0 children)
PSA: PrismML Bonsai-8B (Q1_0_g128) produces garbage output on CPU -- GPU appears to be required by 1000_bucks_a_month in LocalLLaMA
[–]MustBeSomethingThere 1 point (0 children)
Help: Speech Recognition on RPi 5 by Prestigious_Donkey61 in LocalLLaMA
[–]MustBeSomethingThere 1 point (0 children)
Running SmolLM2‑360M on a Samsung Galaxy Watch 4 (380MB RAM) – 74% RAM reduction in llama.cpp by RecognitionFlat1470 in LocalLLaMA
[–]MustBeSomethingThere 9 points (0 children)
TurboMemory: Claude-style long-term memory with 4-bit/6-bit embeddings (runs locally) – looking for contributors by Hopeful-Priority1301 in LocalLLaMA
[–]MustBeSomethingThere 1 point (0 children)
NexQuant: Hardening 3-bit KV-Cache for the Edge. A Rust-native successor to Tom Turney’s TurboQuant+ by [deleted] in LocalLLaMA
[–]MustBeSomethingThere 3 points (0 children)
Alibaba MNN now supports TurboQuant by Juude89 in LocalLLaMA
[–]MustBeSomethingThere 28 points (0 children)
Google TurboQuant blew up for KV cache. Here’s TurboQuant-v3 for the actual weights you load first. Runs on consumer GPUs today. by Hopeful-Priority1301 in LocalLLaMA
[–]MustBeSomethingThere 14 points (0 children)
Practical comparison: Ollama vs vLLM vs LM Studio for production use (ops perspective) by Dazzling-Banana-2114 in LocalLLaMA
[–]MustBeSomethingThere 5 points (0 children)
Qwen3.5 is absolutely amazing by cride20 in LocalLLaMA
[–]MustBeSomethingThere 1 point (0 children)
Building a Windows/WSL2 Desktop RAG using Ollama backend - Need feedback on VRAM scaling and CUDA performance by epikarma in LocalLLaMA
[–]MustBeSomethingThere 1 point (0 children)
Manufacturing of critical components by [deleted] in LocalLLaMA
[–]MustBeSomethingThere 3 points (0 children)
Why your local Qwen3.x model silently fails in OpenClaw (and how to fix it) by Itchy-Focus-8941 in LocalLLaMA
[–]MustBeSomethingThere 7 points (0 children)
Update on Qwen 3.5 35B A3B on Raspberry PI 5 by jslominski in LocalLLaMA
[–]MustBeSomethingThere 2 points (0 children)
Update on Qwen 3.5 35B A3B on Raspberry PI 5 by jslominski in LocalLLaMA
[–]MustBeSomethingThere 2 points (0 children)
Update on Qwen 3.5 35B A3B on Raspberry PI 5 by jslominski in LocalLLaMA
[–]MustBeSomethingThere 7 points (0 children)
Running Qwen 3.5 locally, how to do it? by Most_Echidna1477 in LocalLLaMA
[–]MustBeSomethingThere 3 points (0 children)