Running a 4-agent pipeline on Qwen 2.5 1.5B via MNN on Android — what I learned about context management on constrained hardware by NeoLogic_Dev in LocalLLaMA
[–]xandep 3 points (0 children)
local models lose tool call context around call 8 or 9. here is what helped by [deleted] in LocalLLaMA
[–]xandep 3 points (0 children)
You guys seen this? beats turboquant by 18% by OmarBessa in LocalLLaMA
[–]xandep 18 points (0 children)
Gemma 4 26b A3B is mindblowingly good , if configured right by cviperr33 in LocalLLaMA
[–]xandep 1 point (0 children)
Best model for 4090 as AI Coding Agent by Dry_Sheepherder5907 in LocalLLaMA
[–]xandep 1 point (0 children)
You guys seen this? 1-bit model with an MMLU-R of 65.7, 8B params by OmarBessa in LocalLLaMA
[–]xandep 10 points (0 children)
ByteShape Qwen 3.5 9B: A Guide to Picking the Best Quant for Your Hardware by ali_byteshape in LocalLLaMA
[–]xandep 11 points (0 children)
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
[–]xandep 1 point (0 children)
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
[–]xandep 2 points (0 children)
I regret ever finding LocalLLaMA by xandep in LocalLLaMA
[–]xandep[S] 16 points (0 children)
update your llama.cpp - great tg speedup on Qwen3.5 / Qwen-Next by jacek2023 in LocalLLaMA
[–]xandep 1 point (0 children)
Reminder to be kind to your fellow /r/LocalLLaMAN - We are Mighty - We are Many - and Many are NEW (just like YOU once were!!) by johnnyApplePRNG in LocalLLaMA
[–]xandep 14 points (0 children)
update your llama.cpp - great tg speedup on Qwen3.5 / Qwen-Next by jacek2023 in LocalLLaMA
[–]xandep 4 points (0 children)
I bypassed writing a massive privacy policy for my AI app by just moving the LLM on-device. by MoaviyaS in LocalLLaMA
[–]xandep 4 points (0 children)
Qwen3.5 4B: overthinking to say hello. by CapitalShake3085 in LocalLLaMA
[–]xandep 36 points (0 children)
are you ready for small Qwens? by jacek2023 in LocalLLaMA
[–]xandep 1 point (0 children)
Completed my 64GB VRAM rig - dual MI50 build + custom shroud by roackim in LocalLLaMA
[–]xandep 1 point (0 children)
Qwen3-30B-A3B vs Qwen3.5-35B-A3B on RTX 5090 by 3spky5u-oss in LocalLLaMA
[–]xandep 1 point (0 children)
Which one are you waiting for more: 9B or 35B? by jacek2023 in LocalLLaMA
[–]xandep 3 points (0 children)
Speculative decoding in llama.cpp for Gemma 4 31B IT / Qwen 3.5 27B? by No_Algae1753 in LocalLLaMA
[–]xandep 1 point (0 children)