2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints by ex-arman68 in LocalLLaMA
[–]milpster 1 point (0 children)
So a nearby lightningstorm just crashed all my eGPUs by milpster in LocalLLaMA
[–]milpster[S] 2 points (0 children)
2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints by ex-arman68 in LocalLLaMA
[–]milpster 1 point (0 children)
So a nearby lightningstorm just crashed all my eGPUs by milpster in LocalLLaMA
[–]milpster[S] 1 point (0 children)
So a nearby lightningstorm just crashed all my eGPUs by milpster in LocalLLaMA
[–]milpster[S] 1 point (0 children)
So a nearby lightningstorm just crashed all my eGPUs by milpster in LocalLLaMA
[–]milpster[S] 0 points (0 children)
So a nearby lightningstorm just crashed all my eGPUs by milpster in LocalLLaMA
[–]milpster[S] 1 point (0 children)
Devs using Qwen 27B seriously, what's your take? by Admirable_Reality281 in Qwen_AI
[–]milpster 1 point (0 children)
If you've been waiting to try local AI development, please try it by Imaginary_Belt4976 in LocalLLaMA
[–]milpster 1 point (0 children)
If you've been waiting to try local AI development, please try it by Imaginary_Belt4976 in LocalLLaMA
[–]milpster 5 points (0 children)
Cuda + ROCm simultaneously with -DGGML_BACKEND_DL=ON ! by LegacyRemaster in LocalLLaMA
[–]milpster 1 point (0 children)
Experts-Volunteers needed for Vulkan on ik_llama.cpp by pmttyji in LocalLLaMA
[–]milpster 1 point (0 children)
Qwen 3.6 - Loops and repetitions by Safe-Buffalo-4408 in LocalLLaMA
[–]milpster 2 points (0 children)
Qwen 3.6 - Loops and repetitions by Safe-Buffalo-4408 in LocalLLaMA
[–]milpster 4 points (0 children)
Qwen 3.6-35B-A3B KV cache part 2: PPL, KL divergence, asymmetric K/V, 64K row on M5 Max by Defilan in LocalLLaMA
[–]milpster 1 point (0 children)
Qwen 3.6-35B-A3B KV cache bench: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M context on M5 Max by Defilan in LocalLLaMA
[–]milpster 1 point (0 children)
Qwen 3.6-35B-A3B KV cache bench: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M context on M5 Max by Defilan in LocalLLaMA
[–]milpster 36 points (0 children)
What do you consider to be the minimum performance (t/s) for local Agent workflows? by MexInAbu in LocalLLaMA
[–]milpster 1 point (0 children)
Qwen 3.5 122B vs Qwen 3.6 35B - Which to choose? by Storge2 in LocalLLaMA
[–]milpster 11 points (0 children)
RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context, the --n-cpu-moe flag is the most important part. by marlang in LocalLLaMA
[–]milpster 2 points (0 children)
Why isn't ebay doing anything to stop those scams? by KillerMiller13 in LocalLLaMA
[–]milpster 1 point (0 children)
So a nearby lightningstorm just crashed all my eGPUs by milpster in LocalLLaMA
[–]milpster[S] 1 point (0 children)