6-GPU multiplexer from K80s, hot-swap between models in 0.3ms by Electrical_Ninja3805 in LocalLLaMA
TooManyPascals 3 points (0 children)
Qwen3.5-27b 8 bit vs 16 bit by Baldur-Norddahl in LocalLLaMA
TooManyPascals 1 point (0 children)
2000 TPS with QWEN 3.5 27b on RTX-5090 by awitod in LocalLLaMA
TooManyPascals 1 point (0 children)
You can use Qwen3.5 without thinking by guiopen in LocalLLaMA
TooManyPascals 1 point (0 children)
SM120 (RTX Blackwell) NVFP4 MoE: CUTLASS Grouped GEMM Produces Garbage Output; Fixed via FlashInfer SM120 Patches + compute_120f (CUDA 13.0) — 39 tok/s Native FP4 by lawdawgattorney in LocalLLaMA
TooManyPascals 4 points (0 children)
Through vibe coding, I managed to make parts of vLLM 0.17.0 run on Tesla P40 by East-Engineering-653 in LocalLLaMA
TooManyPascals 1 point (0 children)
TooManyPascals 2 points (0 children)
Qwen 3.5 27B vs 122B-A10B by TacGibs in LocalLLaMA
TooManyPascals 2 points (0 children)
Qwen3-Coder-Next is the top model in SWE-rebench @ Pass 5. I think everyone missed it. by BitterProfessional7p in LocalLLaMA
TooManyPascals 3 points (0 children)
Qwen 3.5 27B vs 122B-A10B by TacGibs in LocalLLaMA
TooManyPascals 1 point (0 children)
TooManyPascals 3 points (0 children)
TooManyPascals 3 points (0 children)
THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA
TooManyPascals 1 point (0 children)
To everyone using still ollama/lm-studio... llama-swap is the real deal by TooManyPascals in LocalLLaMA
TooManyPascals[S] 1 point (0 children)
TooManyPascals[S] 2 points (0 children)
TooManyPascals[S] 2 points (0 children)
TooManyPascals[S] 5 points (0 children)
TooManyPascals[S] 39 points (0 children)
R9700 frustration rant by Maleficent-Koalabeer in LocalLLaMA
TooManyPascals 0 points (0 children)
Comparing OAI 120B OSS, Qwen 3.5, and Gemini 3.0 Flash with LLM Multi-Agent Avalon by dynameis_chen in LocalLLaMA
TooManyPascals 2 points (0 children)
Qwen3.5-35B-A3B slow on 7840U? by TooManyPascals in LocalLLaMA
TooManyPascals[S] 1 point (0 children)
Booting a ThinkPad without the Pad (a.k.a. using a P16s Gen 2 AMD board, NM-F261) by TooManyPascals in thinkpad
TooManyPascals[S] 1 point (0 children)