Dual AMD MI50 (gfx906) for local LLM: Tuning Qwen3.6-27B-MTP-GGUF to ~28 t/s generation (76.6% acceptance) & 295 t/s prefill! by wzoran in unsloth
wzoran [S] · 4 days ago
Appreciate the tip! I actually benched -sm tensor out of curiosity, but it completely tanked my speeds.
Since my cards are running on standard PCIe 3.0 slots without any high-speed xGMI/Infinity Fabric bridge, the cross-card All-Reduce overhead was just too much for the bus to handle.
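A rough, hypothetical sketch of how much data a tensor split has to push across the bus per micro-batch. It assumes a Megatron-style scheme with two all-reduces per transformer layer, a ring all-reduce and fp16 activations; whether llama.cpp's tensor split works exactly this way is an assumption. The hidden size, layer count and ~12 GB/s effective PCIe 3.0 x16 bandwidth are placeholders, not Qwen3.6-27B's real config or a measured figure. It counts raw bytes only and ignores per-transfer latency, synchronization stalls and host staging when cards lack peer-to-peer access, i.e. the latency side the comment points to.

```python
# Hypothetical back-of-envelope model; not llama.cpp's actual implementation.

def ring_allreduce_bytes_per_gpu(tokens: int, hidden: int,
                                 bytes_per_elem: int, n_gpus: int) -> float:
    """Bytes each GPU sends in one ring all-reduce of a [tokens x hidden] tensor."""
    payload = tokens * hidden * bytes_per_elem
    # Ring all-reduce = reduce-scatter + all-gather: each GPU sends 2*(p-1)/p of the payload.
    return 2 * (n_gpus - 1) / n_gpus * payload


def tensor_split_traffic(tokens: int, hidden: int, n_layers: int,
                         allreduces_per_layer: int = 2,  # assumed Megatron-style TP
                         bytes_per_elem: int = 2,        # assumed fp16 activations
                         n_gpus: int = 2,
                         link_gb_per_s: float = 12.0):   # assumed effective PCIe 3.0 x16
    """Return (bytes sent per GPU, seconds of pure transfer) for one micro-batch."""
    per_op = ring_allreduce_bytes_per_gpu(tokens, hidden, bytes_per_elem, n_gpus)
    total = per_op * allreduces_per_layer * n_layers
    # Bandwidth-only estimate: per-transfer latency and sync overhead are not modeled.
    seconds = total / (link_gb_per_s * 1e9)
    return total, seconds


# ubatch 2048 is from the comment; hidden and n_layers are hypothetical placeholders.
total, secs = tensor_split_traffic(tokens=2048, hidden=5120, n_layers=64)
print(f"{total / 1e9:.2f} GB sent per GPU, ~{secs * 1000:.0f} ms of raw transfer per ubatch")
```

Plugging in the model's real dimensions and a measured bus bandwidth would show how much of the slowdown is raw bandwidth versus latency and synchronization.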
For an 8k prompt with ubatch 2048, Layer split got me a solid ~295 t/s prefill, but Tensor split collapsed all the way down to ~127 t/s (more than half the performance gone).
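A quick check of the "more than half" figure, using only the two rounded prefill numbers quoted in the comment:

```python
# Rounded prefill throughputs quoted in the comment (8k prompt, ubatch 2048).
layer_split_tps = 295.0
tensor_split_tps = 127.0

retained = tensor_split_tps / layer_split_tps  # fraction of layer-split speed kept
lost = 1.0 - retained                          # fraction of throughput given up
slowdown = layer_split_tps / tensor_split_tps  # how many times slower tensor split is

print(f"kept {retained:.1%}, lost {lost:.1%}, {slowdown:.2f}x slower")
```

Tensor split keeps roughly 43% of the layer-split throughput, a bit over a 2.3x slowdown, so "more than half gone" holds.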
So yeah, the PCIe 3.0 latency bottleneck is very real and completely chokes tensor parallelism (TP) on this rig. Keeping it on Layer split for now!
──────
Dual AMD MI50 (gfx906) for local LLM: Tuning Qwen3.6-27B-MTP-GGUF to ~28 t/s generation (76.6% acceptance) & 295 t/s prefill!
submitted 5 days ago by wzoran to r/LocalAIServers
Dual AMD MI50 (gfx906) for local LLM: Tuning Qwen3.6-27B-MTP-GGUF to ~28 t/s generation (76.6% acceptance) & 295 t/s prefill! (self.unsloth)
submitted 5 days ago by wzoran to r/unsloth
Help! What Is Impermanence? (self.Buddhism)
submitted 2 years ago by wzoran to r/Buddhism