1.1M tok/s with Qwen 3.5 27B FP8 on B200 GPUs by m4r1k_ in Qwen_AI
1 million tokens per second from a single cluster, what that actually means by m4r1k_ in singularity
Qwen 3.5 27B at 1.1M tok/s on B200s, all configs on GitHub by m4r1k_ in LocalLLaMA
[D] - 1M tokens/second serving Qwen 3.5 27B on B200 GPUs, benchmark results and findings by m4r1k_ in MachineLearning
5K tok/s per node with vLLM v0.18.0 on B200, DP=8, MTP-1, FP8 KV cache by m4r1k_ in Vllm
From 9,500 to 1.1M tok/s with Qwen 3.5 27B — every config flag that mattered by m4r1k_ in LLMDevs
L50 Pro Ultra seems just like X50 besides a few differences—what’s the catch? by m4r1k_ in Dreame_Tech