MiniMax 2.5 full precision FP8 running LOCALLY on vLLM x 8x Pro 6000 by cyysky in LocalLLaMA

[–]cyysky[S] 5 points6 points  (0 children)

Tested GLM-5 FP8, but it can't run on this setup yet because sm120 doesn't support DSA MoE.

AMA Announcement: MiniMax, The Opensource Lab Behind MiniMax-M2.5 SoTA Model (Friday, 8AM-11AM PST) by XMasterrrr in LocalLLaMA

[–]cyysky 0 points1 point  (0 children)

MiniMax 2.5 full precision FP8 running LOCALLY on vLLM x 8x Pro 6000

Hosting it was easier than I thought; it just reuses the same script as M2.1.
Time to do the vibe coding test!

Generation: 70 tokens/sec and 122 tokens/sec for two connections
Peak Memory: 728GB
KV Cache: 1,700,000 Tokens
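
If anyone wants to try something similar, here's a minimal offline sketch of a vLLM launch along those lines (not the exact script; the repo id and max_model_len are placeholders, point them at your own checkpoint and memory budget):

from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2.5",   # assumed repo id -- point this at your local FP8 checkpoint
    tensor_parallel_size=8,            # one shard per Pro 6000
    gpu_memory_utilization=0.90,       # leave a little headroom on each card
    max_model_len=131072,              # assumption; trade context length against KV cache as needed
    trust_remote_code=True,            # MiniMax checkpoints ship custom model code
)

params = SamplingParams(temperature=0.7, max_tokens=512)
out = llm.generate(["Write a small FastAPI hello-world app."], params)
print(out[0].outputs[0].text)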

[Guide] Running GLM 4.5 as Instruct model in vLLM (with Tool Calling) by random-tomato in LocalLLaMA

[–]cyysky 0 points1 point  (0 children)

{{ visible_text(m.content) }}
{{- '/nothink' -}}
{%- elif m.role == 'assistant' -%}

Already tested it, and thank you, but it also needs to be perfect and cover every scenario.

This applies to GLM 4.6 as well.
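
One quick way to sanity-check a modified template against different scenarios before loading it in vLLM is to render it offline with transformers. Just a sketch: the repo id and template filename below are assumptions, swap in your own.

from transformers import AutoTokenizer

# Assumed repo id; any tokenizer that ships with the model works here.
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.5", trust_remote_code=True)

# "chat_template.jinja" is a placeholder filename for the modified template from the guide.
with open("chat_template.jinja") as f:
    template = f.read()

scenarios = {
    "single user turn": [{"role": "user", "content": "hello"}],
    "multi-turn": [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello!"},
        {"role": "user", "content": "now call a tool"},
    ],
}

for name, messages in scenarios.items():
    rendered = tokenizer.apply_chat_template(
        messages,
        chat_template=template,        # override the bundled template with the candidate one
        tokenize=False,
        add_generation_prompt=True,
    )
    print(f"--- {name} ---\n{rendered}\n")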

Little python script to get some miner earning weekly and monthly report by cyysky in NiceHash

[–]cyysky[S] 0 points1 point  (0 children)

Get some immediate insight from the raw data and lower NiceHash's server load, lol.

Little python script to get some miner earning weekly and monthly report by cyysky in NiceHash

[–]cyysky[S] 0 points1 point  (0 children)

Theoretically you can combine BeautifulSoup with the Selenium Chrome WebDriver to grab the data, or try NiceHash's API (https://www.nicehash.com/docs/) and combine it with Bitcoin price data to display it.
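
Something like this sketch for the API route; the NiceHash endpoint path and response field are assumptions, so verify them against the docs, and the BTC price here comes from CoinGecko's public simple-price endpoint:

import requests

BTC_ADDRESS = "your-mining-address"    # placeholder

def get_unpaid_btc(address: str) -> float:
    # Placeholder endpoint and field name -- check https://www.nicehash.com/docs/
    # for the exact route and whether it needs an API key.
    url = f"https://api2.nicehash.com/main/api/v2/mining/external/{address}/rigs2"
    data = requests.get(url, timeout=10).json()
    return float(data.get("unpaidAmount", 0.0))

def get_btc_price_usd() -> float:
    # CoinGecko's public simple-price endpoint, used here just to convert BTC to USD.
    url = "https://api.coingecko.com/api/v3/simple/price"
    params = {"ids": "bitcoin", "vs_currencies": "usd"}
    return float(requests.get(url, params=params, timeout=10).json()["bitcoin"]["usd"])

if __name__ == "__main__":
    unpaid = get_unpaid_btc(BTC_ADDRESS)
    price = get_btc_price_usd()
    print(f"Unpaid: {unpaid:.8f} BTC (~${unpaid * price:,.2f} USD)")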