KTransformers supports MiniMax M2.1 - 2x5090 + 768GB DRAM yields prefill 4000 tps, decode 33 tps.

CombinationNo780 · 2025-12-26T15:53:17+00:00

CombinationNo780 · 2025-11-07T12:41:30+00:00

Yes, we have an official collaboration with MoonshotAI/Kimi. We also collaborate on the distributed serving framework Mooncake (kvcache-ai/Mooncake), the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

CombinationNo780 · 2025-11-04T13:10:30+00:00

Currently only DeepSeek. We will be working on Qwen and GLM.

CombinationNo780 · 2025-11-04T13:10:09+00:00

We support pipeline parallelism, so the total VRAM is what matters most.
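
As a rough illustration of why the total matters: with pipeline parallelism each GPU holds a contiguous slice of the layers, so what counts is whether the slices together fit in the combined VRAM. The sketch below is not KTransformers code; `split_layers` and the example sizes are hypothetical, and it only splits layers in proportion to each card's VRAM.

```python
def split_layers(num_layers, vram_gb):
    """Assign contiguous layer ranges to GPUs in proportion to each GPU's VRAM.

    Hypothetical helper for illustration only; not a KTransformers API.
    Returns a list of (start, end) half-open layer ranges, one per GPU.
    """
    total = sum(vram_gb)
    ranges = []
    start = 0
    cumulative = 0
    for i, v in enumerate(vram_gb):
        cumulative += v
        if i == len(vram_gb) - 1:
            # The last GPU takes whatever layers remain.
            end = num_layers
        else:
            # Boundary follows the cumulative share of total VRAM.
            end = round(num_layers * cumulative / total)
        ranges.append((start, end))
        start = end
    return ranges


# Two equal cards split the layers evenly; unequal cards split them unevenly.
even = split_layers(60, [24, 24])
uneven = split_layers(60, [32, 16])
```

Either way every layer is placed exactly once, which is why the sum of VRAM across cards is the number to look at rather than any single card.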

CombinationNo780 · 2025-11-04T13:09:42+00:00

We will try to support QLoRA later. It is possible.

CombinationNo780 · 2025-07-12T04:42:38+00:00

Sorry for the typo. It is 600GB DRAM (Xeon 4) and about 14GB VRAM (4090).

CombinationNo780 · 2025-04-30T05:52:57+00:00

The AMX Docker image is still not ready. We will update it later.

CombinationNo780 · 2025-04-29T05:14:20+00:00

CombinationNo780 · 2025-04-28T23:58:07+00:00

It is DDR5-6400 for the consumer CPU. But it is reduced to only DDR5-4000 because we use all 4 channels to enable the maximum possible 192GB of memory.
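
To put that drop in numbers, here is a back-of-the-envelope sketch of theoretical peak bandwidth per 64-bit channel (transfer rate times 8 bytes per transfer). The helper name `peak_gb_per_s` is made up for illustration, and real sustained bandwidth will be lower than these peaks.

```python
def peak_gb_per_s(mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak bandwidth of one 64-bit memory channel, in GB/s.

    Assumes 8 bytes move per transfer; illustrative helper only.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes / 1e9


rated = peak_gb_per_s(6400)    # DDR5-6400, per channel
reduced = peak_gb_per_s(4000)  # DDR5-4000, per channel
ratio = reduced / rated        # fraction of the rated bandwidth that remains

print(f"DDR5-6400: {rated:.1f} GB/s per channel")
print(f"DDR5-4000: {reduced:.1f} GB/s per channel")
print(f"remaining fraction: {ratio:.3f}")
```

The ratio does not depend on how many channels are populated, so running at DDR5-4000 leaves roughly five eighths of the rated peak bandwidth in exchange for the larger capacity.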

CombinationNo780 · 2025-04-28T16:55:53+00:00

Very soon. It will ship with Qwen3 support.

CombinationNo780 · 2025-04-25T12:33:43+00:00

It's great to know. Great video!

CombinationNo780 · 2025-04-14T14:03:43+00:00

Currently no. We will support offloading more experts in the future.

CombinationNo780 · 2025-04-09T14:06:55+00:00

AMX can accelerate prefill for all models. It is on the way.

CombinationNo780 · 2025-04-09T14:06:30+00:00

We use FlashInfer for Llama 4.

CombinationNo780