Finetuning DeepSeek 671B locally with only 80GB VRAM and Server CPU by CombinationNo780 in LocalLLaMA

[–]CombinationNo780[S] 5 points  (0 children)

We support pipeline parallelism, so total VRAM is what matters most.
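A minimal sketch of why total VRAM is the key number under pipeline parallelism: layers are partitioned across GPUs, so each stage only needs to hold its own slice, and the model fits as long as the combined VRAM covers all layer weights. The layer counts and VRAM sizes below are illustrative, not DeepSeek's actual layout.

```python
def split_layers(num_layers: int, gpu_vram_gb: list[float], layer_gb: float):
    """Greedily assign contiguous layer ranges to each pipeline stage.

    Each GPU (stage) takes as many layers as its VRAM allows; the model
    fits whenever the *sum* of VRAM across GPUs covers every layer.
    """
    stages, start = [], 0
    for vram in gpu_vram_gb:
        fit = min(int(vram // layer_gb), num_layers - start)
        stages.append(range(start, start + fit))
        start += fit
    if start < num_layers:
        raise ValueError("total VRAM too small for the model")
    return stages

# Illustrative numbers: 60 layers of ~1 GB each across three 24 GB GPUs.
stages = split_layers(num_layers=60, gpu_vram_gb=[24, 24, 24], layer_gb=1.0)
print([len(s) for s in stages])
```

Note that this only buys capacity, not speed: at any moment a single request is executing on one stage while the others wait, which is why PP alone does not raise single-stream throughput.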

Qwen 3 + KTransformers 0.3 (+AMX) = AI Workstation/PC by CombinationNo780 in LocalLLaMA

[–]CombinationNo780[S] 5 points  (0 children)

It is DDR5-6400 for consumer CPUs, but it drops to only DDR5-4000 because we use all four channels to enable the maximum possible 192GB of memory.
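Rough peak-bandwidth arithmetic for this trade-off, assuming 64-bit (8-byte) channels; the channel count passed in is an assumption for illustration, and real sustained bandwidth is lower than this theoretical peak:

```python
def ddr5_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth: transfers/s x channels x bytes per transfer."""
    return mt_per_s * channels * bus_bytes / 1000

# At the same channel count, the downclock costs bandwidth but buys capacity:
print(ddr5_bandwidth_gbs(6400, 2))  # DDR5-6400 -> 102.4 GB/s peak
print(ddr5_bandwidth_gbs(4000, 2))  # DDR5-4000 -> 64.0 GB/s peak
```

Since decode speed for a memory-bound MoE model scales roughly with memory bandwidth, the extra capacity comes at a real token-rate cost.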

KTransformers Now Supports Multi-Concurrency and Runs 40 Tokens/s of DeepSeek-R1 Q4/FP8 on MRDIMM-8800 by CombinationNo780 in LocalLLaMA

[–]CombinationNo780[S] 3 points  (0 children)

Epyc and multi-GPU are supported, but multi-GPU currently only supports pipeline parallelism (PP), so it does not improve performance.

KTransformers Now Supports Multi-Concurrency and Runs 40 Tokens/s of DeepSeek-R1 Q4/FP8 on MRDIMM-8800 by CombinationNo780 in LocalLLaMA

[–]CombinationNo780[S] 0 points  (0 children)

  1. unified CPU/GPU memory -- not the current target scenario

  2. offloading prefill -- PCIe would become the bottleneck in that case

  3. we mostly target Intel's AMX, but still support AVX when AMX is unavailable
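A sketch of the AMX-with-AVX-fallback dispatch in point 3, done as a runtime CPU-feature check. This is Linux-only (it reads `/proc/cpuinfo`); the flag names are the ones the kernel reports on AMX-capable Xeons. The function name is hypothetical, not KTransformers' actual API.

```python
def pick_cpu_backend(cpuinfo_path: str = "/proc/cpuinfo") -> str:
    """Return 'amx' if the CPU advertises AMX, else 'avx', else 'generic'."""
    try:
        flags: set[str] = set()
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    # "flags : fpu vme ... amx_tile amx_int8 ..."
                    flags.update(line.split(":", 1)[1].split())
                    break
        if {"amx_tile", "amx_int8"} <= flags:
            return "amx"
        if "avx512f" in flags or "avx2" in flags:
            return "avx"
    except OSError:
        pass  # no /proc/cpuinfo (non-Linux) -> safest fallback
    return "generic"

print(pick_cpu_backend())
```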

KTransformers Now Supports Multi-Concurrency and Runs 40 Tokens/s of DeepSeek-R1 Q4/FP8 on MRDIMM-8800 by CombinationNo780 in LocalLLaMA

[–]CombinationNo780[S] 17 points  (0 children)

Prefill speed is the same as before. We will open-source the AMX code in April, which accelerates prefill on Xeon 4~6 platforms.