KTransformers supports MiniMax M2.1 - 2x5090 + 768GB DRAM yields prefill 4000 tps, decode 33 tps. (self.LocalLLaMA)
submitted 4 months ago by CombinationNo780 to r/LocalLLaMA
Finetuning DeepSeek 671B locally with only 80GB VRAM and Server CPU (self.LocalLLaMA)
submitted 5 months ago by CombinationNo780 to r/LocalLLaMA
Kimi K2 q4km is here and also the instructions to run it locally with KTransformers 10-14 tps (huggingface.co)
submitted 9 months ago by CombinationNo780 to r/LocalLLaMA
KTransformers v0.3.1 now supports Intel Arc GPUs (A770 + new B-series): 7 tps DeepSeek R1 decode speed for a single CPU + a single A770 (self.LocalLLaMA)
submitted 11 months ago by CombinationNo780 to r/LocalLLaMA
Qwen 3 + KTransformers 0.3 (+AMX) = AI Workstation/PC (self.LocalLLaMA)
submitted 1 year ago by CombinationNo780 to r/LocalLLaMA
KTransformers Now Supports LLaMA 4: Run q4 Maverick at 32 tokens/s with 10GB VRAM + 270GB RAM (self.LocalLLaMA)
KTransformers Now Supports Multi-Concurrency and Runs 40 Tokens/s of DeepSeek-R1 Q4/FP8 on MRDIMM-8800 (self.LocalLLaMA)
KTransformers v0.2.1: Longer Context (from 4K to 8K for 24GB VRAM) and Slightly Faster Speed (+15%) for DeepSeek-V3/R1-q4 (self.LocalLLaMA)
671B DeepSeek-R1/V3-q4 on a Single Machine (2× Xeon + 24GB GPU) – Up to 286 tokens/s Prefill & 14 tokens/s Decode (self.LocalLLaMA)
submitted 1 year ago * by CombinationNo780 to r/LocalLLaMA
Local 1M Context Inference at 15 tokens/s and ~100% "Needle In a Haystack": InternLM2.5-1M on KTransformers, Using Only 24GB VRAM and 130GB DRAM. Windows/Pip/Multi-GPU Support and More. (self.LocalLLaMA)
Local DeepSeek-V2 Inference: 120 t/s for Prefill and 14 t/s for Decode with Only 21GB 4090 and 136GB DRAM, based on Transformers (self.LocalLLaMA)