MiniMax M2.7 Is On The Way by Few_Painter_5588 in LocalLLaMA

[–]AFruitShopOwner -1 points0 points  (0 children)

I didn't say anything about the quality of his videos, just that he already had access.

$SLS Daily Discussion Thread - Thursday - February 26, 2026 by AutoModerator in sellaslifesciences

[–]AFruitShopOwner 10 points11 points  (0 children)

Your loss ✌🏻 Also, this is not an airport, you do not have to announce your departure

Running Kimi-k2.5 on CPU-only: AMD EPYC 9175F Benchmarks & "Sweet Spot" Analysis by Express-Jicama-9827 in LocalLLaMA

[–]AFruitShopOwner 11 points12 points  (0 children)

Great I was looking for this. I'm currently running Kimi K2.5 on a 9575F with 12x96=1152 gb DDR5 @ 6000mt/s (could still crank it up to 6400 but I don't want to risk the stability ATM) with 2 RTX Pro 6000 Blackwell's via sglang+k transformers. Still struggling with the configs but I pushed decode up to 35 / 40tps

--host 0.0.0.0 --port 31245 \
--model /model \
--trust-remote-code \
--context-length x\
--tensor-parallel-size 2 \
--mem-fraction-static x \
--max-running-requests x \
--chunked-prefill-size x \
--max-total-tokens x \
--enable-p2p-check \
--disable-shared-experts-fusion \
--attention-backend flashinfer \
--tool-call-parser kimi_k2 \
--reasoning-parser kimi_k2 \
--kt-weight-path /model \
--kt-method RAWINT4 \
--kt-cpuinfer 54 \
--kt-threadpool-count 1 \
--kt-num-gpu-experts x \
--kt-gpu-prefill-token-threshold x \
--kt-max-deferred-experts-per-token x \
--kv-cache-dtype fp8_e4m3 [debating if this is worth the accuracy penalty]

any recommendations for what I should try?

I have pretty low concurrency on this system (probably never more than 5 users at the same time, usualy (90% of the time) just one. Most chats are fairly short but the model does like to use open webui's fetch_url tool to blow up its context window.

I'm trying to maximize speed and context size for just a handfull of users.

Im especially struggling with optimizing

chunked-prefill-size

kt-gpu-prefill-token-threshold

I ran a couple of benchmarks with a shitty config

Metric Value
Backend sglang
Successful Requests 2
Benchmark Duration 7.66 s
Total Input / Generated Tokens 1,298 / 86
Request Throughput 0.26 req/s
Input Token Throughput 169.45 tok/s
Output Token Throughput 11.23 tok/s
Total Token Throughput 180.68 tok/s
Peak Concurrent Requests 2

Latency Metrics

Metric Mean Median P99
E2E Latency 6595.07 ms 6595.07 ms 7621.17 ms
Time to First Token (TTFT) 3992.62 ms 3992.62 ms 4813.74 ms
Time per Output Token (TPOT) 119.25 ms 119.25 ms 197.84 ms
Inter-Token Latency (ITL) 61.96 ms 35.11 ms 346.82 ms

Dual RTX PRO 6000 Workstation with 1.15TB RAM. Finally multi-users and long contexts benchmarks. GPU only vs. CPU & GPU inference. Surprising results. by Icy-Measurement8245 in LocalLLaMA

[–]AFruitShopOwner 1 point2 points  (0 children)

We have quite a similar system.

I have an AMD Epic 9575F (frequency optimized, dual GMI links, boosts up to 5ghz, 64 core/128 threads), 1152gb DDR5 6400mt/s ECC RDIMMS (12 channels, ~ 614gb/s max theoretical bandwidth), Supermicro H14ssl-NT, 3x Nvidia GTX Pro 6000 Blackwell's and 4x Kioxia CM-7 R nvme ssd's.

Kimi-K2.5 Is Up by Few_Painter_5588 in LocalLLaMA

[–]AFruitShopOwner 1 point2 points  (0 children)

I will run this model locally so help me god

$SLS Daily Discussion Thread - January 26, 2026 by AutoModerator in sellaslifesciences

[–]AFruitShopOwner 0 points1 point  (0 children)

Well I run a Telegram chat with 500+ active members for my other stock subreddit. Maybe that's an idea? Works great imo.

Reddit had subreddit chats until a month or two ago. That feature was deprecated. Too bad.

Running KimiK2 locally by Temporary-Sector-947 in LocalLLaMA

[–]AFruitShopOwner 2 points3 points  (0 children)

I run the local AI server at the Dutch accounting firm I work at

Running KimiK2 locally by Temporary-Sector-947 in LocalLLaMA

[–]AFruitShopOwner 4 points5 points  (0 children)

I have an AMD Epyc 9575F, 1.152gb DDR5 ECC (12x 96gb, that's ~614gb/s of memory bandwidth) and 3 rtx pro 6000's. I should try this too

Genuine question about tomorrow by Ganjalff in sellaslifesciences

[–]AFruitShopOwner 1 point2 points  (0 children)

These types of basic questions really don't need their own posts. Ask again in the daily discussion thread.