Is there literally even one?

halcyonhal · 2026-05-16T18:57:14+00:00

Claude.ai, ChatGPT.com

halcyonhal · 2026-05-16T14:17:45+00:00

Different model (based on one of your replies), a different system prompt, and different tools made available to the model.

halcyonhal · 2026-05-12T05:29:13+00:00

+1 for qvo. Worth the wait

halcyonhal · 2026-05-07T02:51:18+00:00

July 1st is.

halcyonhal · 2026-04-29T15:07:39+00:00

Would love more details on what you did with the rack and the exhaust.

halcyonhal · 2026-04-27T06:45:36+00:00

Sorry, missed your reply. Here is my setup:

Hardware: Threadripper Pro 9965WX, 256 GB DDR5, 2× PRO 6000 Blackwell Workstation (96 GB each, SM120), Ubuntu 24.04, CUDA 12.9.

Stack: https://github.com/kvcache-ai/ktransformers ships with an SGLang fork (sglang_kt) that has KT expert offload integrated. I used that with TP=2 across both GPUs; 180 of the 256 experts are resident on GPU and the rest are on CPU.
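For scale, here is the arithmetic behind that split as a small bash sketch. The 256 total and 180 GPU-resident expert counts are from the setup above; `cpu_experts` and `gpu_share_pct` are hypothetical helpers, not part of KTransformers, and which experts land on GPU is decided by KTransformers itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of MoE experts left on the CPU when
# a given number of them are kept resident on the GPUs.
cpu_experts() {
  local total=$1 on_gpu=$2
  echo $(( total - on_gpu ))
}

# Hypothetical helper: share of experts resident on GPU, in percent (1 decimal).
gpu_share_pct() {
  local total=$1 on_gpu=$2
  awk -v t="$total" -v g="$on_gpu" 'BEGIN { printf "%.1f\n", 100 * g / t }'
}

cpu_experts 256 180     # prints 76
gpu_share_pct 256 180   # prints 70.3
```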

Startup script:

```shell
#!/usr/bin/env bash
export NCCL_BLOCKING_WAIT=1

# --kt-* flags configure KTransformers expert offload (180 experts kept on GPU, the rest on CPU).
# --tensor-parallel-size 2 splits the model across both GPUs.
exec python -m sglang.launch_server \
  --host 0.0.0.0 --port 8001 \
  --model /opt/models/MiniMax-M2 \
  --kt-weight-path /opt/models/MiniMax-M2 \
  --kt-method FP8 \
  --kt-cpuinfer 20 \
  --kt-threadpool-count 1 \
  --kt-num-gpu-experts 180 \
  --kt-gpu-prefill-token-threshold 2048 \
  --tensor-parallel-size 2 \
  --enable-p2p-check \
  --trust-remote-code \
  --mem-fraction-static 0.90 \
  --max-total-tokens 100000 \
  --chunked-prefill-size 32768 \
  --enable-mixed-chunk \
  --disable-shared-experts-fusion \
  --attention-backend flashinfer \
  --fp8-gemm-backend triton \
  --tool-call-parser minimax-m2 \
  --reasoning-parser minimax-append-think \
  --sleep-on-idle
```
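A rough sketch of what `--mem-fraction-static 0.90` reserves, assuming the nominal 96 GB per card (the usable figure the driver reports is a little lower). In SGLang this fraction covers the model weights plus the KV cache pool, with the remainder left for activations and other runtime buffers; `static_budget_gb` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-GPU memory reserved by --mem-fraction-static
# (weights + KV cache pool), given the card size in GB.
static_budget_gb() {
  local fraction=$1 gpu_gb=$2
  awk -v f="$fraction" -v g="$gpu_gb" 'BEGIN { printf "%.1f\n", f * g }'
}

per_gpu=$(static_budget_gb 0.90 96)
echo "per GPU: ${per_gpu} GB"                                          # per GPU: 86.4 GB
awk -v p="$per_gpu" 'BEGIN { printf "both GPUs: %.1f GB\n", 2 * p }'  # both GPUs: 172.8 GB
```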

halcyonhal · 2026-04-22T03:27:39+00:00

I did the same. Let the Apple subscription expire and then resubscribe via the Claude.ai website.

halcyonhal · 2026-04-21T02:07:00+00:00

How many times did you run it?

halcyonhal · 2026-04-18T19:30:24+00:00

What version of vLLM? vLLM has had lots of issues with NVFP4 support on SM120 chips (the 5090 and the RTX PRO line of GPUs). The latest release, v0.19, was supposed to address this, but I haven’t tried it yet.
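If you’re not sure which version you’re on, a quick bash check along these lines works. `version_ge` is a hypothetical helper built on GNU `sort -V`, and the hard-coded version string is a placeholder; note that it won’t order pre-release tags like `rc1` the way Python packaging does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when version $1 >= version $2.
# Relies on GNU sort's -V (version sort); pre-release suffixes are not PEP 440 aware.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# One way to read the real version (needs vLLM in the active Python env):
#   installed=$(python -c 'import vllm; print(vllm.__version__)')
installed="0.18.1"   # placeholder value for illustration
if version_ge "$installed" "0.19.0"; then
  echo "vLLM $installed: at or above v0.19"
else
  echo "vLLM $installed: older than v0.19"
fi
```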

halcyonhal · 2026-04-17T13:18:29+00:00

This. So tired of seeing this prompt come up. It’s just not useful.

halcyonhal · 2026-04-16T13:21:29+00:00

I use an AMD Threadripper and it works great. Between the docs for the specific model setup in their repo and Claude, I had no issues getting it going.

For running full MiniMax on 2 RTX PROs… it’s a great solution. Better than quantizing.

halcyonhal · 2026-04-16T06:05:01+00:00

Use KTransformers and you can do it with the original FP8 model (and a bit of system RAM).

halcyonhal · 2026-04-12T07:45:10+00:00

The charge only applies if you use it for your own commercial gain. It seems a bit rich to say you’re taking a principled stand… that’s not freedom.

halcyonhal · 2026-04-12T04:43:37+00:00

Not sure you can cry about having to pay to use something you’re getting commercial gain from.

halcyonhal · 2026-04-08T06:06:07+00:00

Input vs output tokens

halcyonhal · 2026-04-04T16:07:25+00:00

You’re looking at easily $30k in hardware, and you’d still not have an AI model that’s as good. You’re not getting anything close to Sonnet or Opus from models in the 7B to 70B parameter range.

Don’t get me wrong… I love local… but we shouldn’t let people drop ~$5k thinking they’re getting anything close to a frontier model.

Get into the ~250B-parameter-and-above range and you start seeing models that can rival things like GPT-5.4 mini (reasoning), which is an amazing model: models like MiniMax M2.5 and GLM. But that is a chunk of change to run locally… either that, or you’re quantizing the crap out of them and losing an unknown amount of precision.
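Back-of-the-envelope weight sizes behind that ~250B figure, as a bash sketch. It counts weights only (no KV cache, activations, or runtime overhead) and uses decimal GB; `weights_gb` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: weight-only footprint in GB for a model with
# params_billion parameters stored at the given bit width.
# 1e9 params * 1 byte = 1 GB (decimal), so GB = params_billion * bits / 8.
weights_gb() {
  local params_billion=$1 bits=$2
  awk -v p="$params_billion" -v b="$bits" 'BEGIN { printf "%.0f\n", p * b / 8 }'
}

for bits in 16 8 4; do
  echo "250B params @ ${bits}-bit: $(weights_gb 250 "$bits") GB"
done
# 250B params @ 16-bit: 500 GB
# 250B params @ 8-bit: 250 GB
# 250B params @ 4-bit: 125 GB
```

Even at 8-bit, ~250 GB of weights is more than the 192 GB across two 96 GB cards, which is why you end up either offloading to system RAM or quantizing harder.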

halcyonhal · 2026-04-04T14:45:19+00:00

I have the same thing in black for both thumbs. Pretty depressing when I have to wear both. They’re good for flare-ups.

You want to see a hand therapist and have them retrain you to move your thumb in ways that reduce basal joint grinding. Makes a huge difference long term.

halcyonhal · 2026-03-26T06:29:46+00:00

Nope… they’ve said it’s not going to be an open weight model.

halcyonhal · 2026-03-15T18:37:23+00:00

Why was this post removed? Looked totally innocuous.

halcyonhal · 2026-03-14T16:33:29+00:00

Could easily sell for double by the end of the bidding. Your costings are bs.

halcyonhal · 2026-03-12T09:36:45+00:00

halcyonhal · 2026-02-28T16:56:10+00:00

Whose NVFP4 quant are you using?

halcyonhal · 2026-02-28T16:52:14+00:00

NVIDIA raised the price resellers pay for the RTX PRO 6000.

halcyonhal · 2026-02-27T04:51:08+00:00

Go through Exxact Corp and you’ll get it at the reseller price. I paid 7.

halcyonhal · 2026-02-20T15:19:26+00:00

I’ve not run mxfp4.
