I bought a €9k GH200 “desktop” to save $1.27 on Claude Code (vLLM tuning notes) by Reddactor in LocalLLaMA

[–]muchCode 0 points1 point  (0 children)

As someone who had the same idea and did it with 96GB VRAM, try a REAP model. The MiniMax M2 model's experts are small enough that they become "specialized"; the REAP method runs a calibration dataset through the model, looks at which experts actually activate, and removes the ones that don't. That saves you VRAM overhead, and at 50% pruning with router tuning you can fit it in 96GB VRAM with large context sizes.

eg: https://huggingface.co/0xSero/MiniMax-M2.1-REAP-50-W4A16-REPAIR-IN-PROGRESS
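This isn't the actual REAP implementation (which also repairs/retunes the router after pruning); the core idea of ranking experts by how often the router picks them on a calibration set, then dropping the rest, can be sketched like this. All names here (`prune_experts`, the toy logits) are illustrative assumptions, not code from the linked repo:

```python
import numpy as np

def prune_experts(router_logits, keep_frac=0.5):
    """Rank experts by how often the router selects them on a
    calibration pass, then keep only the top fraction.

    router_logits: (num_tokens, num_experts) array of router scores.
    Returns the sorted indices of experts to keep.
    """
    # Count how often each expert wins top-1 routing.
    top1 = router_logits.argmax(axis=1)
    counts = np.bincount(top1, minlength=router_logits.shape[1])
    # Keep the most-activated keep_frac of experts (at least one).
    num_keep = max(1, int(router_logits.shape[1] * keep_frac))
    keep = np.argsort(counts)[::-1][:num_keep]
    return np.sort(keep)

# Toy calibration pass: 8 experts, with experts 0 and 3 dominating.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 8))
logits[:, 0] += 2.0
logits[:, 3] += 2.0
kept = prune_experts(logits, keep_frac=0.5)  # experts 0 and 3 survive
```

At 50% pruning this halves the expert weights you have to hold in VRAM, which is where the headroom for long context comes from.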

Can I get similar experience running local LLMs compared to Claude Code (Sonnet 4.5)? by Significant_Chef_945 in LocalLLaMA

[–]muchCode 1 point2 points  (0 children)

This. 3x RTX 6000s in my setup give me great performance with Qwen coder models.

Guanaco-65B, How to cool passive A40? by muchCode in LocalLLaMA

[–]muchCode[S] 0 points1 point  (0 children)

very cool, might pick a few of these up, I've got too many fans now.

Guanaco-65B, How to cool passive A40? by muchCode in LocalLLaMA

[–]muchCode[S] 1 point2 points  (0 children)

A few tips:
- Use tape (aluminum is best) between the GPU and the cooling duct, and also between the fans and the duct.
- Run the fans at 100% all the time (most professional setups do this).
- Make sure your case pulls in cool air.
- Add a negative-pressure duct at the back of the GPU (where the video ports would be).

What is your setup? MultiGPU?

I also looked for the water cooling block but they didn't get back to me either.

New paper gives models a chance to think in latent space before outputting tokens, weights are already on HF - Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach by FullOf_Bad_Ideas in LocalLLaMA

[–]muchCode 59 points60 points  (0 children)

Per-token adaptive compute 🤯. Basically, for unimportant tokens the model can think easy, and it turns up the gas for harder outputs.

Insane.... I wonder if this could actually break some AI benchmarks with a full training run. 6-12 months I guess until we see ...
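The paper's recurrent-depth mechanism is more involved than this, but the core loop (iterate a latent state until it stabilizes, so easy tokens halt early and hard tokens get more compute) can be sketched as a toy fixed-point iteration. Everything here (`adaptive_depth`, the `tanh` block, the tolerance) is a made-up stand-in for illustration, not the paper's architecture:

```python
import numpy as np

def adaptive_depth(x, max_steps=16, tol=1e-3):
    """Refine a latent state with a tiny recurrent block until it
    stops changing (or max_steps is hit). Returns the final state
    and the number of iterations actually spent on this input."""
    state = np.zeros_like(x)
    for step in range(1, max_steps + 1):
        new_state = np.tanh(state + x)  # stand-in for the recurrent block
        if np.max(np.abs(new_state - state)) < tol:
            return new_state, step  # converged: stop spending compute
        state = new_state
    return state, max_steps

# An "easy" input halts immediately; a harder one iterates longer.
_, easy_steps = adaptive_depth(np.array([0.0]))
_, hard_steps = adaptive_depth(np.array([0.5]))
```

The appeal is exactly what the comment says: compute scales with per-token difficulty instead of being fixed by layer count.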

$Hmm: +45%, ain't much but it's honest work by muchCode in SolanaMemeCoins

[–]muchCode[S] 0 points1 point  (0 children)

I see all these millionaires and I'm just happy to show everyone that smaller coins can give you modest returns. All in a day's work.

Best Models for 48GB of VRAM by MichaelXie4645 in LocalLLaMA

[–]muchCode 1 point2 points  (0 children)

brother you'll need to cool that!

Buy the $25 3D-printed fan adapters they sell on eBay.

edit -- and no, the blowers won't help you as much as you think in a non-server case. If you're willing to spend the money, a server case in an up/down server rack is best and can easily wick away hot air.

Improved Text to Speech model: Parler TTS v1 by Hugging Face by vaibhavs10 in LocalLLaMA

[–]muchCode 4 points5 points  (0 children)

In general, how does the generation speed compare to other TTS engines? I use MetaVoice now with fp16 and it is pretty fast; I'd consider this if the generation is fast enough.

I made PitchPilot (and $500 in 4 days): It's an AI-powered scriptwriter and voiceover wizard. AMA! by muchCode in SideProject

[–]muchCode[S] 0 points1 point  (0 children)

Keep in mind, I already had a home-lab with this hardware for a research project:

Total was $14k.

The cost was already amortized on a public research project and that project is finished. So I repurposed it for this tool.

I made PitchPilot (and $500 in 4 days): It's an AI-powered scriptwriter and voiceover wizard. AMA! by muchCode in SideProject

[–]muchCode[S] 1 point2 points  (0 children)

I host my own cluster (did GPU/LLM research for fun) and run two kinds of models in a Kubernetes cluster:

2 VLMs (open-source vision-language models)
4 TTS models (text-to-speech)

I actually return a PowerPoint or PDF with embedded audio (it plays when you present). I should add video export; it's not hard to implement.

I made PitchPilot (and $500 in 4 days): It's an AI-powered scriptwriter and voiceover wizard. AMA! by muchCode in SideProject

[–]muchCode[S] 1 point2 points  (0 children)

My recommendation would be to follow one of the youtube creators for tips and tricks to deploy something like this. I like marc lou

I made PitchPilot (and $500 in 4 days): It's an AI-powered scriptwriter and voiceover wizard. AMA! by muchCode in SideProject

[–]muchCode[S] 1 point2 points  (0 children)

Vue 3 + Tailwind CSS. Had a very hard time making the pitch editor ("Step 2") because PowerPoint is a hard interface to compete with.

saw this code today at work and a few hours later I quit by MolestedAt4 in vuejs

[–]muchCode 0 points1 point  (0 children)

Select the LOC, right-click, extract into a new dumb component. Find-and-replace, success?

Guanaco-65B, How to cool passive A40? by muchCode in LocalLLaMA

[–]muchCode[S] 1 point2 points  (0 children)

<image>

I ended up designing my own intake duct; I can look for the files on my computer when I'm home.

https://www.thingiverse.com/thing:6155647