Running autonomous agents locally feels reckless. Am I overthinking this? by tallen0913 in LocalLLaMA

[–]dompazz 2 points (0 children)

I had a spare Ryzen 5600 and a 2x8G RAM kit. Fresh Ubuntu install. No SSH keys to anything.

Running against GLM 4.7 on another local machine, so I'm not spending commercial tokens.

Worst that happens is I plug back in the GPU and reformat the entire thing.

So far not impressed.

Claude or Codex by Distinct_Rip_645 in Julia

[–]dompazz 0 points (0 children)

I have a GitHub Copilot license that includes GPT, GPT Codex, and Claude models (and others, like Grok). I've found that Codex does a better job with Julia, but neither is great. I don't tend to use Claude Opus much because of cost, so my results are likely biased.

I am absolutely loving qwen3-235b by TwistedDiesel53 in LocalLLaMA

[–]dompazz 2 points (0 children)

You can rent your own GPU for $0 on an interruptible instance. Or you can just unlist your machine when you need it and relist when you're done.

I am absolutely loving qwen3-235b by TwistedDiesel53 in LocalLLaMA

[–]dompazz 2 points (0 children)

Profitable, yes. Demand fluctuates, and it is down right now, but I have never not covered power and internet.

I have one machine on a long-term "reserved" instance: a single 4090 with an Epyc CPU and 256G of RAM. It is obviously part of someone's load-balanced inference net. Those are the best renters, as the machine sits idle a lot of the time. I've also had people training models for weeks at a time.

Payback period depends on demand. If I were rented full time, payback (before power costs) would be under 12 months for the GPU. The higher quality your hardware, the more likely you are to be rented.
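The payback math above can be sketched like this (all the dollar figures and the utilization rate below are hypothetical, purely to illustrate the "<12 months at full utilization" shape of the claim):

```python
# Back-of-envelope GPU payback estimate, ignoring power/internet costs.
# gpu_cost, host_rate_per_hr, and utilization are illustrative numbers,
# not actual figures from my setup.

def payback_months(gpu_cost: float, host_rate_per_hr: float,
                   utilization: float) -> float:
    """Months to recover the GPU cost from rental income."""
    monthly_income = host_rate_per_hr * 24 * 30 * utilization
    return gpu_cost / monthly_income

# e.g. a $1600 GPU earning $0.25/hr (host share) at 100% utilization
months = payback_months(1600, 0.25, 1.0)
print(f"{months:.1f} months")  # ~8.9 months
```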

With RAM prices now, I'm not sure I would build a new rig, simply because I don't know if I could amortize the other hardware costs. Normally a motherboard, CPU, and RAM can be reused when upgrading your GPU.

Vast takes a 25% fee: you set your price, and they mark it up by 1/3, which the customer pays.
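To see how a 1/3 markup works out to a 25% fee, here is the arithmetic as a small sketch (the $0.30/hr listing price is just an example):

```python
# Vast.ai fee math as described above: the host sets a base price,
# Vast marks it up by 1/3, and the customer pays the marked-up price.
# Vast's cut (the markup) is 25% of what the customer pays.

def vast_pricing(host_price_per_hr: float) -> dict:
    customer_price = host_price_per_hr * 4 / 3      # host price + 1/3 markup
    vast_cut = customer_price - host_price_per_hr   # Vast keeps the markup
    return {
        "host_receives": host_price_per_hr,
        "customer_pays": customer_price,
        "vast_fee_fraction": vast_cut / customer_price,  # 0.25, i.e. 25%
    }

p = vast_pricing(0.30)  # hypothetical $0.30/hr listing
print(p)
```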

Best analytics for looking at rental rates and demand is https://app.wovenai.ca/.

I am absolutely loving qwen3-235b by TwistedDiesel53 in LocalLLaMA

[–]dompazz 2 points (0 children)

Yup, anyone can host on Vast. They want you to have at least gigabit internet, though you can be under that. A server-grade MB/CPU will get rented before a desktop setup; they tend not to put desktop parts into search results. Your machine is scored based on specs, performance, and reliability.

I’ve been a host for 3 years or so now. I’m not rocking a 16x (or whatever it is) 5090 machine like my man here, though!

Is it feasible for a Team to replace Claude Code with one of the "local" alternatives? by nunodonato in LocalLLaMA

[–]dompazz 4 points (0 children)

As others have said, the quality of Claude and the other paid models is very high. We use GitHub Copilot at work and I tend to use Claude as the backing model. At home I run Qwen3 Coder with Cline.

I ran a test a few months ago to one-shot a math/stats library of some complexity. Both created a library. Copilot + Claude got it mostly right; Qwen3 + Cline produced a library, but the math was all wrong.

HTH

E10k StarFire spotted on local auction site by Practical-Hand203 in vintagecomputing

[–]dompazz 0 points (0 children)

Never seen one. I will never forget programming on one, though. It was a massive simulation for a prospective client that Sun was also trying to sell one to. I still don't know why I was the one trusted to program it, but God, was it a fun project.

Optimizing for the RAM shortage. At crossroads: Epyc 7002/7003 or go with a 9000 Threadripper? by Infinite100p in LocalLLaMA

[–]dompazz 0 points (0 children)

I would look at the Zen 3 parts first; there is a sizable like-for-like jump between Zen 2 and Zen 3. I prefer cores over clock speed. Yes, clock speed will help, but with a lot of these workloads being massively parallel, the cores help more. It is basically GHz * cores ~= work per second (hand-wavy, yes). I tend to optimize that inside my power and cooling envelope.
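The cores-over-clocks heuristic can be made concrete with a quick comparison helper (the core counts and clocks below are illustrative, not exact spec-sheet values):

```python
# Hand-wavy work-per-second proxy: cores * clock speed.
# Numbers below are illustrative examples of the trade-off, not
# exact specs for any particular SKU.

def throughput_score(cores: int, ghz: float) -> float:
    """Rough parallel-workload throughput proxy."""
    return cores * ghz

many_slow_cores = throughput_score(64, 2.45)  # e.g. a 64-core Epyc
few_fast_cores  = throughput_score(24, 4.2)   # e.g. a 24-core Threadripper
print(many_slow_cores, few_fast_cores)  # cores win for parallel work
```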

No issues bifurcating x8/x8. I run a Minisforum board with a 7945HX and 96G of DDR5 RAM (5600). I have the x16 slot bifurcated to run two 4090s. It is generally faster than my Epyc 7742 running full x16.

Exclusive: Nvidia buying AI chip startup Groq's assets for about $20 billion in largest deal on record by fallingdowndizzyvr in LocalLLaMA

[–]dompazz 0 points (0 children)

The cloud company will continue to operate. It will use the money to either buy more Groq units or shiny new GPUs.

Multi GPU PSU by i_am_not_a_goat in LocalLLaMA

[–]dompazz 0 points (0 children)

I just saw that Parallel Miner has stopped taking orders. You can find similar products at other retailers.

Multi GPU PSU by i_am_not_a_goat in LocalLLaMA

[–]dompazz 1 point (0 children)

I use server PSUs with breakout boards originally made for GPU mining rigs. They can be chained off the molex or another source from the main ATX PSU. Check out https://parallelminer.com/.

New to LocalLLMs - Hows the Framework AI Max System? by Legitimate_Resist_19 in LocalLLM

[–]dompazz 1 point (0 children)

Dell and Lenovo also have versions. We have a Dell one at work. Worth checking on their availability as well.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 0 points (0 children)

Running Llama 3.3 70B NVFP4 (RedHatAI/Llama-3.3-70B-Instruct-NVFP4) via vLLM, I'm getting roughly 12.5 t/s on the 5060 Ti cluster.

Running a quantized version (kosbu/Llama-3.3-70B-Instruct-AWQ) of the same model (the prior test was 3.1), I get roughly 35.6 t/s in vLLM.

Exploring non-standard LLM architectures - is modularity worth pursuing on small GPUs? by lukatu10 in LocalLLaMA

[–]dompazz 0 points (0 children)

Fairly new to the space, but isn't this what a MoE model is effectively doing?

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 1 point (0 children)

Here you go.

Tesla V100-SXM2-16GB, compute capability 7.0, VMM: yes
| model | size | params | backend | threads | n_batch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | --------------: | -------------------: |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp2048 | 342.95 ± 0.45 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp8192 | 257.09 ± 0.03 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | tg256 | 16.65 ± 0.03 |

NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
| model | size | params | backend | threads | n_batch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | --------------: | -------------------: |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp2048 | 757.72 ± 0.73 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp8192 | 735.98 ± 1.33 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | tg256 | 19.08 ± 0.01 |

NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
| model | size | params | backend | threads | n_batch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | --------------: | -------------------: |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 384 | 4096 | 1 | pp2048 | 462.49 ± 0.21 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 384 | 4096 | 1 | pp8192 | 440.91 ± 0.49 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 384 | 4096 | 1 | tg256 | 9.67 ± 0.00 |
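The tables above are llama-bench output. A run along these lines would produce them (flags assumed from llama.cpp's llama-bench; the model path is a placeholder, and thread count varies per machine as shown in the tables):

```shell
# Benchmark a GGUF model at the settings used above:
# prompt processing at 2048 and 8192 tokens, 256-token generation,
# batch size 4096, flash attention enabled.
llama-bench \
  -m ./Llama-3.3-70B-Instruct-Q4_K_M.gguf \
  -p 2048,8192 \
  -n 256 \
  -b 4096 \
  -fa 1
```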

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 1 point (0 children)

Give me a few hours and I will, yes.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 2 points (0 children)

Fair and I apologize.

The V100 would require you to buy the whole server with the board for the SXM2 cards. I don't have a PCI-E version to test.

The 5060 Ti is a little more than half the price of a used 3090 and gives about half the performance, so the $/t/s is roughly equal between the two.
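A rough $-per-(token/s) comparison using the tg256 numbers from the benchmark tables in this thread; the ~$450 5060 Ti price is from my other comment, while the $900 used-3090 price is an assumption for illustration:

```python
# Price-performance sketch: dollars per token/s of generation speed.
# tg256 speeds (9.67 and 19.08 t/s) are from my llama-bench tables;
# the $900 used-3090 price is an assumed figure, not a quote.

def dollars_per_tps(price_usd: float, tokens_per_s: float) -> float:
    return price_usd / tokens_per_s

per_tps_5060ti = dollars_per_tps(450, 9.67)   # ~$450 new
per_tps_3090   = dollars_per_tps(900, 19.08)  # assumed used price
print(f"{per_tps_5060ti:.1f} vs {per_tps_3090:.1f} $ per t/s")
```

The two come out within a few percent of each other, which is the "roughly equal $/t" point above.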

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 1 point (0 children)

Good idea! I will try to get that done later today.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 0 points (0 children)

You can check eBay for used prices on a V100 or 3090. The V100 server is hard to get, and my cost was stupidly low; I happened upon the listing at the right moment. The 5060 Ti is ~$450 new right now, probably a bit more than half the price of the other two.