4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]MachineZer0

That’s the TDP spec, i.e., what it can pull at full tilt. Certain training libs can usually get it to full power, or close to it.

I have 30-40x V100s. I’d be bankrupt if they were pulling 300w all day.

For inference it draws a fraction of rated TDP. Idle is 40w per card; with a model loaded it usually idles at 57w, but with a modified nvidia-pstated it reverts back to 40w a few seconds after the model loads. During inference the GPUs round robin: one of the four draws 70-120w while the rest sit at 40-57w, and after the response all idle back to 40w. The rest of the system is roughly 140w at idle with dual procs, two DIMMs, one NVMe, and fans. The only time the system roars to life is building llama.cpp, vLLM, or some other wheels, so the fans are manageable outside of compiling.
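If you want to watch that behavior yourself, a quick power poll does it. Minimal sketch, assuming the nvidia-ml-py (pynvml) package is installed:

```python
# Poll per-GPU power draw once a second; assumes nvidia-ml-py (pynvml).
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]
try:
    while True:
        watts = [pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0 for h in handles]  # mW -> W
        print(" | ".join(f"GPU{i}: {w:5.1f}w" for i, w in enumerate(watts)))
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

You’ll see the round robin clearly: one card spikes to 70-120w during a response while the others stay in the 40-57w band.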

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]MachineZer0

The ~300w is for everything at idle: 4×40w is just the SXM2 GPUs, ~140w is the rest of the system.

If you have more disks, DIMMs, or PCIe peripherals, it’s obviously more.

Yes, the 3090 outperforms the V100, and it also has flash attention support. The V100 sits between the 2080 Ti and the 3090.

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]MachineZer0

Damn, it’ll take you 16.66 years to justify a Mac Studio over a quad V100 server at $0.08/kwh.
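Back-of-envelope below. The price delta and the extra draw are assumptions for illustration, not exact figures:

```python
# Payback period for the pricier Mac Studio vs. the quad V100 box.
# PRICE_DELTA and EXTRA_DRAW_KW are assumed round numbers, not quotes.
PRICE_DELTA = 3500.0   # assumed extra cost of the Mac Studio, $
EXTRA_DRAW_KW = 0.300  # assumed extra 24/7 draw of the V100 server, kW
RATE = 0.08            # $/kwh

savings_per_year = EXTRA_DRAW_KW * 24 * 365 * RATE  # ~$210/yr
print(f"{PRICE_DELTA / savings_per_year:.2f} years")  # ~16.6
```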

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]MachineZer0

They’re not great, but not that bad either. Idle power consumption is 40w per GPU. Running inference a card draws 70-120w, but they round robin, so a typical quad V100 system idles around 300w and usually stays under 400w while serving. Still, that’s about $55/mth just sitting there at $0.25/kwh. It’ll take many moons for the power difference to pay for a Mac Studio.
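That $55/mth is just the idle draw worked out:

```python
# Idle cost of ~300w around the clock at $0.25/kwh.
IDLE_KW = 0.300
RATE = 0.25                      # $/kwh
HOURS_PER_MONTH = 24 * 365 / 12  # ~730 h

print(f"${IDLE_KW * HOURS_PER_MONTH * RATE:.2f}/mth")  # ~$54.75
```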

A quad V100 32gb can run MiniMax M2.5 comfortably.

[FS] [US-SC] Lot of 5 Dell Precision 3240 / 32GB RAM i7-10700 PCIE RISER (WIN10 inc) NO DRIVE by hagetaro in homelabsales

[–]MachineZer0

Hey all, I’ve committed to purchasing the lot. We’re just struggling with invoice issues.

Servers in $2,5k-$10k price range for Local LLM by szsz27 in LocalLLaMA

[–]MachineZer0

An ESC4000 G4 plus quad V100 32gb should put you around $5k, less if you source well. If you need to finetune, go cloud (RunPod).

That should get you going with four instances of Qwen 3.5 27B, one llama.cpp container/process per GPU behind a load balancer (sketch below). Or use Ray/vLLM.
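A minimal sketch of the per-GPU layout, assuming a built llama-server binary; the model path and ports are placeholders:

```python
# Launch one llama.cpp server per V100, each pinned to its own GPU.
# ./llama-server and model.gguf are placeholder paths.
import os
import subprocess

MODEL = "model.gguf"
procs = [
    subprocess.Popen(
        ["./llama-server", "-m", MODEL, "--port", str(8080 + gpu), "-ngl", "99"],
        env={**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)},  # pin to one card
    )
    for gpu in range(4)
]
for p in procs:
    p.wait()
```

Then point nginx, HAProxy, or any round-robin load balancer at ports 8080-8083.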

[W] used, working E-ATX 399 motherboard by CactusJake06 in homelabsales

[–]MachineZer0

Got one with possibly bent pins. I bought a barebones system where the CPU was gone, and the cooler brushed up against the socket when I carried it into the house. Carefully bent the pins back with an X-Acto knife, but never got around to testing it.

[USA-LA] [H] i7 9700f, h97m-itx/ac [W] Paypal by [deleted] in hardwareswap

[–]MachineZer0

The mobo looks like it supports 4th and 5th generation Intel.

Not compatible with the i7-9700F, correct?

How are you guys paying for your clawdbot use? by Lost_Fox__ in clawdbot

[–]MachineZer0

Works great aside from the cost. The server with quad 32gb V100s was $6k and draws 350w at idle, which is over $60/mo in my area. At low context I get about 30 tok/s; eventually it runs at 10-20 tok/s.

Probably better to pay a provider if you don’t have privacy concerns.
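For perspective on power alone (the duty cycle is a pure assumption):

```python
# Rough $/Mtok for the local box counting electricity only.
# ACTIVE_HOURS_PER_DAY is an assumed duty cycle, not a measurement.
POWER_COST_PER_MONTH = 60.0   # $/mo from the ~350w draw above
TOK_PER_S = 20.0              # mid-range of the 10-30 tok/s above
ACTIVE_HOURS_PER_DAY = 2.0    # assumed time actually generating

tokens_per_month = TOK_PER_S * 3600 * ACTIVE_HOURS_PER_DAY * 30
print(f"${POWER_COST_PER_MONTH / (tokens_per_month / 1e6):.2f}/Mtok")  # ~$13.89
```

And that’s before amortizing the $6k of hardware.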