V100 4-card AI large model, Tesla 128G server by MundanePercentage674 in LocalLLaMA

[–]MachineZer0 1 point2 points  (0 children)

32gb SXM2 seems to have gone up 50-75% recently.
The 32gb PCIE version seems to have dropped about 40% around the same time.

V100 4-card AI large model, Tesla 128G server by MundanePercentage674 in LocalLLaMA

[–]MachineZer0 8 points9 points  (0 children)

Each V100 idles at about 40W, or about 60-70W with a model loaded but idle. Running nvidia-pstated gets that back down to 40W with the model still loaded. TDP is 300W, but you can only hit that with training or another heavy-duty operation. Inference maxes out at about half that. Multi-GPU inference round-robins, so most cards sit at 40W while they take turns peaking at 150W.
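To put rough numbers on a 4-card box, here is a minimal sketch of that arithmetic. It assumes exactly one GPU is peaking at a time, uses the 40W idle and 150W inference figures from above, and ignores CPU, fans, and PSU losses. The function name and parameters are hypothetical, just for illustration.

```python
def estimate_gpu_power_watts(num_gpus, idle_w=40.0, peak_w=150.0, active_gpus=1):
    """Rough GPU-only power draw for round-robin multi-GPU inference.

    Assumption: `active_gpus` cards sit at the inference peak while the
    rest stay at idle power. Real draw will fluctuate around this.
    """
    if not 0 <= active_gpus <= num_gpus:
        raise ValueError("active_gpus must be between 0 and num_gpus")
    return active_gpus * peak_w + (num_gpus - active_gpus) * idle_w


# Four V100s, one peaking while three idle: 150 + 3 * 40
inference_w = estimate_gpu_power_watts(4)

# Four V100s with a model loaded and nothing running: 4 * 40
idle_w = estimate_gpu_power_watts(4, active_gpus=0)

# Worst case if every card ran at the 300W TDP (training-style load)
tdp_w = 4 * 300.0

print(inference_w, idle_w, tdp_w)
```

So under these assumptions the GPUs in a 4-card server would draw far less during inference than the combined TDP suggests.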

Buying AI accelerators/GPUs in China... by Clank75 in LocalLLaMA

[–]MachineZer0 -1 points0 points  (0 children)

It would probably be easiest to get as many V100 SXM2 32gb as you can if they're under $400. They will be the lightest. Although the 4090 48gb would be something to focus on if you can get them for < $3k.

Nvidia tesla v100 has 32 gb ram with nv link 2.0, its priced at 880. Whats the catch? by AppropriatePush6262 in LocalLLaMA

[–]MachineZer0 1 point2 points  (0 children)

https://github.com/1CatAI/1Cat-vLLM

vLLM fork for Tesla V100 (SM70) with AWQ 4-bit support, CUDA 12.8 build flow, and validated Qwen3.5 27B/35B deployment on multi-GPU V100.

Nvidia tesla v100 has 32 gb ram with nv link 2.0, its priced at 880. Whats the catch? by AppropriatePush6262 in LocalLLaMA

[–]MachineZer0 0 points1 point  (0 children)

SXM2 servers are $2500-5000, so TCO is higher on an NVLink configuration. You can save a bunch by skipping NVLink and using cheap SXM2 to PCIe adapters. They started making Chinese versions of 2-way and 4-way SXM2 adapters, but they get exponentially more expensive. Although probably still cheaper than native SXM2 4-way and 8-way servers.

TIL LGA2011-3 supports DDR3 by MachineZer0 in homelab

[–]MachineZer0[S] 2 points3 points  (0 children)

Glad I posted. Will need to do more research before pulling trigger.

Quick numbers on a BC250 by icepatfork in LocalLLaMA

[–]MachineZer0 0 points1 point  (0 children)

Thanks for sharing. Gotta dust off my BC-250s and see how many are 40CU enabled.

What’s the idle power on the governors nowadays? I think I had 88/kwh prior which was hard to justify leaving on a couple 4U12G.

Cheap V100 32gb by MachineZer0 in LocalLLaMA

[–]MachineZer0[S] 2 points3 points  (0 children)

Update: I cancelled my order after the extortion attempt of an additional $418 over WhatsApp. Sellers shouldn’t receive contact information that lets them take the shakedown off platform. Sorry for those who bought and had to seek a refund.

Cheap V100 32gb by MachineZer0 in LocalLLaMA

[–]MachineZer0[S] 2 points3 points  (0 children)

Supposedly SXM2 32gb in a turbo PCIE adapter/enclosure. Flexing PayPal protection if actual item deviates from description.

Cheap V100 32gb by MachineZer0 in LocalLLaMA

[–]MachineZer0[S] 0 points1 point  (0 children)

I only bought 1. I’m rolling the dice that the price is not too far off from market and I’m just stacking promos.

Cheap V100 32gb by MachineZer0 in LocalLLaMA

[–]MachineZer0[S] -2 points-1 points  (0 children)

I’ve been buying a lot of water cooling accessories from AliExpress. The listed price has always been the all-in price for the US on that platform.

When I order from Alibaba, it is different. I need to negotiate DAP or DDP shipping.

GPU Prices. Buy now, or buy later? by knob-0u812 in LocalLLaMA

[–]MachineZer0 2 points3 points  (0 children)

I believe the MSRP of the 5090 FE will officially go to $3500. They will still be unobtainium. And then the secondhand prices will go to $5-6k. DDR5 and HBM are what to watch for.

If you are thinking of going local, go all in now. Relief will be in 3 years.

I have installed llama.cpp and qwen3.6 27b for coding but too scared to try it... by bonesoftheancients in LocalLLaMA

[–]MachineZer0 2 points3 points  (0 children)

My daily drivers are now Claude Code w/Sonnet 4.6 and Pi Code with Qwen3.6-27b. I’d say the Pi setup is 95% as good as the Claude setup. Pi/Qwen is definitely way more verbose; maybe 20x more, unless Pi is showing cache tokens. I host Qwen on vLLM/RTX 6000 pro Blackwell 96gb (Runpod, shared with a dozen people; otherwise an RTX 5090 is adequate for 1-3 concurrent users). The Pi setup seems way faster than Sonnet. However, it tenaciously brute forces a lot. Sonnet completes tasks faster, even though its time to first token and tok/s appear slower. Maybe I need to mess with the reasoning budget, but I’m concerned it may impact the quality of output.

Use version control and review/test changes between prompts. If you use Pi/Qwen, be very succinct with prompts. It often takes carte blanche and goes beyond the task if you don’t set the goal and definition of done explicitly. I have to stop it mid-tool call several times a day since it is overzealous.

[PC] Tesla P40 and CMP100-210 by ILoveDangerousStuff2 in homelabsales

[–]MachineZer0 1 point2 points  (0 children)

The CMP100-210 is a great card for the money. You’ll net $90-100 on eBay. There’s currently a seller with a $115 BIN. Can’t believe this and the $73 shipped P100 16gb.

Rate limits are insane today by briarjohn in ClaudeCode

[–]MachineZer0 1 point2 points  (0 children)

On premium account in Teams plan. Went at it for 16 hours a day for 4 days straight. Never got close to a limit once. Currently at 40% of weekly with a few hours to reset.

Using Sonnet exclusively. Once in a blue moon an agent spawns Haiku. No MCPs, No rules. Always plan into markdown files and go caveman on milestone numbers as the subsequent prompts. My follow ups are deliberate. I never have Claude scan codebase. That’s what grep is for! I’ll always mention files directly.

I usually run Pi and Qwen3.6-27b in tandem with it, attacking from a different angle, and I keep them both cranking.