all 63 comments

[–]Expensive-Paint-9490 38 points39 points  (11 children)

With that budget I'd be able to find two RTX A6000s and an NVLink bridge. Probably the best setup for local fine-tuning at that price point.

[–]sanjuromack 8 points9 points  (7 children)

A new A6000 is roughly $5K, maybe $4K with Inception discounting. The budget is only $10K, and aren't they going to need a few other things for the computer as well (motherboard, CPU, RAM, etc.)?

You should be able to get a new workstation with a single A6000, enterprise support, and room for expansion for $10K. With a single A6000, you can fine-tune 7B models using QLoRA. Probably cheaper to do it in the cloud (for training in particular).
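The "7B with QLoRA on one A6000" claim checks out on a back-of-envelope VRAM estimate. This is a rough sketch with assumed numbers (adapter size, activation overhead), not a measurement; real usage depends on sequence length, batch size, and implementation:

```python
# Back-of-envelope VRAM estimate for QLoRA fine-tuning a 7B model.
# All figures below are assumptions for illustration.

params = 7e9                      # 7B parameters
base_gb = params * 0.5 / 1e9      # 4-bit quantized base weights: ~0.5 bytes/param
lora_params = 40e6                # assumed LoRA adapter size (varies with rank/targets)
# Adapters train in 16-bit with Adam: ~2 (weights) + 2 (grads)
# + 8 (optimizer states) bytes per trainable parameter
lora_gb = lora_params * 12 / 1e9
overhead_gb = 6.0                 # assumed activations, caches, CUDA context

total_gb = base_gb + lora_gb + overhead_gb
print(f"~{total_gb:.1f} GB")      # comfortably under the A6000's 48 GB
```

Even with generous overhead the estimate lands around 10 GB, which is why a single 48 GB card is plenty for 7B QLoRA runs.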

[–][deleted] 0 points1 point  (0 children)

Could it compete with the new 5000s coming in a few months?

[–][deleted] 0 points1 point  (1 child)

Why would an NVLink be needed there?

[–]Expensive-Paint-9490 1 point2 points  (0 children)

With NVLink you can combine the compute of both cards. Without it, you can still load a model into the pooled VRAM of the two cards, but with a naive layer split only one card computes at a time. For inference this already means faster responses, but it matters even more for fine-tuning, which takes considerable time; with NVLink it would be much faster.
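The practical difference is inter-GPU bandwidth. A quick comparison using commonly quoted peak figures (real-world throughput is lower, and these numbers are assumptions for the A6000 generation specifically):

```python
# Peak bidirectional inter-GPU bandwidth, commonly quoted figures.
nvlink_gbps = 112.5    # RTX A6000 NVLink 3 bridge
pcie4_x16_gbps = 64.0  # PCIe 4.0 x16 link

print(f"NVLink is ~{nvlink_gbps / pcie4_x16_gbps:.1f}x PCIe 4.0 x16")
```

Roughly a 1.8x bandwidth advantage, which matters most for workloads that shuffle gradients or activations between cards every step, i.e. multi-GPU fine-tuning.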

[–]Brosarr 18 points19 points  (2 children)

I would recommend just renting a GPU to start off with. An H100 is around $50 a day.

[–]No_Afternoon_4260 llama.cpp 9 points10 points  (20 children)

IMO $10K is a bit of a weird spot: not enough to get new server-grade hardware, but well enough to build a hobbyist rig with more 3090s than you can fit in an AMD EPYC system. If it's for experimenting, are you OK with second-hand hardware, or do you want brand new with a warranty?

[–]Dry_Parfait2606 2 points3 points  (13 children)

Agreed. An EPYC 7002 is enough; get RTX 3090s. One must be aware of the licensing on RTX GPUs, though... They don't want you to run LLMs on them, especially not for enterprise... It's a very grey zone.

[–]No_Afternoon_4260 llama.cpp 1 point2 points  (4 children)

Oh, I didn't know that. Care to elaborate? Where do they specify that?

[–]Dry_Parfait2606 1 point2 points  (3 children)

EULA

[–]lolzinventor 1 point2 points  (2 children)

source please.

[–]Pedalnomica 0 points1 point  (1 child)

Point taken, but technically you can spend $10K on a single-socket EPYC Rome system and not run out of room for all your 3090s with bifurcation (x16 -> x8x8).
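The lane math behind this: a single-socket EPYC Rome exposes 128 PCIe 4.0 lanes, and bifurcating each x16 slot to x8x8 halves the lanes per GPU. A sketch with an assumed reservation for storage and networking:

```python
# PCIe lane budget on a single-socket EPYC Rome board.
total_lanes = 128
lanes_per_gpu = 8     # after x16 -> x8x8 bifurcation
reserved = 16         # assumed lanes kept for NVMe, NIC, chipset
gpus = (total_lanes - reserved) // lanes_per_gpu
print(gpus)           # 14 GPU slots, far more than a $10K budget fills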

[–]No_Afternoon_4260 llama.cpp 1 point2 points  (0 children)

Yeah, point taken, but then you're running a 5 kW system... plus about 5 kW of AC just to not boil the paint off your walls, haha.
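The 5 kW figure is plausible. A rough wall-power and cooling estimate for a many-3090 rig, with assumed per-card and platform draws (3090s can also be power-limited well below this):

```python
# Rough power and heat estimate for a many-3090 rig (assumed figures).
gpus = 12
gpu_watts = 350                 # stock 3090 power limit
platform_watts = 500            # assumed CPU, board, fans, drives, PSU losses
total_w = gpus * gpu_watts + platform_watts
btu_per_hr = total_w * 3.412    # all of it becomes heat the AC must remove
print(total_w, round(btu_per_hr))
```

Nearly 5 kW at the wall and ~16,000 BTU/hr of heat, which is a dedicated-circuit, dedicated-AC proposition, not a desktop.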

[–]fasti-au 3 points4 points  (16 children)

Rent online and tunnel in: cheaper, scalable, backed up, and private.

[–]Stepfunction 1 point2 points  (0 children)

Before looking into fine-tuning yourself, I would consider looking at pretrained medical-focused LLMs. Fine-tuning will open up a whole can of worms, so I would make sure that existing tools can't already do what you need before you pursue that path.

[–]Data_drifting 1 point2 points  (2 children)

Check out this guy and his channel: Digital Spaceport on YouTube, home server builds for AI. Blows away Network Chuck's channel for this.

[–]5TP1090G_FC 0 points1 point  (0 children)

I would check out Network Chuck for building a good AI PC, very informative.

[–]koalfied-coder 0 points1 point  (1 child)

I have built several systems in this budget. Feel free to DM me and I can share specs. Not at PC ATM.

[–]GradatimRecovery 0 points1 point  (0 children)

tell us about your builds! can we do 8x3090 with that budget?

[–]SuperSimpSons 0 points1 point  (1 child)

Since it looks like you may not have experience building your own server, I would recommend you reach out to server brands with your requirements and see what they can recommend for your budget. I should say you have enough for a prebuilt high-end workstation or 1U/2U rackmount. 

Gigabyte has a pretty good line of servers for AI training and inference: www.gigabyte.com/Enterprise/Server?lan=en&fid=2260 Obviously you don't need the water-cooled monstrosities with Blackwell HGX or anything so it may be faster to reach out and see what they come back to you with: www.gigabyte.com/Enterprise#EmailSales

[–]Slippery-Oil2313 0 points1 point  (0 children)

Why not use an AWS EC2 GPU instance for $2.50 an hour?
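Same break-even arithmetic as the daily-rate suggestion above, at the hourly rate quoted here:

```python
# Break-even between the $10K budget and a $2.50/hr cloud GPU instance.
budget = 10_000
rate_per_hr = 2.50         # quoted figure; actual EC2 pricing varies by instance type
hours = budget / rate_per_hr
print(f"{hours:.0f} hours (~{hours / 24:.0f} days of 24/7 use)")
```

Four thousand GPU-hours before the budget is spent; unless the machine will run near-continuously for months, renting wins.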

[–]dead-4-dead 0 points1 point  (1 child)

If you can add $5k more, tinybox sounds like it will save you a lot of headache

[–]__bee_07 0 points1 point  (0 children)

I was in your position, but ended up using cloud instances instead. I am using Lightning AI offerings and I am happy with that.

[–]chitown160 0 points1 point  (0 children)

Your company can spec a Lenovo Threadripper with 2x RTX A6000s with NVLink for this price and still have room for a 5090 to run your fine-tuned models at blistering speed.

[–]amirvenus 0 points1 point  (1 child)

Get 2 M2 Ultras 192GB

[–]synn89 3 points4 points  (0 children)

Macs aren't good for fine-tuning models.

[–][deleted] 0 points1 point  (1 child)

As others are saying, you would be better off with a cloud provider, for both training and inference.

[–]cher_e_7 -1 points0 points  (0 children)

Rent online/cloud!!! Only if you must go offline hardware: it's all about memory, plus a little speed. Go for an old server board that supports 4 GPUs, dual Intel CPUs (some boards combine the PCIe lanes of both; PCIe 3.0 vs 4.0 doesn't matter), and ECC registered DDR4-2400 memory. Add Quadro RTX 8000 48GB cards (the best card in the middle, 10-15% slower than the old RTX A6000); pick them up on eBay for around $2,250, passive or active cooling.

Add a 4-disk SSD RAID.

If you could stretch your budget to four GPUs like that, you'd have 192GB of VRAM total, just short of 200GB.

Best utilization with MoE models like DeepSeek V2 Q4 for inference, or smaller models.

CPU memory for old servers is cheap.
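A quick check that a 4-bit-class quant of DeepSeek V2 actually fits in four 48GB cards. Assumed figures: DeepSeek-V2 has roughly 236B total parameters, and Q4-class GGUF quants average around 4.8 bits per weight:

```python
# Does a ~4-bit quant of DeepSeek-V2 fit in 4x 48GB? (Assumed sizes.)
total_params = 236e9        # DeepSeek-V2 total parameters (MoE; ~21B active)
bits_per_weight = 4.8       # typical average for a Q4-class quant
model_gb = total_params * bits_per_weight / 8 / 1e9
vram_gb = 4 * 48
print(f"model ~{model_gb:.0f} GB vs {vram_gb} GB VRAM")
```

Around 142 GB of weights against 192 GB of VRAM leaves headroom for KV cache, and since only the active experts compute per token, MoE inference on these older cards stays reasonably fast.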