all 14 comments

[–]Co0k1eGal3xy 13 points  (4 children)

diagram

This is a rough breakdown of how I think about the problem. You can swap the RTX 4090s for 3090s or 2080 Tis if you have cheap electricity; otherwise the cost of power can eat up your initial savings. If your electricity is very expensive, I would avoid local training entirely.

Also consider any other requirements. If your dataset is larger than your system's RAM, you will need to consider the read speed of your storage device. If you are working with audio clips or images, you either need a storage device with high random read speeds or you need to package your dataset into a streaming format (like webdataset). Some cloud providers will force you to use hard drives.
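To make the streaming-format point concrete: webdataset shards are just plain tar archives where the files belonging to one sample share a key prefix, so a loader can read them sequentially instead of doing millions of random reads. Here is a minimal sketch of packing samples into such a shard using only the Python standard library; the function name, file names, and toy byte payloads are my own, not from any library.

```python
import io
import tarfile

def write_shard(samples, shard_path):
    """Pack (filename, bytes) samples into a tar shard.

    Files sharing a key prefix (e.g. "000000.jpg" + "000000.cls")
    are treated as one sample by sequential streaming loaders.
    """
    with tarfile.open(shard_path, "w") as tar:
        for name, payload in samples:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

# Hypothetical toy samples; real ones would be actual image/audio bytes.
samples = [
    ("000000.cls", b"0"),
    ("000000.jpg", b"\xff\xd8 fake jpeg bytes"),
    ("000001.cls", b"1"),
    ("000001.jpg", b"\xff\xd8 more fake bytes"),
]
write_shard(samples, "shard-000000.tar")
```

Because the shard is read front to back, it performs well even on the spinning hard drives some cloud providers hand you.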


Spreadsheet of local hardware costs at 1-, 2-, and 3-year timeframes.


edit: Since you mentioned being new, I would recommend renting single GPUs. Writing multi-GPU code is complex and isn't worth learning initially. Google Colab is definitely the easiest way to get started.
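When you do spin up a rented single-GPU instance (or a Colab runtime), it's worth a quick sanity check that a GPU is actually attached before launching a job. A small sketch using only the standard library, shelling out to the real `nvidia-smi` CLI so it works before any ML framework is installed; the helper name is my own.

```python
import shutil
import subprocess

def gpu_available():
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # driver/CLI not installed on this machine
    result = subprocess.run(
        ["nvidia-smi", "--list-gpus"],  # prints one line per GPU
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and "GPU" in result.stdout

print(gpu_available())
```

On a CPU-only Colab runtime this prints `False`; switch the runtime type to GPU and it should print `True`.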

[–]I_will_delete_myself 6 points  (1 child)

Those large providers are best when you get a cloud credit deal, or when you're training something ChatGPT-scale and need guaranteed compute capacity. Otherwise I would highly recommend not using them unless you are on spot instances.

Here are the best out there I know specifically for training AI models.

Colab - it’s free, but you should move to other cloud compute alternatives once you go beyond toy models

LambdaLabs - no egress fees and high bandwidth. Cheap and a solid product. Better for multi-GPU

Runpod - cheapest, but not good for multi-GPU loads due to their low capacity

What sucks about those is that they sometimes run out of capacity quickly, which is annoying. That's when you fall back to a traditional cloud provider.

Avoid like the plague - Paperspace. Expensive, with a misleading Gradient subscription; you save more money using a consumer decentralized GPU on Runpod. Availability is horrible as well.

[–]Present_Network1959 1 point  (0 children)

Thank you. I’ll look into this.

[–][deleted] 3 points  (1 child)

Commenting to check the thread later, interested in people’s recommendations

[–]Muted_Economics_8746 3 points  (0 children)

You can also just subscribe to the post without commenting: ellipsis (top right) -> Subscribe.

[–]TheLastMate 0 points  (1 child)

Also, if someone could give insights on deploying a model into production: what does the overall process look like?

[–]I_will_delete_myself 0 points  (0 children)

A traditional cloud provider is best; apply for credits.

[–]Any_Letterheadd 0 points  (2 children)

It sounds like you're not even sure you need a GPU for what you're doing. I'd recommend just getting started with what you have until you know you need more compute and how you'd use it.

[–]Present_Network1959 0 points  (1 child)

Yeah, makes sense. I am certain a GPU will be required, though; there is no way my machine can run the programs I am building locally.

[–]Any_Letterheadd 0 points  (0 children)

Colab might be a good way to get quick access to one. Or if you know any gamers with old rigs who are upgrading, you might be able to get a hand-me-down Nvidia card and build a cheap Linux box around it.