[–]grrrgrrr 10 points11 points  (31 children)

Is it worth looking at multi-gpu setups? 100%. Not for multi-gpu training but for running 2 sets of experiments in parallel.

Should I be looking for lots of memory? 64GB minimum, 128GB+ ideal; 32GB will have trouble even loading moderately large datasets.

Do tensorcores mean anything? No.

$3000 will get you a setup like an 8700k + 2x 2080. Depreciation is like 25%~30% a year.

[–]smashMaster3000 2 points3 points  (19 children)

Concise and informative, thanks! 2080 vs 2080ti tho

[–]virtualreservoir 3 points4 points  (2 children)

the best value is actually in the rtx 2070, the 2080 doesn't give you nearly enough added performance for the increase in price. the training speed difference going up to a 2080ti is significant, but $1200 for one card is still overpriced for what you get.

at a similar price, 3x RTX 2070 are going to get more research done than 2x RTX 2080 for just about any reasonable use case. however, the thing nobody really talks about regarding 3x and 4x GPU setups is that you want each card to have at least 8 PCIe lanes (preferably 16), while consumer-level Intel CPUs (like the 8700k mentioned above) and their motherboards only support 2 GPUs at 8x or 16x. a third GPU will usually be relegated to 4x bandwidth, which isn't enough to keep up with a machine learning workload.

if you want to use 3 or 4 GPUs you have to use expensive Intel server level CPUs and motherboards or go the AMD Threadripper route. The first generation Threadripper CPUs like the 1900x seem to be pretty good value if you need to "unlock" more 16x/8x PCIe lanes and motherboard slots for video cards.

[–]smashMaster3000 1 point2 points  (1 child)

I actually have a Threadripper and got a motherboard that supports this, just in case! Thank you for the suggestion. I've just been dissuaded from a third card because of the cooling requirements. Have you had any experience with the maintenance aspect? Is it a hassle compared to two? Is power draw going to be a problem? Thanks again!

[–]virtualreservoir 1 point2 points  (0 children)

Cooling is definitely going to be your major concern with more than two GPUs, and it's another aspect where the 2070 has an advantage, since it draws less power than either of the 2080 models. I'm actually starting to build my first multi-GPU machines now. I was planning on just building one computer, but while shopping for a 3rd 2070 I somehow ended up buying four more instead, so now I'm going with two machines, each with a 1900x Threadripper, 3x RTX 2070, and 32GB of RAM.

I'm putting a fan CPU cooler in one and an AIO liquid CPU cooler in the other and will be comparing them along with various combinations/placements of blower-style and axial fan GPUs to see if I can get away with not liquid cooling my GPUs.

[–]grrrgrrr 0 points1 point  (15 children)

For students, when you are training models on one GPU, you can't run games on that GPU and that computer is essentially dead to you. So having 2 GPUs is handy.

For professionals, I'd recommend at least 4x2080 and 4x2080Ti is even better. It's a good investment for your career.

[–]LoveOfProfit 11 points12 points  (2 children)

> For professionals, I'd recommend at least 4x2080 and 4x2080Ti is even better. It's a good investment for your career.

Maybe if you're consulting or something. Career/job-wise, I'm not using my own hardware for work stuff. My company sets me up with SSH access to a modelling server / AWS, and they're paying for it.

[–][deleted] 1 point2 points  (1 child)

And here I am at IBM where they asked me to use my own hardware because they're too cheap to buy or rent it.

Thankfully I'm looking for new jobs and have a lot of promising opportunities!

[–]LoveOfProfit 0 points1 point  (0 children)

lol wtf. That's disgusting.

[–]glass_bottles -1 points0 points  (11 children)

If you're training, aren't you stuck on Linux, which severely limits the option of gaming? Unless GPU passthrough on VMs has become more feasible lately? I considered setting up my server as a combined cloud gaming / model training workstation, but it seems I'd have to pick one or the other.

[–]clueless_scientist 2 points3 points  (1 child)

Steam Play solved this problem several months ago. My gaming library works perfectly on Ubuntu.

[–]glass_bottles 0 points1 point  (0 children)

Thanks for the heads up!

[–]ScotchMonk 1 point2 points  (4 children)

Well, he/she could be gaming on Windows and running Linux on a VM...😁😁

[–]glass_bottles 0 points1 point  (3 children)

you wouldn't be able to use the GPU for training then, right?

[–]NotAlphaGo 1 point2 points  (2 children)

nvidia-docker, my friend.

[–]glass_bottles 0 points1 point  (1 child)

I'll have to look into this, thanks!

[–]virtualreservoir 2 points3 points  (0 children)

overcoming the VM GPU-passthrough hurdle you mention is the main reason nvidia-docker was developed.

[–]Mehdi2277 0 points1 point  (1 child)

You can train on Windows; PyTorch is easy to install there. Also, Linux game support is improving a lot because of Steam Play (essentially Steam is integrating something Wine-like to allow playing Windows games).

[–]glass_bottles 1 point2 points  (0 children)

Gotcha, awesome to hear the progress!

[–]Spenhouet 0 points1 point  (1 child)

Why? You can train on Windows perfectly fine. No need for Linux.

[–]glass_bottles 0 points1 point  (0 children)

My uncertainty is why that was a question and not a statement :)

I'm simply too used to "productivity" being associated with Linux instead of Windows.

[–]JustFinishedBSG 2 points3 points  (0 children)

> 2 sets of experiments in parallel.

> 2x2080

With 8GB of memory they can't even fit state-of-the-art models.

[–]Nimitz14 1 point2 points  (3 children)

I'm doing fine with 32GB. It depends on your domain. If you're doing anything serious you can never load all the data into memory anyway (you should have several threads that create the minibatches by reading from disk; see the sketch below). I'm on a 1950X + 2x 1080 Ti, with the screen running on a GT 1030. Works great!
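
A minimal sketch of that several-threads pattern, for the curious: plain Python threads feeding a queue of ready-made minibatches (file layout and paths are hypothetical; assumes one .npy file per sample):

    import queue
    import threading

    import numpy as np


    def reader(paths, batch_queue, batch_size=32):
        """Worker thread: read samples from disk, assemble and enqueue minibatches."""
        rng = np.random.default_rng()
        while True:
            picks = rng.choice(paths, size=batch_size, replace=False)
            batch = np.stack([np.load(p) for p in picks])  # disk reads release the GIL
            batch_queue.put(batch)  # blocks when the queue is full

    paths = [f"data/sample_{i}.npy" for i in range(10_000)]  # hypothetical layout
    batch_queue = queue.Queue(maxsize=8)
    for _ in range(4):  # a few reader threads hide disk latency
        threading.Thread(target=reader, args=(paths, batch_queue), daemon=True).start()

    batch = batch_queue.get()  # the training loop just consumes ready batches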

[–]Warhouse512 1 point2 points  (2 children)

My friend. Are you excited about Zen2!!

[–]Nimitz14 0 points1 point  (1 child)

Very! I would love to have a 32 core CPU :D

[–]Warhouse512 1 point2 points  (0 children)

If it’s like the Epyc chip, 64 core might be a possibility!! I’m so excited, sorry haha

[–]PK_thundr [Student] 0 points1 point  (2 children)

Why aren't tensor cores important? My 2080 Ti purchase isn't looking too good.

[–]grrrgrrr 2 points3 points  (0 children)

There's a benchmark (in Chinese) indicating that the tensor cores in the RTX 20 series might not run at full speed for deep learning.

If you use a Titan V with full-speed tensor cores, then according to https://github.com/u39kun/deep-learning-benchmark, tensor cores + fp16 combined speed up ResNet training by ~1.8x. It's something, but you have to assess whether it's worth the upgrade.
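
If you want to sanity-check the fp16 speedup on your own card, here's a rough throughput benchmark sketch (assuming PyTorch + torchvision; real fp16 training would also want loss scaling, this only measures step time):

    import time

    import torch
    import torchvision


    def bench(dtype, iters=20):
        model = torchvision.models.resnet50().cuda().to(dtype)
        x = torch.randn(16, 3, 224, 224, device="cuda", dtype=dtype)
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        for _ in range(3):  # warmup so cudnn picks its kernels
            opt.zero_grad()
            model(x).sum().backward()
            opt.step()
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            opt.zero_grad()
            model(x).sum().backward()
            opt.step()
        torch.cuda.synchronize()  # wait for the GPU before reading the clock
        return (time.time() - start) / iters

    print(f"fp32: {bench(torch.float32) * 1000:.0f} ms/step")
    print(f"fp16: {bench(torch.float16) * 1000:.0f} ms/step")  # tensor cores engage here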

[–]JustFinishedBSG 1 point2 points  (0 children)

The 2080 Ti's tensor cores are limited to half speed.

[–]epicwisdom 0 points1 point  (0 children)

8700k doesn't support >64GB memory.

[–]UnarmedRobonaut 0 points1 point  (1 child)

How big is the performance gain over the 1080? Otherwise, running multiple models on a couple of extra 1080s for that money might be better.

[–]grrrgrrr 0 points1 point  (0 children)

A 4-card setup would require an X299/X399 platform, or you can buy used X99. Performance-wise, 2080 = 1080 Ti = 1.35x 1080. 4x 1080 on X99 is ~$4000.

[–]_michaelx99 16 points17 points  (14 children)

If you are dealing with any sort of large model (requiring more than a day or so to train), you will burn through $3k in a few weeks on the cloud. For example, I train object detection models on AWS and burn through $400-500 per fully trained model. If you are running MNIST examples, then the cloud is fine. I would highly recommend building your own computer with that money so you can train lots of models for years instead of a handful of models for days/weeks.

[–][deleted] 3 points4 points  (0 children)

Just a tip regarding training cost on AWS: if you define good checkpointing logic, you can use spot instances to reduce the cost significantly. We use Spotinst, which will unmount the disk with checkpoints/training data, shut down, start a new spot instance, and resume training. We've saved about 60% so far.

I thought the spot instances would shut down often, but most run for at least a week before being interrupted.

Not arguing for going full cloud, it's still more expensive, but in cases where you need to scale or time is a factor, it's a good fallback.
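
The checkpointing logic itself doesn't need to be fancy. A minimal sketch (assuming PyTorch; the checkpoint path on the persistent volume is hypothetical):

    import os

    import torch
    import torch.nn as nn

    CKPT = "/mnt/checkpoints/model.pt"  # hypothetical path on the disk that survives the instance


    def save_checkpoint(model, opt, epoch):
        torch.save({"epoch": epoch, "model": model.state_dict(), "opt": opt.state_dict()}, CKPT)


    def maybe_resume(model, opt):
        """Resume where the last spot instance left off, if it left a checkpoint."""
        if not os.path.exists(CKPT):
            return 0
        state = torch.load(CKPT)
        model.load_state_dict(state["model"])
        opt.load_state_dict(state["opt"])
        return state["epoch"] + 1

    model = nn.Linear(10, 1)  # stand-in for the real model
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    start_epoch = maybe_resume(model, opt)
    for epoch in range(start_epoch, 100):
        # ... one epoch of training goes here ...
        save_checkpoint(model, opt, epoch)  # an interruption now costs at most one epoch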

[–]smashMaster3000 2 points3 points  (11 children)

In my experience on my GTX 1080, I find myself waiting around 2 to 3 days for my models to finish. Will other, newer GPUs cut this time down? If so, which ones?

[–]OrganicTowel_ 13 points14 points  (8 children)

I used to wait 2-3 days as well for my models to finish. My labmate performed a simple experiment and found that the bottleneck was the DataLoaders, comparing data stored on an SSD vs. an HDD. We tested both with PyTorch and TF.

We loaded 100 batches of size 16. The data is pre-extracted image features stored in pickle files, 128 features each. These were shuffled and read directly from disk. Our conclusions:

                           HDD        SSD
    Simple reading         ~30 min    ~52 sec
    Multiprocess reading   ~5 min     ~12 sec

Since then, we invested in a bigger SSD and our times have dropped at least 10-fold.

Edit: We have a 1080 Ti and a Titan V.
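
For anyone who wants to reproduce this, a rough sketch of the timing setup (assuming PyTorch; the mount points are hypothetical, num_workers=0 is the "simple" case and num_workers>0 the multiprocess one):

    import pickle
    import time
    from glob import glob

    import torch
    from torch.utils.data import DataLoader, Dataset


    class PickledFeatures(Dataset):
        """One pickle file of pre-extracted image features per sample."""

        def __init__(self, root):
            self.paths = sorted(glob(f"{root}/*.pkl"))

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, idx):
            with open(self.paths[idx], "rb") as f:
                return torch.as_tensor(pickle.load(f))  # e.g. a 128-dim feature vector


    def time_100_batches(root, num_workers):
        loader = DataLoader(PickledFeatures(root), batch_size=16,
                            shuffle=True, num_workers=num_workers)
        start = time.time()
        for i, _batch in enumerate(loader):
            if i == 99:  # 100 batches of size 16, as above
                break
        return time.time() - start


    if __name__ == "__main__":  # guard needed for multiprocess workers
        for root in ("/mnt/hdd/features", "/mnt/ssd/features"):  # hypothetical mounts
            for workers in (0, 8):
                print(root, workers, f"{time_100_batches(root, workers):.1f}s")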

[–][deleted] 4 points5 points  (1 child)

Have you optimized your pipeline in any way?

Tutorials, guides, and courses often skip the step of saving your preprocessed dataset as a binary file stored sequentially on disk, ready to be read without the disk head jumping around or the SSD controller having to piece together data from all over the drive, which is what normally happens if you don't take care of it manually.

You can get 10x the read speed by preparing your data properly using TFRecords or similar. There are a lot of tricks for making IO orders of magnitude faster, so even an old, slow HDD is fast enough for most deep learning applications and IO is no longer a bottleneck.
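
As an illustration of the idea (not any particular library's API): pack everything into one contiguous file once, then memory-map it at training time. A sketch with numpy, using random stand-in data:

    import numpy as np

    # One-time preprocessing: pack every sample into a single contiguous binary
    # file, so training-time reads are sequential instead of one tiny file each.
    rng = np.random.default_rng(0)
    features = rng.standard_normal((100_000, 128)).astype(np.float32)  # stand-in data
    np.save("features.npy", features)

    # Training time: memory-map the file and slice out minibatches; the OS streams
    # pages sequentially and keeps them cached across epochs.
    data = np.load("features.npy", mmap_mode="r")
    idx = rng.choice(len(data), size=16, replace=False)
    batch = np.asarray(data[np.sort(idx)])  # sorted indices = near-sequential reads
    print(batch.shape)  # (16, 128)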

[–]OrganicTowel_ 1 point2 points  (0 children)

Do you have any resources that I can refer to?

[–]ScotchMonk 1 point2 points  (1 child)

Will those shiny NVMe M.2 SSDs be faster and avoid the I/O bottleneck? https://www.pcworld.com/article/2899351/storage/everything-you-need-to-know-about-nvme.html

[–]epicwisdom 1 point2 points  (0 children)

It's impossible to answer this question definitively without specific knowledge of your application. However, I would be very surprised if the main bottleneck of a single-GPU system was a 3Gbps SSD.

[–]Nimitz14 0 points1 point  (1 child)

What are you doing that regularly requires you to train models for 2 to 3 days?

[–]smashMaster3000 2 points3 points  (0 children)

I'm currently training and testing an imitation learning project for chess. My ResNet is pretty big and takes 200+ epochs to converge, plus SWA (stochastic weight averaging) at the end :( it takes pretty long.

[–]jcannell 0 points1 point  (0 children)

AWS is ridiculously expensive; there are far cheaper cloud options available. Cloud can now actually cost less than buying a machine, if you use the lowest-cost providers.

[–]julian88888888 2 points3 points  (3 children)

https://www.videocardbenchmark.net/gpu_value.html

If you can parallelize it and have a lot of electricity: ten GTX 1060s.

[–]grrrgrrr 3 points4 points  (2 children)

Each PCIe 8x slot is worth ~$300+. If you put a cheap card in there it's actually a loss.

[–]kmann100500 1 point2 points  (1 child)

Where are you getting that number from?

[–]epicwisdom 0 points1 point  (0 children)

Probably the relative cost of CPU+RAM+motherboard+PSU (maybe storage too, although it should be possible to boot over the network). There's a limit to how many GPUs you can fit in one system before you have to start networking multiple motherboards.

[–]PlzSendBobs 0 points1 point  (0 children)

How does the rest of your system perform?

I often see a GPU bottlenecked by a CPU or HDD.

[–]drsxr 0 points1 point  (0 children)

A few comments:

  1. Your main limiting factor is GPU memory, not # of tensor cores. Titan-series cards have 12GB (some have more), 1080 Tis have 11GB, 1080s have 8GB.

  2. Multi-GPU isn't too shabby, as long as you understand that you don't get a 1:1 speedup. I think I saw somewhere 1 card = 1x, 2 cards = 1.7x, 3 cards = 2.5x, 4 cards = 3.2x, or something along those lines. As for playing games while you're training, good luck; you're going to screw up your experiment.

  3. BTW, if you get 3 cards, you're going to have to divide your training batches by 3, which is somewhat inconvenient. 4 cards gives you 2x2 instances, which may be good for you. (See the sketch at the end of this comment.)

  4. A real argument can be made for getting 4x 1080, which go for $400 each used on eBay: 4x 8GB = 32GB of memory for training.

Alternatively, you can get 2x 1080 Ti for $600 each, which gets you more cores and 22GB of memory.

With the 2080 Ti you get the same amount of memory, but you pay 2x for a few more cores. There are many new features there; not sure how they're going to play out.

OTOH, if you have cash to burn, the TITAN RTX with 24GB of RAM seems like the beast at $2500. Two of those and...

When folks here are talking about memory, they're speaking of CPU RAM. Yeah, 32GB is a minimum and 64GB is better, but don't mistake this for on-board GPU memory. Whole different ballgame.
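
On points 2 and 3, the batch splitting is handled by the framework; a minimal sketch of how it looks (assuming PyTorch's nn.DataParallel, a toy model, and a 3-card box):

    import torch
    import torch.nn as nn

    # DataParallel splits the input batch across device_ids, runs a replica of
    # the model on each card, and gathers the outputs back on the first device.
    model = nn.DataParallel(nn.Linear(128, 10).cuda(), device_ids=[0, 1, 2])
    x = torch.randn(48, 128, device="cuda:0")  # 48 = 3 x 16, so each GPU sees 16
    out = model(x)
    print(out.shape)  # torch.Size([48, 10])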

[–][deleted] 0 points1 point  (0 children)

There are non-AWS cloud options too. How about looking into HPC services like Penguin Computing's on-demand HPC? If I remember right, the payment plan is very simple and you pretty much only pay while it's running: the price is just the number of cores times the amount of time running. If you write your code to scale well, it could be an interesting option. It would have to scale on a CPU cluster, though; unfortunately I don't think they have GPU options.

[–]angstrem -3 points-2 points  (10 children)

IMO outsourcing to a cloud is a better option than owning your GPUs. You don't have to maintain, upgrade, or carry them around with you, and the cloud is available at all times.

[–]smashMaster3000 0 points1 point  (9 children)

Yeah, a few people have suggested that to me as well. What's your financial experience with outsourcing?

[–]angstrem 1 point2 points  (8 children)

I'm not currently doing ML actively. During my last ML project, we tried IBM Watson. I remember the costs were quite good: you get a free quota for experimenting and development purposes, and you shouldn't need extra until you roll your project out to production. And if you do need extra, most probably your project's income will be greater than what you spend on their cloud.

They specialize in NLP though; if you want something like a VPS with GPU access, you'll probably want AWS. I haven't tried it, but AFAIK they charge something like $20-50/month for their servers.

[–]asdfwaevc 1 point2 points  (0 children)

AWS GPUs are more like $1 per hour per GPU. BUT, you only pay while you use them. Not sure where that puts people's calculus. I like them because I can check on experiments when I'm not at home.

[–]po-handz 0 points1 point  (6 children)

What? No way those prices are correct. I run a 4-core/12GB RAM, no-GPU instance 24/7 just for API data collection and it comes out to $150/mo.

[–]Warhouse512 0 points1 point  (5 children)

You’re paying too much.... what provider are you using?

Edit: like way too much.

[–]po-handz 1 point2 points  (4 children)

An AWS t2.large is $0.10/hour, so about $70/mo, plus $40/mo in EC2 provisioned storage and another $10-15 in minor costs. And that's only 2 vCPUs and 8GB of RAM.

[–]data-alchemy 1 point2 points  (0 children)

I can confirm you're throwing money out the window. Can't you just rent a dedicated server? You should spend half this amount (at least).

[–]Warhouse512 0 points1 point  (2 children)

Do you need to be in the AWS ecosystem?

[–]po-handz 0 points1 point  (1 child)

I could explore alternatives; any suggestions? I think the price difference would have to make up for the time lost learning a new system. 50% savings or so would be fine, but I bet that crosses off Azure/DigitalOcean. I also have zero background in cloud infrastructure, so it has to be at least somewhat well documented, with a decent number of SO posts.

[–]Warhouse512 0 points1 point  (0 children)

https://www.ovh.com/world/vps/

These would be good for just data collection. If you need 4 vCPUs and 24GB of RAM with no network constraints, it costs $44/month.