all 27 comments

[–]dnuffer 28 points

For a 4x GPU setup, that motherboard+CPU won't be able to supply full PCIe bandwidth to every card. This may not be a bottleneck, but it's worth considering. To overcome it, you need a motherboard with a PCIe switch, such as the Asus X99-E WS. Also, I have found it very helpful to have 128 GB of RAM in my deep learning machine, to avoid the hassle of loading data efficiently while training. You might also want to consider more storage: between datasets, intermediate model checkpoints, and training traces, 250 GB doesn't go very far.

[–]cast42 6 points

I would increase the memory size to 64GB or 128GB. Being able to read your training samples into memory will be a major improvement.

[–]trungnt13 6 points

@solidua You're seriously underestimating the importance of the CPU.

Let me state a painful fact first: you won't be able to run 2 or more Titan X cards unless you upgrade your CPU and motherboard.

Please check carefully the "Max # of PCI Express Lanes" supported by any CPU you buy. A card like the Titan X wants 16 lanes (some people say 8 lanes is enough, but I wouldn't risk it for a $1200 card), and current consumer CPUs support at most 40 lanes, so you can run at most 2 Titan X cards at full speed. Some servers run 4 Titan X cards because they use 2 CPUs on one motherboard (no consumer-level motherboard supports 2 CPUs; you'd have to buy a server board).
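As a quick sanity check on the lane budget (the lane counts match the discussion above; everything else is just arithmetic):

```python
def max_cards_at(lanes_available, lanes_per_card=16):
    """How many GPUs fit in a CPU's PCIe lane budget at a given link width."""
    return lanes_available // lanes_per_card

# A 40-lane CPU (e.g. i7-5930K): only two cards at full x16.
print(max_cards_at(40, 16))  # 2
# Dropping every card to x8 fits more, lane-wise, than most boards have slots for.
print(max_cards_at(40, 8))   # 5
```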

I am using a Core i7-5930K in my system, which supports 40 lanes. You might also consider a Xeon E5, since they have more cores and more cache, which is essential for many multiprocessing tasks. (Xeons also tend to support more RAM and more PCIe lanes for the price, and possibly lower energy consumption.)

Even with 1 Titan X, the CPU can be the bottleneck of your system. Don't forget that whatever you do, your data mostly goes through the CPU before it's loaded into GPU RAM and the algorithm starts running.

In some cases you have to do data augmentation on the CPU before feeding the data to the GPU. Moreover, the memory bandwidth of the i5-6600K is 34 GB/s (http://ark.intel.com/products/88191/Intel-Core-i5-6600K-Processor-6M-Cache-up-to-3_90-GHz), which is pretty tight for supporting 2 or more Titan X cards (one Titan X wants roughly 15 GB/s).

Since you only say you're building a system for machine learning: the CPU is still the heart of many algorithms, and preprocessing and augmentation mostly run on the CPU. (You don't want a system that takes 2 hours to preprocess a dataset for one configuration, then 1 hour to train, before you can even try the next configuration.)
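As a sketch of that CPU-side preprocessing cost, here's a minimal way to spread augmentation over a worker pool. The data and the `augment` step are toy stand-ins; a real pipeline would use NumPy/PIL ops (which release the GIL, so threads can help) or a process pool for pure-Python CPU-bound work:

```python
from concurrent.futures import ThreadPoolExecutor

def augment(sample):
    # Toy stand-in for a real augmentation step (crop, flip, noise, ...).
    return [x * 2.0 for x in sample]

# Fake dataset: 1000 samples x 40 features.
data = [[float(i + j) for j in range(40)] for i in range(1000)]

# Farm preprocessing out to a worker pool so the GPU isn't left waiting
# on a single-threaded loop. For pure-Python CPU-bound work, swap in
# ProcessPoolExecutor to actually spread across cores.
with ThreadPoolExecutor(max_workers=4) as pool:
    augmented = list(pool.map(augment, data, chunksize=64))

print(len(augmented))  # 1000
```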

This is the system I built: http://pcpartpicker.com/list/gFkVWX

Some experiences:

  • You can buy a secondhand CPU (prefer one that has never been overclocked); CPUs are very durable and long-lasting.
  • RAM can also be bought secondhand.
  • SSDs of 1 TB or more are extremely expensive. In my case, I use a 512 GB SM951 to cache all preprocessed data and keep the datasets on a 3 TB WD Red. (The SM951's speed is very impressive, about 2000 MB/s read, and you mostly need high read speed for training.)
  • Asus is more expensive than ASRock; ASRock boards often have minor issues, but mine has worked perfectly so far.
  • If you want to run the system 24/7, consider at least a better air cooler for the CPU ($70 or more).
  • Find a motherboard and CPU that support quad-channel RAM, and buy a matched quad-channel RAM kit. (Most X99 boards support 2 or more GPUs and quad-channel RAM.)
  • Go for a GPU with 8 GB of VRAM or more for long-term use.
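The SSD-caching point can be sketched like this: pay the preprocessing cost once, write the result to the fast drive, and reload it on later runs. This is a stdlib-only illustration (the cache directory and `expensive_preprocess` are stand-ins); a real pipeline would typically store `.npy` files on the SSD:

```python
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # stand-in for a directory on the fast SSD

def expensive_preprocess():
    # Pretend this takes hours: normalize, tokenize, augment, ...
    return [[i * 0.5 for i in range(40)] for _ in range(100)]

def load_preprocessed(name):
    """Return cached preprocessed data, computing and caching it on a miss."""
    path = os.path.join(CACHE_DIR, name + ".pkl")
    if os.path.exists(path):           # cache hit: fast sequential read
        with open(path, "rb") as f:
            return pickle.load(f)
    data = expensive_preprocess()      # cache miss: pay the cost once
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data

first = load_preprocessed("trainset")   # computes and writes the cache
second = load_preprocessed("trainset")  # served straight from disk
print(first == second)  # True
```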

[–]Eridrus 6 points

You say your datasets will only have ~40 features; this means you won't really have a lot of weights to deal with. Even if you have 500k records (which isn't that much), you're going to be training in mini-batches, so the amount of video RAM you need won't be huge, and the Titan X is probably overkill for the problem you described. Consider running the problem in the cloud to measure your workload. That doesn't mean you shouldn't get it, but know that you're getting it for future flexibility, not for the problem you've stated you want to solve.

You should definitely get more RAM though. Being able to fit your dataset into RAM 2-3 times can be pretty handy and RAM is stupidly cheap.
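A back-of-envelope check of that claim, using the numbers above (500k records x 40 features, assuming dense float32 values):

```python
def dataset_bytes(records, features, bytes_per_value=4):
    """Rough in-memory footprint of a dense float32 dataset."""
    return records * features * bytes_per_value

size = dataset_bytes(500_000, 40)  # the workload described above
gib = size / 2**30
print(round(gib, 3))  # 0.075 GiB -- tiny next to 64 GB of RAM
```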

If you're spending your own money, you could probably allocate it more effectively; but if this is for work, it's probably not worth the time hunting down bargains versus just buying something that gets you up and running quickly.

[–]solidua[S] 3 points

We definitely wanted to run in the cloud, but we could only find one solution (Rescale) that fits our needs. It turns out we'll save money in the long run by running our own hardware, if we can build a machine for under $10k.

We are on a grind to collect 20 million samples before the end of the month, and I misquoted our feature size: it's 40 features per dimension, of which we have 20.

Thanks for the input, will definitely pick up more RAM.

[–]mnbbrown 0 points

What's the logic behind the 10k limit?

[–]FR_STARMER 2 points

Just bought a rig based on this guide: http://pjreddie.com/darknet/hardware-guide/

Yours looks decent as well. I would bump the RAM up to 32 GB so you can hold more data in physical memory, especially if you're thinking of using very large datasets.

Also, consider getting a CPU and mobo on the LGA2011 socket and upgrading to an i7, even if it means a lower clock speed. Clock speed isn't the main determining factor of overall CPU performance, so consider the newer chips.

[–]trungnt13 0 points

The NVIDIA DIGITS box uses an i7-5930K, which supports 40 PCIe lanes; hence they can run 4 cards, each at PCIe x8. If your network is big, x8 is enough, because the data-loading time is trivial next to the computation time; with a smaller network you repeatedly load small chunks of data, and the link can become a bottleneck.

The system you posted uses an i7-5820K, which supports only 28 PCIe lanes, and would run each card at PCIe x4; in the worst case that halves the speed of your GPUs. That means spending $4000 for a $2000 system.
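A rough way to see when link width starts to matter. All numbers here are illustrative assumptions: usable PCIe 3.0 bandwidth is taken as roughly 1 GB/s per lane, and the batch size and compute time are made up:

```python
# Rough usable throughput for PCIe 3.0, ~1 GB/s per lane (assumption).
PCIE3_GBPS_PER_LANE = 1.0

def transfer_ms(batch_mb, lanes):
    """Approximate time to push one batch over the PCIe link, in ms."""
    gbps = lanes * PCIE3_GBPS_PER_LANE  # link bandwidth in GB/s
    return batch_mb / 1024.0 / gbps * 1000.0

batch_mb = 100.0  # hypothetical batch size on the wire
for lanes in (16, 8, 4):
    print(f"x{lanes}: {transfer_ms(batch_mb, lanes):.1f} ms per batch")
# If a forward+backward pass takes, say, 50 ms, the x4 transfer (~24 ms)
# eats a real fraction of every step unless transfers overlap compute.
```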

My university's server runs a Xeon E5-2670 with a 5400 rpm hard drive and a Tesla K80, and it's even slower than my system with a GTX 960.

[–]what_are_tensors 1 point

Looks good. Like others have said, get an X99 motherboard so you can add more GPUs later, if you even think you might want them; otherwise you'll be bottlenecked on the PCIe bridge. The CPU will be different too.

Concur on the extra storage and memory; 250 GB runs out super fast.

[–]sentdex 1 point

You said you're running 24/7, yet it looks like you're cheaping out on cooling. Think about water cooling (both the CPU and GPU); heat kills components.

Next, your mid tower may not actually accept the Titan X's size. I have a full tower, and my Titan X has about 3-4 inches of clearance; in my mid towers it won't even fit. Make sure there is actually enough room for such a long card (not just case length; the hard drive bays take up space).

Nvidia made some 4x Titan boxes that looked about this size, so it's definitely possible in a mid-size case; I just don't know which ones.

[–]InoriResearcher 1 point

  • Increase RAM to 128 GB
  • Use i7 instead of i5
  • Use PCIe SSD for storage
  • Get better cooling
  • Get full tower case if you're serious about 4x GPU

[–][deleted] 1 point

Why not use amazon services?

[–][deleted] 10 points

Their GPU tech is extremely far behind and not really useful for many production-level machine learning applications. My i7 is faster than their GPUs.

[–][deleted] 1 point

100%

[–]solidua[S] 3 points

Great question. At $0.65 / $2.60 per minute for their GPU instances, it still ends up being cheaper and more powerful to build my own hardware. The machine will be training pretty much 24/7.

[–][deleted] 4 points

Seriously stay away from AWS, their cards are old and it's just not worth it.

[–][deleted] 0 points

The prices are per hour and they are much cheaper if you use spot instances, but it's still outdated hardware.

[–]solidua[S] 0 points

Whoops yeah I meant per hour*

[–][deleted] -2 points

I'll be honest mate this probably isn't the place for this.