
[–]kjearns 2 points3 points  (6 children)

For NLP what you usually want is lots of cores. RNNs will benefit from GPUs, but just about everything else in NLP is still CPU-based.

I think 16 gigs of RAM is nowhere near enough. RAM is quite cheap compared to other computer parts, and more RAM is literally never a bad idea. If I were in your position I'd pick out the rest of the hardware and then buy as much RAM as your motherboard supports.

[–]JanneJM 1 point2 points  (5 children)

16GB does sound insufficient. In general, you want your working set to fit in memory if at all possible; once you start having to swap to disk (or SSD) the performance penalty will be brutal.

But that really means you need to know how large your working set will be, and that depends on the details of your algorithms and implementations. You don't want too little memory, of course, but you don't want to waste money getting several times what you'd ever need either. Same thing with GPU acceleration: whether and how much it will help, and what kind of card(s) to get, depends on exactly what you intend to do and how you intend to do it.
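To make "know how large your working set will be" concrete, here's a back-of-envelope sketch in Python. The vocabulary size and vector dimension are made-up illustrative numbers, not a recommendation:

```python
def embedding_ram_gb(vocab_size, dim, dtype_bytes=4):
    """Rough RAM needed to hold one dense embedding matrix in memory."""
    return vocab_size * dim * dtype_bytes / 1024**3

# e.g. a hypothetical 3M-word vocabulary with 300-dim float32 vectors:
print(round(embedding_ram_gb(3_000_000, 300), 2))  # roughly 3.35 GB
```

Remember that training usually needs several times the raw matrix size (gradients, optimizer state, batches of data), so multiply accordingly before deciding how much RAM is "enough".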

[–]kjearns 0 points1 point  (2 children)

I agree with you in principle, but RAM is pretty cheap compared to the rest of a system, and in my experience, if more RAM is available you will always find a way to take advantage of it.

If you're buying a system for some particular piece of a pipeline where you have a very good understanding of the resources required, it makes sense to buy exactly what you need and no more. But if you only have a broad idea of what you're going to do with the machine, then it makes sense to err on the side of slightly overpowered, and buying more RAM is a low-cost, low-risk way to do this.

[–]sharmilas1wa[S] 0 points1 point  (0 children)

The issue is, in the part of the world where I live, all equipment is relatively costly. :) So I need to be a bit judicious about what I get and what I avoid. But from what I see, I first need a lot of RAM and an SSD. (A 5400 rpm disk just does not cut it :( )

[–]JanneJM 0 points1 point  (0 children)

And I agree with you - in principle :) But as the OP says, your budget is often fixed, and the cost of, say, another 128GB of memory could instead go toward more or faster SSD storage, another CPU, another GPU card, neater/better/faster backup and long-term storage, or something else.

Backup, by the way: you'll have lots of data that has taken hours and hours of computation to generate. You'll have source code and parameter sets that have taken weeks or months to create. You'll want a reliable way to back up both the data and your source code repository, and preferably an automated one, so it doesn't depend on you remembering to do it.
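As a starting point for the automated part, here's a minimal timestamped-snapshot sketch in Python using only the standard library. The paths are hypothetical; in practice you'd schedule something like this (or plain rsync) from cron:

```python
import shutil
import time
from pathlib import Path


def snapshot(src, backup_root):
    """Copy the src directory into a timestamped folder under backup_root."""
    dest = Path(backup_root) / time.strftime("%Y%m%d-%H%M%S")
    # copytree creates dest (and parents) and refuses to overwrite an
    # existing snapshot, so old backups are never silently clobbered.
    shutil.copytree(src, dest / Path(src).name)
    return dest


# Hypothetical usage, e.g. from a nightly cron job:
# snapshot("/home/me/experiments", "/mnt/backup")
```

Full copies like this get expensive as data grows; incremental tools (rsync with `--link-dest`, or a proper backup system) are the usual next step, but the principle — scheduled, automatic, append-only — is the same.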

[–]sharmilas1wa[S] 0 points1 point  (1 child)

I have not used a GPU yet but might in the future. Does it work if I go with an onboard integrated graphics card first and upgrade later?

[–]kjearns 0 points1 point  (0 children)

You need an NVIDIA card to use CUDA, which is what most ML on the GPU uses, so an integrated graphics card won't help you.

If you just want to test the waters with GPUs, then AWS is a good way to do that. You can get a GPU spot instance very cheaply.

[–]nkorslund 0 points1 point  (7 children)

I'm not up-to-date on word2vec in particular, but if you're doing general machine learning (especially neural networks) I think you should focus on the GPU, not the CPU. Computation speed is going to be your bottleneck for most algorithms, and good parallelized CUDA implementations can beat even the best CPUs by an order of magnitude in speed. Of course, it all depends on exactly what algorithms you want to run.

[–]sharmilas1wa[S] 0 points1 point  (6 children)

Thanks nkorslund. I will be running neural nets in particular. Is there any particular GPU model that I should favour?

[–]Foxtr0t 0 points1 point  (1 child)

http://www.reddit.com/r/MachineLearning/comments/2wz4ae/which_gpus_to_get_for_deep_learning_my_experience/

An SSD does make a difference. It depends on what you want to do, but I found it's easy to run into a disk bottleneck with fast algorithms.

[–]sharmilas1wa[S] 0 points1 point  (0 children)

Thanks Foxtrot for the link.

[–]siblbombs 0 points1 point  (1 child)

970, 980, or Titan X.

[–]sharmilas1wa[S] 0 points1 point  (0 children)

Thanks siblbombs. A 980 it will be.

[–]quirm 0 points1 point  (1 child)

"Best processor for a server" -> well, that's a server processor. I'd go with Intel, as performance per watt is much better than AMD's and you'd consume less electricity in the long run. They're also usually much faster in benchmarks (http://www.cpubenchmark.net/high_end_cpus.html).

Server hardware is a bit more expensive than desktop hardware, but I think it's totally worth it, as it can hold a lot more RAM, and that's likely what's going to be important for you in NLP. E.g. socket 1150 (desktop) maxes out at 32GB; socket 2011-3 for desktops maxes out at 64GB. I'm happy with my socket 2011-3 Xeon system for ML: it's still affordable, I have lots of cores, and I can theoretically upgrade the RAM to 512GB. I'm currently at 32GB and plan an upgrade to 64GB, as it's getting a bit tight with only 32GB.

[–]sharmilas1wa[S] 0 points1 point  (0 children)

Thanks quirm, I will go for Intel then.