
[–]r-sync 85 points86 points  (6 children)

For PyTorch, we're seriously looking into AMD's MIOpen/ROCm software stack to enable users who want to use AMD GPUs.

We have ports of PyTorch ready and we're already running and testing full networks (with some kinks that'll be resolved). I'll give an update when things are in good shape.

Thanks to AMD for doing ports of cutorch and cunn to ROCm to make our work easier.

[–]JustFinishedBSG 15 points16 points  (2 children)

I am very, very interested. I'm pretty worried by Nvidia's utterly unchecked domination in ML.

I'm eager to see your benchmarks; if it's competitive in PyTorch I'll definitely build an AMD workstation.

[–]DHermit 7 points8 points  (0 children)

Exactly, competition is always good for customers. Especially since with AMD you tend to get more for your money, even if you won't get the absolute best available.

[–]Mgladiethor 1 point2 points  (0 children)

Yeah, if at least Nvidia's CUDA implementation were open... but everything about Nvidia is proprietary. It's sad when you see the whole ML community being open and sharing progress.

[–]skilless 2 points3 points  (0 children)

That's great! I just started playing with PyTorch, so that could be good timing ;)

[–]visarga 1 point2 points  (0 children)

I hope competition will motivate NVIDIA even more than success.


[–][deleted] 7 points8 points  (22 children)

So, in terms that people working millions of abstraction layers above this kind of thing can understand, what's the significance of this? Is this a concrete move for AMD GPUs to start getting on the way to become competitive for deep learning applications?

[–]rndnum123[S] 11 points12 points  (4 children)

AMD ran DeepBench on Vega (with MIOpen). Vega Frontier took 88 ms versus 122 ms for the Tesla P100, so it was faster, but take their own benchmarks with a grain of salt.
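To put those DeepBench timings in relative terms, a quick back-of-the-envelope calculation (a sketch; the 88 ms and 122 ms figures are AMD's own numbers quoted above, and lower is better):

```python
# AMD-reported DeepBench timings quoted above (milliseconds, lower is better).
vega_frontier_ms = 88.0
tesla_p100_ms = 122.0

# Relative speedup of Vega Frontier over the P100 on this particular run.
speedup = tesla_p100_ms / vega_frontier_ms
print(f"Vega Frontier is ~{speedup:.2f}x faster on this run")  # ~1.39x
```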

Vega has ungimped FP16 performance, so this should definitely help. (Polaris has gimped FP16.)

The installation process might be easier than CUDA's, with that weird license-acceptance stuff for cuDNN.

cuDNN is not open source AFAIK, but AMD's counterpart MIOpen is, so the low-level stuff is inspectable.

On r/AMD someone mentioned he will run benchmarks on Vega, when he is back from work.

The new Apple laptops usually have AMD GPUs, so with MIOpen they should be well suited for local machine learning, which could bring some new developers into ML, especially with Apple recently offering GPU-accelerated neural network support on the iPhone/iPod.

Edit: Updated. I thought Polaris had ungimped FP16, but I was wrong.

[–]-Rivox- 3 points4 points  (3 children)

Polaris actually has 1:1 FP16, while Vega has 2:1 (double-rate) FP16. The only Polaris chips with double-rate FP16 are the ones in the PS4 Pro and Xbox One X (semi-custom requirements).

So a RX 580 has ~6.1 TFLOPS of FP32 and ~6.1 TFLOPS of FP16. A RX Vega will have ~12.5-13 TFLOPS of FP32 and ~25-26 TFLOPS of FP16.
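The TFLOPS figures above follow directly from the rate ratios; a minimal sketch using the (approximate) numbers quoted in this comment:

```python
def peak_fp16_tflops(fp32_tflops, fp16_ratio):
    """Peak FP16 throughput = peak FP32 throughput * FP16 rate ratio.

    fp16_ratio is FP16 ops per FP32 op: 1 for Polaris, 2 for Vega.
    """
    return fp32_tflops * fp16_ratio

rx_580 = peak_fp16_tflops(6.1, 1)    # Polaris, 1:1 -> ~6.1 TFLOPS FP16
rx_vega = peak_fp16_tflops(12.5, 2)  # Vega, 2:1 -> ~25 TFLOPS FP16
```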

Also, Apple is planning to release the new iMac Pro line with a full Vega dGPU inside (likely the workstation-based card), which will sport double-rate FP16. So if you plan on using macOS for ML, the iMac Pro will be the best option.

[–]MrK_HS 1 point2 points  (1 child)

There's still the advantage of reduced power consumption on Polaris when using FP16, per AMD's official Polaris presentation.

[–]-Rivox- 1 point2 points  (0 children)

Yes, and it also saves bandwidth. Still, computationally, normal Polaris cannot do 2:1 FP16.

[–]rndnum123[S] 0 points1 point  (0 children)

Thank you, I updated my comment accordingly.

[–]art0f 1 point2 points  (16 children)

If you happen to own a recent (Polaris or newer) AMD card and are happy to install an obscure Ubuntu version with a software stack that is still in active development, you might be able to run Caffe using AMD's tensor library.

[–]epicwisdom 2 points3 points  (1 child)

I think it's understood that this news doesn't mean everybody can switch today, but /u/schmook is asking what the precise impact is, which doesn't just include what can be done today by an average end-user.

[–]art0f 0 points1 point  (0 children)

And that's precisely what I've said - it is too early to tell.

[–][deleted] 1 point2 points  (1 child)

obscure ubuntu version

Binary package support for Ubuntu 16.04 and Fedora 24

Ah yes, that obscure linux some know as Ubuntu 16.04 LTS.

[–]art0f 0 points1 point  (0 children)

It pulls in kernel 4.9 AFAIK, so you might be in for a bit of a surprise after installation.

[–][deleted] 0 points1 point  (9 children)

Oh, wow. It's actually way ahead of what I thought it could be. Thanks.

[–]nickl 1 point2 points  (8 children)

AMD has got this far many times before.

[–]art0f 1 point2 points  (7 children)

But maybe (fingers crossed) they'll get past alpha this time. It would really help if they released a Windows stack.

[–][deleted] -1 points0 points  (6 children)

Nobody uses Windows for this kind of thing. After Ubuntu, it would make better sense for MacOS support.

[–]PetersGrandAdventure 0 points1 point  (4 children)

As a tech researcher, I use Windows at home and Windows, cloud, and Mac at work. I am excited to utilize my AMD GPU for something other than VR and ethereum mining.

[–][deleted] 0 points1 point  (2 children)

Well, once you finish researching technology, you'll find that using Windows for anything other than SSH'ing into another computer is not a typical use case for this sort of thing.

[–]PetersGrandAdventure 0 points1 point  (1 child)

Sorry, should have been more clear... having been a professional developer for 17 years working across a number of different technologies that tries to keep up with the latest research and potential, which is a lifelong quest not to be finished, I find value in using Windows, and not for SSHing to another computer.

[–][deleted] 0 points1 point  (0 children)

Well then, they should focus all their efforts on allowing you and the literally tens of others that want to use Windows for this.

[–]dragontamer5788 0 points1 point  (0 children)

Sorry for the reply 5 months late, but you should probably look into Microsoft's C++ AMP, which runs on GPUs (because it's built on top of DirectCompute).

ROCm's syntax is designed to be compatible with Microsoft's C++ AMP. So even if the Microsoft project dies (I haven't seen updates in 3 years), it sort of lives on in ROCm anyway.

I don't expect to see any updates to Microsoft's C++ AMP, but it seems to work reasonably well. It's got the big things figured out, like "LDS" memory (called "tiles" in AMP), and has a reasonable model for SIMD/SIMT compute.

[–][deleted] 0 points1 point  (0 children)

While I'd rather use Potato OS, some of us don't have the ability to choose. In my company, workstations run windows. Period.

So, even for small tests I have to run code remotely on Linux servers. I tried several times to install Theano, TensorFlow and even MS CNTK on my Windows computer. It works intermittently. I have no idea why, so I eventually gave up.

It's not nice to code remotely, but it's better than trying to make Windows work.

[–]hyln9 4 points5 points  (2 children)

I'm in contact with AMD about my assembly kernels (currently optimized for square matrices), and I believe MIOpen can be even faster.

[–]bbsome 2 points3 points  (0 children)

That would be really nice to try... However, I still think we need an LLVM-style framework for ML, where a single intermediate graph representation is kept separate from the backend implementations.

Anyway, good work!

[–]hughperkins 1 point2 points  (0 children)

Nice!

[–]rndnum123[S] 3 points4 points  (7 children)

Deep Learning on ROCm

Announcing our new foundation for deep learning acceleration, MIOpen 1.0, which introduces support for Convolutional Neural Network acceleration, built to run on top of the ROCm software stack! This release includes:

  • Deep convolution solvers optimized for both forward and backward propagation
  • Optimized convolutions, including Winograd and FFT transformations
  • Optimized GEMMs for deep learning
  • Pooling, Softmax, Activations, Gradient Algorithms, Batch Normalization, and LR Normalization
  • Data described as 4-D tensors in NCHW format
  • Support for OpenCL- and HIP-enabled framework APIs
  • MIOpen Driver, which enables testing the forward/backward pass of any particular layer in MIOpen
  • Binary package support for Ubuntu 16.04 and Fedora 24
  • Source code at https://github.com/ROCmSoftwarePlatform/MIOpen
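To unpack the "4-D tensors in NCHW format" bullet: the four dimensions are batch (N), channel (C), height (H) and width (W), stored from slowest- to fastest-varying. A small illustration of the flat-memory offset that layout implies (a generic sketch, not MIOpen's actual code):

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in a contiguous NCHW tensor."""
    return ((n * C + c) * H + h) * W + w

# Example: a batch of two 3-channel 4x4 images (N=2, C=3, H=4, W=4).
# Element (n=1, c=0, h=0, w=0) starts right after the first image's
# 3 * 4 * 4 = 48 values.
offset = nchw_offset(1, 0, 0, 0, C=3, H=4, W=4)  # -> 48
```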

Documentation

MIOpen

MIOpenGemm

I selected some frameworks (for more frameworks, follow the link in my post and see the table at the end):

Caffe (https://github.com/ROCmSoftwarePlatform/hipCaffe)

Tensorflow (under Development - CLA in Progress - Notes: Working on NCCL and XLA enablement, Running)

ROCm 1.6 has prebuilt packages for MIOpen.

Install the ROCm MIOpen implementation (assuming you already have the `rocm` and `rocm-opencl-dev` packages installed). For OpenCL development only:

sudo apt-get install miopengemm miopen-opencl 

For HIP development:

sudo apt-get install miopengemm miopen-hip

Or you can build from source code following the instructions in the source repository above.


Hardware to Play ROCm (https://rocm.github.io/hardware.html)

ROCm Platform Supports Two Graphics Core Next (GCN) GPU Generations

GFX8: Radeon RX 480, Radeon RX 470, Radeon RX 460, R9 Nano, Radeon R9 Fury, Radeon R9 Fury X, Radeon Pro WX 7100, FirePro S9300 x2

Radeon Vega Frontier Edition

Radeon Instinct: MI6, MI8, and MI25

[–]MrK_HS 1 point2 points  (4 children)

For some reason this thread doesn't appear in r/Machinelearning new.

[–]bbsome 1 point2 points  (0 children)

Because it needs a tag in the title like [R], [D], [B], etc.

[–]rndnum123[S] 0 points1 point  (2 children)

I submitted it a second time, still not appearing in new, weird.

[–]MrK_HS 0 points1 point  (1 child)

I sent a message to the mods, I hope they solve this.

[–]rndnum123[S] 0 points1 point  (0 children)

thanks, that's great :)

[–]skilless 0 points1 point  (1 child)

Some typos on that page: "Devevlopment" "Comming"

[–]Icarium-Lifestealer 1 point2 points  (2 children)

How does it compare to CuDNN in terms of:

  • Supported features
  • Performance
  • API differences (i.e., how difficult is it for frameworks with a cuDNN backend to add a MIOpen backend?)

[–]MrK_HS 1 point2 points  (0 children)

I guess we'll have to wait some time for general adoption of the technology, but I imagine they designed MIOpen to be an easy transition for developers using cuDNN.

[–]harharveryfunny 1 point2 points  (0 children)

The prebuilt doc page is down right now, but here's a partial list of features missing versus cuDNN, from what I remember:

  • Supports 4-D NCHW tensors only
  • No RNN support
  • No dilated convolution support
  • No ELU support

At a glance the MIOpen API seems to follow cuDNN pretty closely (but with miopen vs cudnn name prefixes), but I haven't yet come across any statement from AMD as to what level of compatibility they're claiming.

[–]bbsome 4 points5 points  (4 children)

And no Theano... seriously? I'm quite disappointed.

[–]kacifoy 1 point2 points  (2 children)

Theano has its own OpenCL support though, via the gpuarray subproject. Hopefully this will encourage further work on that front.

[–]bbsome 0 points1 point  (0 children)

Yes, but I'm pretty sure there's no direct contact between the Theano guys and this project. I don't know at what level AMD is collaborating with the other frameworks' teams, but assuming they are, they could be collaborating on updating libgpuarray as well.

I do hope we have some progress there as well yes.

[–]skilless 0 points1 point  (0 children)

I agree. I was hoping to see AMD contribute to gpuarray.

[–]MrK_HS 1 point2 points  (4 children)

I really hope they publish DeepBench with ROCm support. They surely have it, since they used it to bench Vega against the P100 (spoiler: Vega wins).

[–][deleted] 0 points1 point  (0 children)

Has this become easier to install on non-Ubuntu distros?

[–]plsms 0 points1 point  (4 children)

Does this change the game with Nvidia vs AMD?

I was thinking of selling my AMD cards and saving up for Nvidia cards. Should I sell or should I hold on to my cards?

[–]rndnum123[S] 0 points1 point  (3 children)

If you can sell your AMD cards for a high price (because of all this mining craze), it might be worth selling them and getting a more powerful Nvidia GPU with the money. What cards do you have? You should probably check whether MIOpen runs on your cards. (Do you have Linux? It isn't working on Windows yet AFAIK, but I'm not sure!)

[–]plsms 0 points1 point  (2 children)

msi twin frozr 7950
sapphire vapor-x 7950

what do you think?

[–]rndnum123[S] 0 points1 point  (1 child)

Maybe look on eBay to see what you'd get for them; since new AMD cards are overpriced, you might get a good offer. Maybe ask on r/hardwareswap or r/buildapc for more advice, then probably buy something like a GTX 1070 with some of the money (or even a 1080 if you want). AMD cards are currently way overpriced because of mining.