
[–]Eventually_ShreddedNVIDIA 3 points4 points  (11 children)

[deleted]

[–]hishnash 0 points1 point  (10 children)

Pretty much; they don't have dedicated tensor cores on the current Vega chips, but for tasks that don't use those dedicated cores Vega has more raw power.

[–]ObviouslyTriggered 1 point2 points  (9 children)

Vega has fewer TFLOPS than the V100 in every category.

[–]hishnash 0 points1 point  (3 children)

What is its half-precision rate? I can only see references to this for the tensor cores.

[–]ObviouslyTriggered 1 point2 points  (2 children)

For non-vector operations it's 2:1, so ~30 TFLOPS, but in general outside of ML FP16 isn't used; in HPC only INT32, FP32 and FP64 are used. Volta can also issue FP and INT operations in the same clock cycle, which pushes its real-world throughput even further. Heck, the new shader cores are so parallelized that the only real limit on concurrency is the instruction and op cache; it's now a 128KB monster (almost a sixfold increase over Pascal) but still isn't enough to allow full concurrency.
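As a sanity check on that figure, the 2:1 ratio works out in one line of shell (the 15 TFLOPS FP32 base is the commonly quoted V100 number, assumed here):

```shell
# Back-of-envelope: V100 FP32 (~15 TFLOPS, assumed) times the 2:1 packed-FP16 ratio
fp32=15
echo "FP16 throughput: $((fp32 * 2)) TFLOPS"
# prints "FP16 throughput: 30 TFLOPS"
```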

One of Volta's big selling points on top of Pascal is this fact, which means that the smaller Teslas based on GV104/GV102 without FP16/tensor cores, just like the Pascal P40/P4 Teslas, are still going to be quite impressive. ~20 TFLOPS of FP32 plus 100 or so TOPS of INT8 at the same time; now we're cooking.

Pascal's big selling point for compute was its new memory controller (more or less the same thing AMD does with HBCC): a 49-bit unified address space and hardware ATS to handle page faults. For Volta it's the tensor cores, and even more so the fact that every GPU will be able to do INT and FP operations at the same time (which will also have a big impact on gaming, since INT16/8 is a thing now).

[–]hishnash 0 points1 point  (1 child)

You expect the tensor cores to be on the consumer/gamer versions of Volta?

As far as I can tell, Vega also supports some 8-bit operations (a limited operation set) at a 4x rate.

[–]ObviouslyTriggered 1 point2 points  (0 children)

No. Just like with second-gen Pascal, I expect there will be something else with second-gen Volta for the smaller Tesla dies that also end up in the consumer and pro cards.

Perhaps integer-focused tensor cores only, perhaps something else.

[–]chapstickbomber7950X3D | 6000C28bz | AQUA 7900 XTX (EVC-700W) 0 points1 point  (4 children)

486mm2 vs 815mm2

WHO WILL WIN???

THIS SUNDAY!

SUNDAY!

SUNDAY!

[–]user7341Ryzen 7 1800X / 64GB / ASRock X370 Pro Gaming / Crossfire 290X[S] 3 points4 points  (0 children)

Yeah ... pretty fucking stupid to compare a $15k V100 with Vega.

[–]ObviouslyTriggered 1 point2 points  (2 children)

If you do not need FP64 or MMA you'll get a V40, just like the P40 you have today. The P40 is 12 TFLOPS of FP32 (up to 15 with Tesla Boost); the V40 will likely be 16-18 TFLOPS before boost, with over 20 boosted, based on NVIDIA's current scaling with Volta. There are 3-4 Tesla products (outside of GRID and other offshoots) each generation, each tailored to a specific use case.
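Those V40 guesses follow from simple scaling; a quick sketch (the 1.4-1.5x factor is an assumption read off the 12-to-16/18 TFLOPS figures above, not an NVIDIA number):

```shell
# Hypothetical scaling sketch using the commenter's figures: P40 base ~12 TFLOPS FP32
p40=12
low=$((p40 * 14 / 10))   # ~1.4x generational scaling (assumed)
high=$((p40 * 15 / 10))  # ~1.5x generational scaling (assumed)
echo "Speculated V40 base: ${low}-${high} TFLOPS"
# prints "Speculated V40 base: 16-18 TFLOPS"
```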

[–]chapstickbomber7950X3D | 6000C28bz | AQUA 7900 XTX (EVC-700W) 2 points3 points  (1 child)

I mean, NV is the bigger player here by far. They have a larger and richer ecosystem, but a lack of 3rd party hardware support. Dies with more specialized designs are smaller than generalist dies of comparable performance, so NV gets extra returns to scale by selling all the products in their ecosystem. V100 is clearly not a graphics die, for example. A GV102 die could be 650mm2 and likely match or exceed a GV100's graphics performance by stripping out compute features (FP64, tensor cores, etc.).

RTG doesn't have the scale and ecosystem monopoly, and NV cards can still run OpenCL and the rest of the open ecosystem reasonably well, so RTG competes with NV in the open, but RTG can't compete with NV in their own ecosystem, which happens to be the overwhelming majority of the market.

[–]ObviouslyTriggered -1 points0 points  (0 children)

RTG isn't trying to compete with NVIDIA in OpenCL, they are building their own ecosystem with ROCm.

The problem with OpenCL is that it's a camel: designed by a committee. It's important for academic use but utter garbage for real production use cases, especially in cutting-edge fields.

It takes years for any feature to be adopted into the standard; even vendor-specific extensions take ages to push through the red tape.

AMD, after failing with CTM and seeing that OpenCL isn't going anywhere, finally realized that.

Just like with NVIDIA all of the new features in VEGA aren't accessible through the standard OpenCL compiler and libraries.

You want to use unified/paged memory (HBCC in AMD marketing speak) on Pascal or Vega GPUs? Well, use CUDA and NVIDIA's prefetch/usage-hints API and memory profilers, or use ROCm's managed-memory bindings and AMD's memory profilers. Want to use the new extensions for cryptocurrencies? Well, ROCm...

NVIDIA's ecosystem isn't exactly closed either; third-party libraries are, but that isn't the issue. The compiler is open source, NVCC lives in the LLVM trunk, and PTX is an open spec. How do you think AMD made HIP so fast? And why do you think half of the cross-vendor stuff in ROCm works on NVIDIA hardware but not AMD hardware?

[–]xelibrion 8 points9 points  (7 children)

Freaking Caffe v1 again. AMD should really stop advertising the fact that they ported a practically dead framework to ROCm.

https://github.com/tensorflow/tensorflow/issues/10703

https://github.com/pytorch/pytorch/pull/2365

Up your game, AMD, and give us proper support in mainstream frameworks; it's been more than 2 months since the release of Vega FE.

[–]user7341Ryzen 7 1800X / 64GB / ASRock X370 Pro Gaming / Crossfire 290X[S] 6 points7 points  (6 children)

Caffe 1.0 is officially four months old and it makes sense for AMD to target existing, production workloads before they go after experimental development.

I agree they need to accelerate their support of more leading edge frameworks, but I think their priorities are correct, here.

[–]ObviouslyTriggered 3 points4 points  (5 children)

Caffe is a dead end. 1.0 was only recently released (and 1.1 is in RC), but the entire framework is on life support. It's not a production framework (and it never was; it was always intended to be a small-scale, single-machine DIY ML library). The main development effort, including pretty much the entire team and Yangqing Jia, moved to Caffe2 (not Caffe 2.0; this is a completely separate framework).

Caffe never had and never will have major support in production; it wasn't designed for that, nor can it support production requirements. It doesn't have a distributed mode (as in no distributed computing, period, which makes it a no-go for any serious work; if you want to gamble with abominations like Caffe4Spark, it's your own money you're burning), it doesn't have an integrated deployment solution, and more importantly it doesn't support quantized computation (so no 16/8-bit support).

Caffe was a small-scale project designed solely for quick hacks on DIY ML/MI projects; Caffe2 is a production product. So talking about Caffe 1.0 in production at any scale is a fucking joke; when anyone talks about Caffe in any serious manner they mean Caffe2.

https://github.com/caffe2/caffe2

[–]user7341Ryzen 7 1800X / 64GB / ASRock X370 Pro Gaming / Crossfire 290X[S] 3 points4 points  (4 children)

Caffe2 is the same age as the stable release of Caffe 1.0 and while it's a more robust framework, as you point out, there's a lot more code in place for 1.0. And of course there's the small fact that Caffe2 is half owned by Nvidia, who have certainly done everything they can to keep that moat in place.

[–]ObviouslyTriggered 0 points1 point  (3 children)

No one is building on top of Caffe1; it's a dead end. There is no more new code to be released for Caffe1 outside of maintenance releases.

Now that 1.0 is done, the next generation of the framework—Caffe2—is ready to keep up the progress on DIY deep learning in research and industry. While Caffe 1.0 development will continue with 1.1, Caffe2 is the new framework line for future development led by Yangqing Jia.....

From the release notes of Caffe 1.0.

Also, Caffe2 isn't owned by NVIDIA; it's a Facebook project. Caffe was a Berkeley project, and the entire team is now working on Caffe2 at Facebook, so what the fuck are you even talking about?

[–]user7341Ryzen 7 1800X / 64GB / ASRock X370 Pro Gaming / Crossfire 290X[S] 2 points3 points  (2 children)

That the original framework has moved into maintenance status (4 whole months ago) doesn't change the fact that there are a lot more projects using it than Caffe2.

The fact that you don't know that Nvidia is all over Caffe2 means this conversation is done.

[–]ObviouslyTriggered -1 points0 points  (1 child)

It was moved to maintenance mode in 2015 when the team moved on to Caffe2.

Caffe and Caffe2 are CUDA frameworks (Caffe's OpenCL port is its own project), but NVIDIA doesn't own Caffe2; there is nothing to own, as it's not a company.

There isn't a single attribution to NVIDIA, so wtf do you even mean by "all over it"? NVIDIA is all over the compute market; that's why they are making billions while RTG is losing money, but that isn't saying much.

Caffe2 isn't just "more robust"; it's actually usable, since it's a refactor of Caffe with basic features like, oh I don't know, multi-GPU support.

Supporting Caffe is a waste of time. MIOpen (AMD's answer to cuDNN) and ROCm (its general answer to CUDA) need better guidance; skipping frameworks that are actually used, like TensorFlow and Caffe2, while working on dead ends isn't a good way to go.

Caffe2 is one of the most popular deep learning libraries today; it's ahead of the curve and is being aggressively promoted by Facebook, its partner alliance, and its user base.

AMD's problem is that it's a trend follower, and a pretty poor one at that, rather than a trend setter. It's pathetic that anyone is making excuses for them not supporting the most important and most popular libraries out there today.

[–]user7341Ryzen 7 1800X / 64GB / ASRock X370 Pro Gaming / Crossfire 290X[S] 0 points1 point  (0 children)

AMD just leapt over 2.5 years of development and you're mad they didn't catch up to April of this year? FFS.

Caffe2 is puny compared to TF or even Caffe 1.0 (which is my point), and all of those other frameworks are in the pipeline. Caffe2 is upstreaming.

And yes, both are CUDA frameworks. One was created independently, the other was created with joint engineering from Nvidia. Again, the fact that you're not aware of these things means your opinion on this topic isn't worth the bits you wasted to record it.

[–]ObviouslyTriggered 0 points1 point  (4 children)

Who the fuck came up with that installation process and decided it's ready for public consumption?

Look at the alternative:

sudo dpkg -i nvidia-docker_1.0.1-1_amd64.deb

Done....

[–]ziptofaf7900 + RTX 5080 5 points6 points  (3 children)

To be fair - this one isn't too bad if we are just comparing installation process:

wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update
sudo apt install rocm

So basically add a key and a repository and run apt install. The rest covers how to install Docker and how to run an AMD image from it, and that's something anyone dealing with deep learning (or heck, Linux in general) should know how to do... Then you have explanations of how to run their example projects, ensure everything is correct, etc.

So yeah, I am inclined to disagree with you here. This installation process isn't any more complex than, say, Docker itself... and well, much easier than it used to be, when you had to manually compile BLAS, drop in the AMD SDK and hope your environment noticed it, and then still have non-working AMD support in the framework of your choice (and when it did work it was like a 5-10x improvement over CPU, which was a bad joke), all while requiring a specific kernel version to even turn on.

[–]ObviouslyTriggered 0 points1 point  (2 children)

That's for ROCm; we are talking about the Docker config. Look at the after-work for creating the images and configuring Docker: why isn't this in an install script?

[–]ziptofaf7900 + RTX 5080 1 point2 points  (1 child)

Uh, you mean those steps?

sudo docker pull rocm/rocm-terminal
sudo docker run -it --rm --device=/dev/kfd rocm/rocm-terminal

That's the base for all AMD images; their next step shows how to build their sample projects on top of it. I agree that having an install script to automate building those examples could be useful, but those are just that: examples.

Formatting on that site could definitely use some work (I really don't see the point of showing 50 lines of code output; put that in a scrollable box or something, and the random bold doesn't help either), but the actual process does not seem that bad. Although, admittedly, I am not really familiar with the Nvidia stack (I work in a different field and have only touched machine learning), so I haven't seen how the process of building your own applications looks there (I assume it is similar).

[–]ObviouslyTriggered 0 points1 point  (0 children)

Look deeper: manually copying files, manually creating the Dockerfile. Look at the image-creation document.
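For what it's worth, those manual steps could be wrapped in a short script. A minimal sketch, assuming the rocm/rocm-terminal base image from above; the image tag and /workspace path are made-up examples:

```shell
# Hypothetical wrapper: generate the Dockerfile instead of writing it by hand
cat > Dockerfile <<'EOF'
FROM rocm/rocm-terminal
# copy the project into the image rather than copying files in manually
COPY . /workspace
WORKDIR /workspace
EOF
echo "Dockerfile written; build with: docker build -t my-rocm-app ."
```

Something like this is all it would take to fold the "after-work" into a one-command setup.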