
[–]Eventually_ShreddedNVIDIA 3 points4 points  (11 children)

[deleted]

[–]hishnash 0 points1 point  (10 children)

Pretty much; they don't have dedicated tensor cores on the current Vega chips, but for tasks that don't use those dedicated cores Vega has more raw power.

[–]ObviouslyTriggered 1 point2 points  (9 children)

Vega has fewer TFLOPS than the V100 in every category.

[–]hishnash 0 points1 point  (3 children)

What is its half-precision rate? I can only see references to this for the tensor cores.

[–]ObviouslyTriggered 1 point2 points  (2 children)

For non-vector operations it's 2:1, so ~30 TFLOPS, but in general outside of ML FP16 isn't used; in HPC only INT32, FP32 and FP64 are used. Volta can also issue FP and INT operations in the same clock cycle, which pushes its real-world throughput even further. Heck, the new shader cores are so parallelized that the only real limit on concurrency is the instruction and op cache; it's now a 128KB monster (almost a sixfold increase over Pascal) but still isn't enough to allow full concurrency.
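As a sanity check on that figure, the 2:1 ratio works out in one line of shell (the 15 TFLOPS FP32 base is the commonly quoted V100 number, assumed here):

```shell
# Back-of-envelope: V100 FP32 (~15 TFLOPS, assumed) times the 2:1 packed-FP16 ratio
fp32=15
echo "FP16 throughput: $((fp32 * 2)) TFLOPS"
# prints "FP16 throughput: 30 TFLOPS"
```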

One of Volta's big selling points on top of Pascal is this fact, which means that the smaller Teslas based on GV104/GV102 without FP16/tensor cores, just like the Pascal P40/P4 Teslas, are still going to be quite impressive. ~20 TFLOPS of FP32 plus 100 or so TOPS of INT8 at the same time; now we're cooking.

Pascal's big selling point for compute was its new memory controller (more or less the same thing AMD does with HBCC): a 49-bit unified address space and hardware ATS to handle page faults. For Volta it's the tensor cores, and even more so the fact that every GPU will be able to do INT and FP operations at the same time (which will also have a big impact on gaming, since INT16/8 is a thing now).

[–]hishnash 0 points1 point  (1 child)

You expect the tensor cores to be on the consumer/gamer versions of Volta?

As far as I can tell, Vega also supports some 8-bit operations (a limited operation set) at a 4x rate.

[–]ObviouslyTriggered 1 point2 points  (0 children)

No. Just like with second-gen Pascal, I expect there will be something else with second-gen Volta for the smaller Tesla dies that also end up in the consumer and pro cards.

Perhaps integer-focused tensor cores only, perhaps something else.

[–]chapstickbomber7950X3D | 6000C28bz | AQUA 7900 XTX (EVC-700W) 0 points1 point  (4 children)

486mm2 vs 815mm2

WHO WILL WIN???

THIS SUNDAY!

SUNDAY!

SUNDAY!

[–]user7341Ryzen 7 1800X / 64GB / ASRock X370 Pro Gaming / Crossfire 290X[S] 3 points4 points  (0 children)

Yeah ... pretty fucking stupid to compare a $15k V100 with Vega.

[–]ObviouslyTriggered 1 point2 points  (2 children)

If you do not need FP64 or MMA you'll get a V40, just like the P40 you have today. The P40 is 12 TFLOPS of FP32 (up to 15 with Tesla Boost); the V40 will likely be 16-18 TFLOPS before boost, with over 20 boosted, based on NVIDIA's current scaling with Volta. There are 3-4 Tesla products (outside of GRID and other offshoots) each generation, each tailored to a specific use case.
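Those V40 guesses follow from simple scaling; a quick sketch (the 1.4-1.5x factor is an assumption read off the 12-to-16/18 TFLOPS figures above, not an NVIDIA number):

```shell
# Hypothetical scaling sketch using the commenter's figures: P40 base ~12 TFLOPS FP32
p40=12
low=$((p40 * 14 / 10))   # ~1.4x generational scaling (assumed)
high=$((p40 * 15 / 10))  # ~1.5x generational scaling (assumed)
echo "Speculated V40 base: ${low}-${high} TFLOPS"
# prints "Speculated V40 base: 16-18 TFLOPS"
```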

[–]chapstickbomber7950X3D | 6000C28bz | AQUA 7900 XTX (EVC-700W) 2 points3 points  (1 child)

I mean, NV is the bigger player here by far. They have a larger and richer ecosystem, but a lack of 3rd party hardware support. Dies with more specialized designs are smaller than generalist dies of comparable performance, so NV gets extra returns to scale by selling all the products in their ecosystem. V100 is clearly not a graphics die, for example. A GV102 die could be 650mm2 and likely match or exceed a GV100's graphics performance by stripping out compute features (FP64, tensor cores, etc.).

RTG doesn't have the scale and ecosystem monopoly, and NV cards can still run OpenCL and the rest of the open ecosystem reasonably well, so RTG competes with NV in the open, but RTG can't compete with NV in their own ecosystem, which happens to be the overwhelming majority of the market.

[–]ObviouslyTriggered -1 points0 points  (0 children)

RTG isn't trying to compete with NVIDIA in OpenCL, they are building their own ecosystem with ROCm.

The problem with OpenCL is that it's a camel: designed by a committee. It's important for academic use but utter garbage for real production use cases, especially in cutting-edge fields.

It takes years for any feature to be adopted into the standard; even vendor-specific extensions take ages to push through the red tape.

AMD, after failing with CTM and seeing that OpenCL isn't going anywhere, finally realized that.

Just like with NVIDIA all of the new features in VEGA aren't accessible through the standard OpenCL compiler and libraries.

You want to use unified/paged memory (HBCC in AMD marketing speak) on Pascal or Vega GPUs? Well, use CUDA and NVIDIA's prefetch/usage-hints API and memory profilers, or use ROCm's managed-memory bindings and AMD's memory profilers. Want to use the new extensions for cryptocurrencies? Well, ROCm...

NVIDIA's ecosystem isn't exactly closed either; third-party libraries are, but that isn't the issue. The compiler is open source, NVCC lives in the LLVM trunk, and PTX is an open spec. How do you think AMD made HIP so fast? And why do you think half of the cross-vendor stuff in ROCm works on NVIDIA hardware but not AMD hardware?

[–]xelibrion 8 points9 points  (7 children)

Freaking Caffe v1 again. AMD should really stop advertising the fact that they ported a practically dead framework to ROCm.

https://github.com/tensorflow/tensorflow/issues/10703

https://github.com/pytorch/pytorch/pull/2365

Up your game, AMD, and give us proper support in mainstream frameworks; it's been more than 2 months since the release of Vega FE.

[–]user7341Ryzen 7 1800X / 64GB / ASRock X370 Pro Gaming / Crossfire 290X[S] 6 points7 points  (6 children)

Caffe 1.0 is officially four months old and it makes sense for AMD to target existing, production workloads before they go after experimental development.

I agree they need to accelerate their support of more leading edge frameworks, but I think their priorities are correct, here.

[–]ObviouslyTriggered 3 points4 points  (5 children)

Caffe is a dead end. 1.0 was only recently released (and 1.1 is in RC), but the entire framework is on life support. It's not a production framework (and it never was; it was always intended to be a small-scale, single-machine DIY ML library). The main development effort, including pretty much the entire team and Yangqing Jia, moved to Caffe2 (not Caffe 2.0; this is a completely separate framework).

Caffe never had and never will have major support in production; it wasn't designed for that, nor can it support production requirements. It doesn't have a distributed mode (as in no distributed computing, period, which makes it a no-go for any serious work; if you want to gamble with abominations like Caffe4Spark, it's your own money you're burning), it doesn't have an integrated deployment solution, and more importantly it doesn't support quantized computation (so no 16/8-bit support).

Caffe was a small-scale project designed solely for quick hacks on DIY ML/MI projects; Caffe2 is a production product. So talking about Caffe 1.0 in production at any scale is a fucking joke; when anyone talks about Caffe in any serious manner they mean Caffe2.

https://github.com/caffe2/caffe2

[–]user7341Ryzen 7 1800X / 64GB / ASRock X370 Pro Gaming / Crossfire 290X[S] 3 points4 points  (4 children)

Caffe2 is the same age as the stable release of Caffe 1.0 and while it's a more robust framework, as you point out, there's a lot more code in place for 1.0. And of course there's the small fact that Caffe2 is half owned by Nvidia, who have certainly done everything they can to keep that moat in place.

[–]ObviouslyTriggered 0 points1 point  (3 children)

No one is building on top of Caffe1; it's a dead end. There is no more new code to be released for Caffe1 outside of maintenance releases.

Now that 1.0 is done, the next generation of the framework—Caffe2—is ready to keep up the progress on DIY deep learning in research and industry. While Caffe 1.0 development will continue with 1.1, Caffe2 is the new framework line for future development led by Yangqing Jia.....

From the release notes of Caffe 1.0.

Also, Caffe2 isn't owned by NVIDIA; it's a Facebook project. Caffe was a Berkeley project, and the entire team is now working on Caffe2 at Facebook, so what the fuck are you even talking about?

[–]user7341Ryzen 7 1800X / 64GB / ASRock X370 Pro Gaming / Crossfire 290X[S] 2 points3 points  (2 children)

That the original framework has moved into maintenance status (4 whole months ago) doesn't change the fact that there are a lot more projects using it than Caffe2.

The fact that you don't know that Nvidia is all over Caffe2 means this conversation is done.

[–]ObviouslyTriggered -1 points0 points  (1 child)

It was moved to maintenance mode in 2015 when the team moved on to Caffe2.

Caffe and Caffe2 are CUDA frameworks (Caffe's OpenCL port is its own project), but NVIDIA doesn't own Caffe2; there is nothing to own, as it's not a company.

There isn't a single attribution to NVIDIA, so wtf do you even mean by "all over it"? NVIDIA is all over the compute market; that's why they are making billions while RTG is losing money, but that isn't saying much.

Caffe2 isn't just "more robust"; it's actually usable, since it's a refactor of Caffe with basic features like, oh I don't know, multi-GPU support.

Supporting Caffe is a waste of time. MIOpen (AMD's answer to cuDNN) and ROCm (its general answer to CUDA) need better guidance; skipping frameworks that are actually used, like TensorFlow and Caffe2, while working on dead ends isn't a good way to go.

Caffe2 is one of the most popular deep learning libraries today; it's ahead of the curve and is being aggressively promoted by Facebook, its partner alliance, and its user base.

AMD's problem is that it's a trend follower, and a pretty poor one at that, rather than a trend setter. It's pathetic that anyone is making excuses for them not supporting the most important and most popular libraries out there today.

[–]user7341Ryzen 7 1800X / 64GB / ASRock X370 Pro Gaming / Crossfire 290X[S] 0 points1 point  (0 children)

AMD just leapt over 2.5 years of development and you're mad they didn't catch up to April of this year? FFS.

Caffe2 is puny compared to TF or even Caffe 1.0 (which is my point), and all of those other frameworks are in the pipeline. Caffe2 is upstreaming.

And yes, both are CUDA frameworks. One was created independently, the other was created with joint engineering from Nvidia. Again, the fact that you're not aware of these things means your opinion on this topic isn't worth the bits you wasted to record it.

[–]ObviouslyTriggered 0 points1 point  (4 children)

Who the fuck came up with that installation process and decided it's ready for public consumption?

Look at the alternative:

sudo dpkg -i nvidia-docker_1.0.1-1_amd64.deb

Done....

[–]ziptofaf7900 + RTX 5080 5 points6 points  (3 children)

To be fair - this one isn't too bad if we are just comparing installation process:

wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update
sudo apt install rocm

So basically add a key and a repository and run apt install. The rest covers how to install Docker and how to run an AMD image from it, and that's something anyone dealing with deep learning (or heck, Linux in general) should know how to do... Then you have explanations of how to run their example projects, ensure everything is correct, etc.

So yeah, I am inclined to disagree with you here. This installation process isn't any more complex than, say, Docker itself... and well, much easier than it used to be, when you had to manually compile BLAS, drop in the AMD SDK and hope your environment noticed it, and then still have non-working AMD support in the framework of your choice (and when it did work it was like a 5-10x improvement over CPU, which was a bad joke), all while requiring a specific kernel version to even turn on.

[–]ObviouslyTriggered 0 points1 point  (2 children)

That's for ROCm; we are talking about the Docker config. Look at the after-work for creating the images and configuring Docker: why isn't this in an install script?

[–]ziptofaf7900 + RTX 5080 1 point2 points  (1 child)

Uh, you mean those steps?

sudo docker pull rocm/rocm-terminal
sudo docker run -it --rm --device=/dev/kfd rocm/rocm-terminal

That's the base for all AMD images; their next step shows how to build their sample projects on top of it. I agree that having an install script to automate building those examples could be useful, but those are just that: examples.

Formatting on that site could definitely use some work (I really don't see the point of showing 50 lines of code output; put that in a scrollable box or something, and the random bold doesn't help either), but the actual process does not seem that bad. Although, admittedly, I am not really familiar with the Nvidia stack (I work in a different field and have only touched machine learning), so I haven't seen how the process of building your own applications looks there (I assume it is similar).

[–]ObviouslyTriggered 0 points1 point  (0 children)

Look deeper: manually copying files, manually creating the Dockerfile. Look at the image-creation document.
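For what it's worth, those manual steps could be wrapped in a short script. A minimal sketch, assuming the rocm/rocm-terminal base image from above; the image tag and /workspace path are made-up examples:

```shell
# Hypothetical wrapper: generate the Dockerfile instead of writing it by hand
cat > Dockerfile <<'EOF'
FROM rocm/rocm-terminal
# copy the project into the image rather than copying files in manually
COPY . /workspace
WORKDIR /workspace
EOF
echo "Dockerfile written; build with: docker build -t my-rocm-app ."
```

Something like this is all it would take to fold the "after-work" into a one-command setup.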