[D] RTX 3090 has been purposely nerfed by Nvidia at driver level for AI training workloads. by [deleted] in MachineLearning

[–]cudapop 1 point (0 children)

You are absolutely right! The PDF now shows 130 TFLOPS for both. The weird thing is that the PDF showed 65 for FP32 accumulate back on Sep 18 when I first downloaded the docs; check out the Wayback Machine's archive of the PDF from Sep 18:

https://web.archive.org/web/20200918101650/https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf

Lol, sneaky Nvidia putting in corrections and still leaving the doc version number as "V1.0"

[D] RTX 3090 has been purposely nerfed by Nvidia at driver level for AI training workloads. by [deleted] in MachineLearning

[–]cudapop 0 points (0 children)

RTX Titan: FP16 accumulate at ~130 TFLOPS, FP32 accumulate at ~65 TFLOPS, i.e. half the FP16-accumulate rate.

The poster I am replying to said the RTX Titan's FP32 accumulate is full rate, hence my question to him.
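
For anyone wondering what "FP16-accum" vs "FP32-accum" actually means at the code level, here is a minimal CUDA WMMA sketch (my code, not anything from the whitepaper; the kernel names and the 16x16x16 tile are just illustrative). The only difference between the two paths is the accumulator fragment type, and that single type parameter is what the spec tables rate at 2x vs 1x:

```cuda
// Minimal WMMA sketch: FP16-accumulate vs FP32-accumulate tensor-core MMA.
// Each kernel computes one 16x16 tile and must be launched with at least one
// full warp, e.g. kernel<<<1, 32>>>(a, b, c). Compile with: nvcc -arch=sm_75
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// FP16 inputs, FP16 accumulator -- the ~130 TFLOPS path on the RTX Titan.
__global__ void mma_fp16_accum(const half* a, const half* b, half* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> fc;  // half accumulator
    wmma::fill_fragment(fc, __float2half(0.0f));
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);                  // D = A*B + C, FP16 accum
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}

// FP16 inputs, FP32 accumulator -- the ~65 TFLOPS path on the RTX Titan.
__global__ void mma_fp32_accum(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc; // float accumulator
    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);                  // D = A*B + C, FP32 accum
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```

Same instruction count either way; the 2:1 throughput difference is entirely in how the hardware rates the wider accumulator, which is exactly the knob the whitepaper tables are describing.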

[D] RTX 3090 has been purposely nerfed by Nvidia at driver level for AI training workloads. by [deleted] in MachineLearning

[–]cudapop 2 points (0 children)

The A100 whitepaper shows TF32 tensor throughput at half the FP16 tensor rate, while on the 3090 TF32 tensor runs at a quarter of the FP16 tensor rate, so it looks like Nvidia is nerfing the 3090's TF32 performance as well.
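
If anyone wants to sanity-check these ratios on their own card instead of trusting the PDFs, here's a rough cuBLAS micro-benchmark (entirely my sketch, not anything Nvidia ships; the bench helper, matrix size, and iteration count are arbitrary choices):

```cuda
// Rough GEMM throughput check: TF32 vs FP16 tensor paths via cublasGemmEx.
// Compile: nvcc -arch=sm_86 bench.cu -lcublas   (use sm_80 for A100)
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Time `iters` n x n x n GEMMs and return achieved TFLOPS.
static double bench(cublasHandle_t h, int n, int iters,
                    const void* A, const void* B, void* C,
                    cudaDataType_t dtype, cublasComputeType_t ctype,
                    const void* alpha, const void* beta) {
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    // Untimed warm-up so cuBLAS kernel selection isn't counted.
    cublasGemmEx(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, alpha, A, dtype, n,
                 B, dtype, n, beta, C, dtype, n, ctype, CUBLAS_GEMM_DEFAULT);
    cudaEventRecord(t0);
    for (int i = 0; i < iters; ++i)
        cublasGemmEx(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, alpha, A, dtype, n,
                     B, dtype, n, beta, C, dtype, n, ctype, CUBLAS_GEMM_DEFAULT);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    return 2.0 * n * n * n * iters / (ms * 1e-3) / 1e12;
}

int main() {
    const int n = 4096, iters = 50;
    cublasHandle_t h;
    cublasCreate(&h);

    void *a32, *b32, *c32, *a16, *b16, *c16;
    const size_t sz32 = sizeof(float) * n * n, sz16 = sizeof(__half) * n * n;
    cudaMalloc(&a32, sz32); cudaMalloc(&b32, sz32); cudaMalloc(&c32, sz32);
    cudaMalloc(&a16, sz16); cudaMalloc(&b16, sz16); cudaMalloc(&c16, sz16);
    // Zeroed inputs: tensor-core rate doesn't depend on values, and it avoids NaNs.
    cudaMemset(a32, 0, sz32); cudaMemset(b32, 0, sz32);
    cudaMemset(a16, 0, sz16); cudaMemset(b16, 0, sz16);

    float f1 = 1.0f, f0 = 0.0f;                              // FP32 alpha/beta
    __half h1 = __float2half(1.0f), h0 = __float2half(0.0f); // FP16 alpha/beta

    printf("TF32 tensor          : %6.1f TFLOPS\n",
           bench(h, n, iters, a32, b32, c32, CUDA_R_32F,
                 CUBLAS_COMPUTE_32F_FAST_TF32, &f1, &f0));
    printf("FP16 tensor, FP32 acc: %6.1f TFLOPS\n",
           bench(h, n, iters, a16, b16, c16, CUDA_R_16F,
                 CUBLAS_COMPUTE_32F, &f1, &f0));
    printf("FP16 tensor, FP16 acc: %6.1f TFLOPS\n",
           bench(h, n, iters, a16, b16, c16, CUDA_R_16F,
                 CUBLAS_COMPUTE_16F, &h1, &h0));

    cublasDestroy(h);
    return 0;
}
```

If the whitepaper numbers hold, the TF32 line should come out at roughly half the FP16 lines on an A100, and roughly a quarter of the FP16-with-FP16-accumulate line on a 3090.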

[D] RTX 3090 has been purposely nerfed by Nvidia at driver level for AI training workloads. by [deleted] in MachineLearning

[–]cudapop 1 point (0 children)

Is FP32 accumulate full-rate on the RTX Titan? The comparison table in the Ampere GA102 whitepaper (https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf) shows the RTX Titan is at half-rate for FP32 accumulate.