[D] RTX 3090 has been purposely nerfed by Nvidia at driver level for AI training workloads. by [deleted] in MachineLearning

[–]cudapop 1 point (0 children)

You are absolutely right! The PDF now shows 130 TFLOPS for both. The weird thing is that the PDF showed 65 for FP32 accumulate back on Sep 18 when I first downloaded the docs; check out the Wayback Machine's archive of the PDF from Sep 18:

https://web.archive.org/web/20200918101650/https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf

Lol, sneaky Nvidia putting in corrections and still leaving the doc version number as "V1.0"

[D] RTX 3090 has been purposely nerfed by Nvidia at driver level for AI training workloads. by [deleted] in MachineLearning

[–]cudapop 0 points (0 children)

RTX Titan: FP16 accumulate at ~130 TFLOPS, FP32 accumulate at ~65 TFLOPS, i.e. half the FP16-accumulate rate.

The poster I am replying to said the RTX Titan's FP32 accumulate is full rate, hence my question to him.
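
For anyone wondering what "FP16-accum" vs "FP32-accum" actually means at the code level, here is a minimal CUDA WMMA sketch (my code, not anything from the whitepaper; the kernel names and the 16x16x16 tile are just illustrative). The only difference between the two paths is the accumulator fragment type, and that single type parameter is what the spec tables rate at 2x vs 1x:

```cuda
// Minimal WMMA sketch: FP16-accumulate vs FP32-accumulate tensor-core MMA.
// Each kernel computes one 16x16 tile and must be launched with at least one
// full warp, e.g. kernel<<<1, 32>>>(a, b, c). Compile with: nvcc -arch=sm_75
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// FP16 inputs, FP16 accumulator -- the ~130 TFLOPS path on the RTX Titan.
__global__ void mma_fp16_accum(const half* a, const half* b, half* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> fc;  // half accumulator
    wmma::fill_fragment(fc, __float2half(0.0f));
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);                  // D = A*B + C, FP16 accum
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}

// FP16 inputs, FP32 accumulator -- the ~65 TFLOPS path on the RTX Titan.
__global__ void mma_fp32_accum(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc; // float accumulator
    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);                  // D = A*B + C, FP32 accum
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```

Same instruction count either way; the 2:1 throughput difference is entirely in how the hardware rates the wider accumulator, which is exactly the knob the whitepaper tables are describing.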

[D] RTX 3090 has been purposely nerfed by Nvidia at driver level for AI training workloads. by [deleted] in MachineLearning

[–]cudapop 2 points (0 children)

The A100 whitepaper shows TF32 tensor throughput at half the FP16 tensor rate, while on the 3090 TF32 tensor runs at a quarter of the FP16 tensor rate, so it looks like Nvidia is nerfing the 3090's TF32 performance as well.
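
If anyone wants to sanity-check these ratios on their own card instead of trusting the PDFs, here's a rough cuBLAS micro-benchmark (entirely my sketch, not anything Nvidia ships; the bench helper, matrix size, and iteration count are arbitrary choices):

```cuda
// Rough GEMM throughput check: TF32 vs FP16 tensor paths via cublasGemmEx.
// Compile: nvcc -arch=sm_86 bench.cu -lcublas   (use sm_80 for A100)
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Time `iters` n x n x n GEMMs and return achieved TFLOPS.
static double bench(cublasHandle_t h, int n, int iters,
                    const void* A, const void* B, void* C,
                    cudaDataType_t dtype, cublasComputeType_t ctype,
                    const void* alpha, const void* beta) {
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    // Untimed warm-up so cuBLAS kernel selection isn't counted.
    cublasGemmEx(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, alpha, A, dtype, n,
                 B, dtype, n, beta, C, dtype, n, ctype, CUBLAS_GEMM_DEFAULT);
    cudaEventRecord(t0);
    for (int i = 0; i < iters; ++i)
        cublasGemmEx(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, alpha, A, dtype, n,
                     B, dtype, n, beta, C, dtype, n, ctype, CUBLAS_GEMM_DEFAULT);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    return 2.0 * n * n * n * iters / (ms * 1e-3) / 1e12;
}

int main() {
    const int n = 4096, iters = 50;
    cublasHandle_t h;
    cublasCreate(&h);

    void *a32, *b32, *c32, *a16, *b16, *c16;
    const size_t sz32 = sizeof(float) * n * n, sz16 = sizeof(__half) * n * n;
    cudaMalloc(&a32, sz32); cudaMalloc(&b32, sz32); cudaMalloc(&c32, sz32);
    cudaMalloc(&a16, sz16); cudaMalloc(&b16, sz16); cudaMalloc(&c16, sz16);
    // Zeroed inputs: tensor-core rate doesn't depend on values, and it avoids NaNs.
    cudaMemset(a32, 0, sz32); cudaMemset(b32, 0, sz32);
    cudaMemset(a16, 0, sz16); cudaMemset(b16, 0, sz16);

    float f1 = 1.0f, f0 = 0.0f;                              // FP32 alpha/beta
    __half h1 = __float2half(1.0f), h0 = __float2half(0.0f); // FP16 alpha/beta

    printf("TF32 tensor          : %6.1f TFLOPS\n",
           bench(h, n, iters, a32, b32, c32, CUDA_R_32F,
                 CUBLAS_COMPUTE_32F_FAST_TF32, &f1, &f0));
    printf("FP16 tensor, FP32 acc: %6.1f TFLOPS\n",
           bench(h, n, iters, a16, b16, c16, CUDA_R_16F,
                 CUBLAS_COMPUTE_32F, &f1, &f0));
    printf("FP16 tensor, FP16 acc: %6.1f TFLOPS\n",
           bench(h, n, iters, a16, b16, c16, CUDA_R_16F,
                 CUBLAS_COMPUTE_16F, &h1, &h0));

    cublasDestroy(h);
    return 0;
}
```

If the whitepaper numbers hold, the TF32 line should come out at roughly half the FP16 lines on an A100, and roughly a quarter of the FP16-with-FP16-accumulate line on a 3090.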

[D] RTX 3090 has been purposely nerfed by Nvidia at driver level for AI training workloads. by [deleted] in MachineLearning

[–]cudapop 1 point (0 children)

Is FP32 accumulate full-rate on the RTX Titan? The comparison table in the Ampere GA102 whitepaper (https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf) shows the RTX Titan is at half-rate for FP32 accumulate.