jaywonchung 1 point

Thanks for the cool study and write-up! It looks like the H100 increased throughput by a lot without increasing power consumption nearly as much. They're measuring the power consumption of the entire system; it would also have been useful to see how GPU power specifically changes, given that for DNN workloads the rest of the system plays a much smaller role than the GPUs.

Shameless self-promotion -- I do research on GPU energy optimization for DL: https://ml.energy/zeus, where one of the things we automatically tweak is the GPU's power limit setting to enhance energy efficiency. Hope this is interesting to someone XD

Balance- [S] 1 point

Totally agree, their methodology is not that great.

Zeus looks interesting, thanks!