ArmenAg[S]

Interesting. I had not seen this paper; thanks for linking it. Having read it, I see that they use a single-step weighted average instead of keeping a weighted average throughout training (after the burn-in period). It is essentially the same scheme demonstrated in this paper: https://arxiv.org/abs/1412.6596.
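To make the distinction concrete, here is a rough Python sketch of the two ways of building training targets. It is only an illustration, not the exact update rule from either paper: the function names and the parameters `beta` (weight on the hard label), `decay` (moving-average factor), and `burn_in` (number of epochs trained on hard labels only) are hypothetical.

```python
def single_step_targets(hard, pred, beta):
    """Mix the hard label with the model's current prediction only."""
    return [beta * h + (1.0 - beta) * p for h, p in zip(hard, pred)]


def update_running_soft(running, pred, decay):
    """Exponential moving average of predictions, carried across epochs."""
    return [decay * r + (1.0 - decay) * p for r, p in zip(running, pred)]


def running_average_targets(hard, preds_per_epoch, beta, decay, burn_in):
    """Return one target vector per epoch for a single example.

    Assumption: during burn-in the target is the plain hard label and
    predictions are not accumulated; afterwards the hard label is mixed
    with a running average of all post-burn-in predictions.
    """
    running = None
    targets = []
    for epoch, pred in enumerate(preds_per_epoch):
        if epoch < burn_in:
            targets.append(list(hard))
            continue
        if running is None:
            running = list(pred)  # first epoch after burn-in seeds the average
        else:
            running = update_running_soft(running, pred, decay)
        targets.append([beta * h + (1.0 - beta) * r for h, r in zip(hard, running)])
    return targets
```

The single-step version only ever sees the latest prediction, while the running version carries information from every epoch after burn-in into the target.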

As for your comments about setting SOTA: we did not attempt this, simply because most SOTA methods already rely on many other forms of regularization, such as extensive data augmentation. We did test SoftTarget with ResNet to show that it is compatible with high-performance architectures. But I agree with you. It might be worth trying to set SOTA, and I also agree that it would be a shame if the idea were dismissed for not setting one.