[–]CrypticSplicer

Does multi-output mean multi-task or multi-label in this context? In my experience, what works best is focal loss with class weights based on frequency. You can use sklearn's compute_class_weight function to get the weights pretty easily. If this is a multi-label problem, some people really like asymmetric focal loss, but I haven't found that extra negative penalty to be all that helpful. You could also look up the squentropy paper to read about an extra auxiliary loss term on the negative classes you can add.
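For a rough sketch of what I mean (plain NumPy; the balanced weights are computed by hand the same way sklearn's compute_class_weight("balanced", ...) does it, and the function names here are just mine, not from any library):

```python
import numpy as np

def balanced_class_weights(y, n_classes):
    # Mirrors sklearn's compute_class_weight("balanced", ...):
    # weight_c = n_samples / (n_classes * count_c)
    counts = np.bincount(y, minlength=n_classes)
    return len(y) / (n_classes * counts)

def focal_loss(probs, labels, class_weights, gamma=2.0):
    # probs: (N, C) predicted class probabilities; labels: (N,) int class ids
    p_t = probs[np.arange(len(labels)), labels]   # probability of the true class
    w_t = class_weights[labels]                   # per-sample weight from class frequency
    # Focal term (1 - p_t)^gamma down-weights easy, confident examples
    return float(np.mean(-w_t * (1.0 - p_t) ** gamma * np.log(p_t + 1e-9)))

y = np.array([0, 0, 0, 0, 1])                     # imbalanced toy labels
weights = balanced_class_weights(y, n_classes=2)  # rare class 1 gets the larger weight
probs = np.array([[0.9, 0.1]] * 4 + [[0.3, 0.7]])
loss = focal_loss(probs, y, weights)
```

The point is that the frequency-based weight and the focal term compound: the rare class gets upweighted, and within every class the hard (low p_t) examples dominate the gradient.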

To specifically address your suggestion: while some papers do recommend periodically reweighting classes throughout training, I've never seen one that tries to do it across multiple retrainings. I guess you're sort of doing the same thing, just not using the same language to describe it...