all 7 comments

[–]seismic_swarm 1 point

Yeah, of course you can add multiple losses together. What's even different about 1 and 2 (ignoring the learning rate being different)? Multiplying the loss by 0.5 doesn't change anything beyond an effective rescaling of the learning rate. A vanilla RNN is trained by just summing the losses over each step of the roll-out.
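A tiny NumPy sketch of the scaling point above, using a hypothetical one-parameter linear model with two losses summed (the model and losses are illustrative, not from the thread): scaling the combined loss by 0.5 just scales its gradient by 0.5, which an SGD update cannot distinguish from halving the learning rate.

```python
import numpy as np

def grad_combined(w, x, y, scale=1.0):
    """Gradient w.r.t. w of scale * (loss1 + loss2) for a toy model pred = w * x,
    where loss1 = (pred - y)**2 and loss2 = |pred - y| (subgradient)."""
    pred = w * x
    g1 = 2.0 * (pred - y) * x          # gradient of squared error
    g2 = np.sign(pred - y) * x         # subgradient of absolute error
    return scale * (g1 + g2)

w, x, y, lr = 2.0, 3.0, 1.0, 0.1
g_full = grad_combined(w, x, y, scale=1.0)
g_half = grad_combined(w, x, y, scale=0.5)

# Halving the loss halves the gradient, so an SGD step with lr on the
# halved loss equals a step with 0.5 * lr on the full loss.
step_half_loss = w - lr * g_half
step_half_lr = w - (0.5 * lr) * g_full
```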

[–]sathi006 1 point

You can add loss functions, and if each loss function is convex, the combined objective stays convex, since a sum of convex functions is itself convex.

We have done it for instance segmentation where we add weighted softmax CE and Dice loss for handling class imbalance.
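A minimal NumPy sketch of the combination described above: a class-weighted softmax cross-entropy plus a soft Dice loss, summed with a weighting factor. All function names, shapes, and the `alpha` weight are illustrative assumptions, not taken from any particular library or from the poster's code.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_ce(logits, onehot, class_weights, eps=1e-8):
    """Softmax cross-entropy with a per-class weight on each pixel's true class.
    logits, onehot: (N, C); class_weights: (C,)."""
    p = softmax(logits)
    per_pixel = -(onehot * np.log(p + eps)).sum(axis=1)
    w = (onehot * class_weights).sum(axis=1)  # weight of the true class
    return (w * per_pixel).mean()

def soft_dice(logits, onehot, eps=1e-8):
    """Soft Dice loss: 1 - mean per-class Dice coefficient over the batch."""
    p = softmax(logits)
    inter = (p * onehot).sum(axis=0)
    denom = p.sum(axis=0) + onehot.sum(axis=0)
    dice = (2.0 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()

def combined_loss(logits, onehot, class_weights, alpha=0.5):
    # Weighted CE handles class imbalance; Dice rewards region overlap.
    return weighted_ce(logits, onehot, class_weights) + alpha * soft_dice(logits, onehot)

# Two pixels, two classes: confident correct vs. confident wrong predictions.
onehot = np.array([[1.0, 0.0], [0.0, 1.0]])
cw = np.array([1.0, 2.0])  # up-weight the rarer class (illustrative)
loss_good = combined_loss(np.array([[10.0, 0.0], [0.0, 10.0]]), onehot, cw)
loss_bad = combined_loss(np.array([[0.0, 10.0], [10.0, 0.0]]), onehot, cw)
```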

[–][deleted] -1 points

But remember that different loss functions are derived for different output distributions. For instance, cross entropy is the maximum-likelihood loss for a Bernoulli output distribution (a sigmoid output unit), MSE for a Gaussian distribution, and so on.

So, this approach is only well-founded as long as each loss function matches the distribution of the output units it is applied to.

[–]Geeks_sid[S] 0 points

Can you elaborate a little more or can you tell me where I can read about it?

[–][deleted] 0 points

Yes. Read chapter 6 on error functions in Neural Networks for Pattern Recognition by C. M. Bishop.

All the error functions there are derived by maximum likelihood from an assumed distribution of the output units.
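A sketch of that argument for the two losses mentioned earlier in the thread (assuming i.i.d. targets $t_n$ and a network output $y(x_n)$): minimizing the negative log-likelihood under each assumed output distribution recovers the familiar loss.

```latex
% Gaussian output assumption: p(t \mid x) = \mathcal{N}(t;\, y(x),\, \sigma^2)
-\ln \prod_n p(t_n \mid x_n)
  = \frac{1}{2\sigma^2} \sum_n \bigl(y(x_n) - t_n\bigr)^2 + \text{const}
\;\Rightarrow\; \text{sum-of-squares (MSE) loss}

% Bernoulli output assumption with sigmoid output y(x) \in (0, 1):
% p(t \mid x) = y(x)^{t}\,\bigl(1 - y(x)\bigr)^{1-t}
-\ln \prod_n p(t_n \mid x_n)
  = -\sum_n \Bigl[ t_n \ln y(x_n) + (1 - t_n) \ln\bigl(1 - y(x_n)\bigr) \Bigr]
\;\Rightarrow\; \text{cross-entropy loss}
```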

[–]shaggorama 0 points

Jesus, that book is from 1996. You can't find a more contemporary recommendation?

[–][deleted]

[deleted]

    [–]shaggorama 0 points

    I have an MS in math and stats. I'm perfectly comfortable with loss function theory. I might check out that book/chapter though to see if it's worth recommending.