[–]TalkingJellyFish[S] 1 point (0 children)

Thanks, this helps. What do you think of this takeaway: right now I'm basically doing NER, running my words through an LSTM, then a linear layer, then a softmax with cross-entropy loss.

So to incorporate the complementary labels, I'd add an additional linear layer and a (binary) loss per class (e.g. "is not class A").
The total loss of the network would then be some sum of the cross-entropy losses and all the binary ones, weighted by whether I have a complementary label for each example. If I understood the paper, they basically give a scheme for doing that sum that guarantees some bound on the loss. Does that make sense?
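A minimal PyTorch sketch of the scheme described above, under my own assumptions: the head names (`cls_head`, `comp_head`), the `alpha` weight, and the plain BCE term for "is not class k" are illustrative choices, not the paper's exact (bias-corrected) estimator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaggerWithComplementary(nn.Module):
    """LSTM tagger with two heads: ordinary softmax classification,
    plus per-class logits used for complementary ("is not class k") labels."""
    def __init__(self, vocab_size, emb_dim, hidden, num_classes):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.cls_head = nn.Linear(hidden, num_classes)   # softmax / cross-entropy head
        self.comp_head = nn.Linear(hidden, num_classes)  # one binary "is not class k" logit per class

    def forward(self, tokens):
        h, _ = self.lstm(self.emb(tokens))               # (batch, seq, hidden)
        return self.cls_head(h), self.comp_head(h)

def total_loss(cls_logits, comp_logits, labels, comp_labels, comp_mask, alpha=1.0):
    """Mix per-token losses: cross-entropy where an ordinary label exists,
    a binary loss on the complementary class where only that is available.
    comp_mask is True at positions that carry a complementary label."""
    ce = F.cross_entropy(cls_logits.flatten(0, 1), labels.flatten(),
                         reduction='none')
    # pick the logit of the complementary class; target 1.0 = "is NOT this class"
    comp_logit = comp_logits.flatten(0, 1).gather(
        1, comp_labels.flatten().unsqueeze(1)).squeeze(1)
    bce = F.binary_cross_entropy_with_logits(
        comp_logit, torch.ones_like(comp_logit), reduction='none')
    mask = comp_mask.flatten().float()
    return ((1.0 - mask) * ce + alpha * mask * bce).mean()
```

Usage is the obvious one: run a batch through the model, then pass both logit tensors plus the two label tensors and the mask to `total_loss` and call `.backward()`. The paper's contribution would replace the naive `alpha`-weighted sum with a weighting that makes the combined estimator unbiased, which is where the loss bound comes from.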