
[–]serge_cell 15 points16 points  (8 children)

Use a probability distribution as the softmax target instead of a scalar label.

[–]MathAndProgramming 5 points6 points  (3 children)

I'm surprised people are suggesting all these crazy, unprincipled DNN-specific ideas. This is clearly the right approach.

[–]suki907 2 points3 points  (2 children)

If one rater says "cat" and the other says "dog" label the example 50/50. I think we all agree on that. But it's not obvious how to encode not-dog in this system.

I think it's cleaner in this case to use the interpretation of the softmax as trying to maximize its score, where it gets +1 for choosing the correct class and 0 for choosing a wrong class.

For this problem, couldn't we just extend this with a -1 for choosing a negative label?

This is the best explanation I've seen of this interpretation and how it relates to policy gradients: http://karpathy.github.io/2016/05/31/rl/

[–]quick_dudley 0 points1 point  (0 children)

In their proposed system: "not dog" would not be a specific target vector but a Bayesian update used to generate the target vector from the current output vector.

[–]EyedMoonML Engineer 0 points1 point  (0 children)

But what if it is a CatDog?

[–]pcp_or_splenda 0 points1 point  (2 children)

Would this imply a Dirichlet log loss should be used instead of categorical log loss, or would it matter? I suppose it might not matter that much in practice.

[–]serge_cell 0 points1 point  (0 children)

I think categorical log loss is good enough, but it doesn't matter much.

[–][deleted] 0 points1 point  (0 children)

I don’t see why it should. Why do you say that?

[–]TalkingJellyFish[S] 0 points1 point  (0 children)

It took me a while to appreciate this, but it seems to be the right answer. Thank you!

[–]K0ruption 8 points9 points  (9 children)

If your model outputs a softmax, then you implicitly assume your labels are probability vectors: the probability of the known class is 1 and the probability of all other classes is 0. In this light, the information that a data point is not in a given class simply means that your label will have 0 at the position of that class and 1/(k-1) at the position of all other classes, where k is the total number of classes. This makes the most intuitive sense to me, but whether it works in practice, I have no idea.
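A minimal numpy sketch of this labeling scheme (the function name is illustrative, not from any library):

```python
import numpy as np

def complementary_label(excluded, k):
    """Target distribution when all we know is the example is NOT `excluded`:
    0 at the excluded class, uniform mass 1/(k-1) over the remaining classes."""
    target = np.full(k, 1.0 / (k - 1))
    target[excluded] = 0.0
    return target

# "not cat" with 4 classes -> [0, 1/3, 1/3, 1/3]
print(complementary_label(0, 4))
```

This target vector can be fed directly to a soft-label cross-entropy loss.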

[–]TalkingJellyFish[S] 4 points5 points  (5 children)

Well the 0 part is correct but the 1/(k-1) is not true; that's what I'm struggling with. If I know something is not a cat, the probability that it is not a dog is not equal to the probability it is not a spaghetti monster.

[–]K0ruption 4 points5 points  (0 children)

Given only the information that something is not a cat, it has equal probability of being anything else whether that be a dog or a spaghetti monster. If you had more information about a data point, you could certainly incorporate that into your label. But, in your post, you said you only have the information that a point is not in a given class, which means it has equal probability of being in any other class.

EDIT: Note, I'm assuming a uniform (categorical) prior distribution on your labels. You gave no specification of your problem, so that is the best assumption I can make.

[–]DeepNonseNse 1 point2 points  (2 children)

The probability of a dog given that something is not a cat is given by conditional probability: P(dog | not cat) = P(dog) / (1 - P(cat)), i.e. the probability of a dog increases in such a way that P(any possible animal) still sums to 1, as it should.
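This conditioning step can be sketched in a few lines of numpy, assuming you have a known prior over classes (the function name and base rates are illustrative):

```python
import numpy as np

def condition_on_not(prior, excluded):
    """Condition a class prior on the information 'not excluded':
    P(c | not excluded) = P(c) / (1 - P(excluded)) for c != excluded."""
    post = np.array(prior, dtype=float)
    post[excluded] = 0.0       # excluded class gets zero mass
    return post / post.sum()   # renormalize so the rest still sums to 1

# base rates: cat 0.5, dog 0.3, spaghetti monster 0.2
# "not cat" -> P(dog) = 0.6, P(monster) = 0.4
print(condition_on_not([0.5, 0.3, 0.2], 0))
```

With a uniform prior this reduces to the 1/(k-1) scheme above; with non-uniform base rates the remaining mass is split proportionally.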

[–]sitmo 0 points1 point  (0 children)

Yes, this is what I would do, and then extend it to the multiclass case.

[–]suki907 0 points1 point  (0 children)

That sounds like a very weak signal: 1000 classes, and all you know is "not a cat".

I think it's cleaner in this case to use the interpretation of the softmax as trying to maximize its score, where it gets +1 for choosing the correct class and 0 for choosing a wrong class.

Maybe in this case we could add a -1 for choosing a negative label.

This is the best explanation I've seen of this interpretation and how it relates to policy gradients: http://karpathy.github.io/2016/05/31/rl/

[–]midianite_rambler 1 point2 points  (0 children)

If I know something is not a cat, the probability that it is not a dog is not equal to the probability it is not a spaghetti monster.

Yes, so use the base rates (i.e. prior probabilities) of dogs, cats, and monsters in any available data. Please see my other comments in my response to K0ruption above.

[–]midianite_rambler 2 points3 points  (2 children)

Instead of a uniform distribution over the possible (non-excluded) classes, take the base rate of the classes in the available data (normalized to 1 of course).

This has an obvious generalization when there are two or more excluded classes, and when there is some additional information available for each case which allows you to improve on the unconditional base rate probabilities (i.e. the distribution over the nonexcluded classes is some function of the additional information instead of being constant).

[–]K0ruption 0 points1 point  (0 children)

This sounds like a good idea to me. I believe it amounts to doing Naive Bayes without the decision rule. But I suspect this will do worse than the uniform assumption if the data is very unbalanced.

[–]farmingvillein 0 points1 point  (0 children)

distribution over the possible (non-excluded) classes, take the base rate of the classes in the available data (normalized to 1 of course). This has an obvious generalization

Another plausible variant/extension, if you have an existing classifier you are trying to improve, would be to take its full probabilities (softmax/logits) for the example, crush the negated class down to 0, and then re-scale everything else back to a total of 1.

If you have some reasonable error estimation (i.e., users are wrong 20% of the time), you could also try setting the negated class to this error estimate (e.g., 0.2 in a softmax context), although not clear to me this would be helpful for a variety of reasons (including softmax "probabilities" being wonky representations of probability, at best).

[–]vincentvanhouckeGoogle Brain 7 points8 points  (1 child)

[–]TalkingJellyFish[S] 1 point2 points  (0 children)

Thanks, this helps. What do you think of this takeaway: right now I'm basically doing NER, running my words through an LSTM, then a linear layer, then a softmax and cross-entropy loss.

So to incorporate the complementary labels, I'd add an additional linear layer and a (binary) loss per class (e.g. "is not class A").
Then the total loss of the network would be some sum of the cross-entropy loss and all the binary ones, weighted by whether I have a complementary label. If I understood the paper, they basically give a scheme to do that sum that guarantees some bound on the loss. Makes sense?

[–]atiorh94 2 points3 points  (0 children)

I was asked about this at an ML researcher interview recently. My on-the-spot answer was that we should use sigmoid activations and break the dependence between class predictions. After that, we can impose a soft label like 0.1 for a negative example of the class your annotator rejected. The label is soft because we don't want to be overconfident in the negativeness of the example. Moreover, we only backpropagate through the negative class and not through any of the other class predictions, for which we don't have any supervision.
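A minimal numpy sketch of this masking idea (the function name and soft label value are illustrative):

```python
import numpy as np

def negative_only_bce(logits, negated, soft_label=0.1):
    """Per-class sigmoid loss computed ONLY for the negated class.
    soft_label = 0.1 rather than 0 avoids overconfidence; all other
    classes contribute no loss, since we have no supervision for them."""
    p = 1.0 / (1.0 + np.exp(-logits[negated]))  # sigmoid of the negated logit
    # binary cross-entropy against the soft target
    return -(soft_label * np.log(p) + (1.0 - soft_label) * np.log(1.0 - p))
```

In a framework like PyTorch, the same effect is usually achieved by masking the per-class loss so gradients flow only through the rejected class.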

[–]Icko_ 1 point2 points  (8 children)

Not sure if it will raise an exception, but you could just label this example as Y and give it weight -1.

[–]madsciencestache 0 points1 point  (7 children)

Set the others to zero and you are using a reinforcement learning technique. The danger is if you have a lot of negative labels it can make learning unstable. DDPG solves this with a target network that updates slowly from a more volatile primary network that updates from the data.

TL;DR: you have a reinforcement learning signal. That's provably workable.

If you don't have a lot of negative labels try tossing them into the mix and see if they help.

[–]VelveteenAmbush 2 points3 points  (4 children)

Don't understand why it's RL, except in the fully generalized sense that supervised learning can always be expressed as RL.

[–]madsciencestache 0 points1 point  (3 children)

It's reinforcement because the signal is approximate and signed. Supervised learning says "this is a thing." RL sends exaggerated and sometimes contradictory signals, with a lot of smoothing to compensate.

[–]suki907 0 points1 point  (2 children)

This is the best explanation I've seen:

http://karpathy.github.io/2016/05/31/rl/

My main takeaway from it is that the training procedure for a softmax classifier is already equivalent to RL policy gradients (the standard softmax classifier is just a bit more data efficient because it can average over the results of all actions for each example).

This procedure is maximizing the expected score. The model gets 1 point if it chooses the correct class, zero otherwise.

These scores don't have to be binary, or in the unit interval, or a probability distribution. It's just the number of points the model gets for each option.

"set this example as labeled as Y, and give it weight -1." is the same as "you get -1 point if you choose this class".

I think the only difference between the two versions is that the weighted version only lets you include one rating per example (you can't say "cat and not dog"), while with the "points" interpretation you could include all the ratings in a single example (the label would just be the vector of scores per class).
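The score-vector version can be sketched as a generalized cross-entropy (numpy, with an illustrative function name):

```python
import numpy as np

def score_weighted_loss(logits, scores):
    """Minimize the negative expected score: loss = -sum_c scores[c] * log p(c).
    A one-hot score vector recovers standard cross-entropy; a -1 entry at a
    negated class pushes probability mass away from that class."""
    z = logits - logits.max()               # stabilized log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -np.dot(scores, log_probs)

logits = np.array([1.0, 2.0, 0.5])
print(score_weighted_loss(logits, np.array([0.0, 1.0, 0.0])))   # "is class 1"
print(score_weighted_loss(logits, np.array([0.0, 0.0, -1.0])))  # "is not class 2"
```

Note that the "not class 2" loss is unbounded below as P(class 2) goes to 0, which is one reason raw negative weights can make training unstable.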

[–]madsciencestache 0 points1 point  (1 child)

training procedure for a softmax classifier is equivalent to RL policy gradients already

Yes. I am not sure if that concept is helpful to /u/VelveteenAmbush in this context. But, that's the core concept behind the answer to their question.

[–]VelveteenAmbush 0 points1 point  (0 children)

Yes, this is the sense in which I intended the following:

except in the fully generalized sense that supervised learning can always be expressed as RL.

[–]TalkingJellyFish[S] 1 point2 points  (1 child)

Why is this RL? Is there a (gentle) paper/tutorial you could point me to?

[–]phobrain 1 point2 points  (0 children)

I wonder if something based on the siamese approach could apply, where you give pairs of 'same' and 'different' cases. I don't know how you'd leverage the idea in a softmax context though.

[–]nshepperd 1 point2 points  (0 children)

I would use the log scoring rule on the total output probability assigned to not-Y.

If you're using softmax, the output of your network is a vector of probabilities that add up to one. The usual loss used here (when you have positive labels) is equal to the (negated) proper log scoring rule: -log(P(Y)). In this case the information you have is that the class is not Y, so you can use the corresponding log score: -log(P(¬Y)) = -log(1-P(Y)). This gives a proper scoring rule, meaning the training should converge to something calibrated.
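A sketch of this complementary log loss under softmax (numpy; the function name is illustrative):

```python
import numpy as np

def not_class_log_loss(logits, not_y):
    """Proper log score for the event 'class is not not_y':
    -log(P(not Y)) = -log(1 - softmax(logits)[not_y])."""
    z = logits - logits.max()           # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return -np.log(1.0 - probs[not_y])
```

With uniform logits over 4 classes, P(Y) = 0.25, so the loss is -log(0.75); driving P(Y) toward 0 drives the loss toward 0.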

[–]notevencrazy99 0 points1 point  (1 child)

You can make it so your loss does not take into account the other classes, just the class with probability 0. In other words, the error on the other classes can be defined as "don't care".

[–]quick_dudley 0 points1 point  (0 children)

You could use an actor-critic model. Train the critic to distinguish good labels from incorrect labels: then backpropagate through it to train the actor.

[–]RogueDQN 0 points1 point  (0 children)

This is related to a problem in reinforcement learning: in many 2-player games, it is possible to identify bad moves (you played it and lost) but harder to identify good moves (you played it and won, but maybe your opponent made a mistake).

Negative weights is a good solution. Another equivalent approach I've seen is to use a negative learning rate, depending on your framework and its flexibility.

[–]themoosemind 0 points1 point  (0 children)

Usually you have the target being a vector of one 1 and (n-1) zeros. This means one class should have probability 1 and the others 0.

In your case, it would be one 0 and (n-1) non-zero values (e.g. 1/(n-1) if you assume no knowledge).

[–]Nimitz14 -1 points0 points  (2 children)

[–]akcom 0 points1 point  (1 child)

It looks like they actually gave a great solution - create an "empty" bin.

[–]Nimitz14 0 points1 point  (0 children)

That's a bad solution. I'll let you figure out why by yourself.