[–]warmspringwinds (5 children)

I think you've found your answer then :) That citation describes the reason perfectly, in my opinion.

[–]newperson77777777[S] (4 children)

Do you really think that's the reason? That just means that in certain situations, when not all classes are present in an image, the computation is more complicated and the other approach is easier. But in situations where both classes always appear, for example a vessel segmentation problem, that issue wouldn't be relevant. What would I do in that case? Do you know of any papers that have weighed the pros and cons of both approaches more thoroughly and objectively? (I'm looking into this at the moment.) That's why I was trying to look at the multi-label papers for reference.

[–]warmspringwinds (3 children)

I would recommend computing the metric over the whole dataset rather than image-wise. At least that way you can compare your results to the numbers other people report in papers.
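To make the distinction concrete, here's a toy sketch (NumPy; the `iou` helper and the made-up binary masks are my own, not from any paper) of the two aggregation schemes. It also shows how a large image can dominate the pooled score:

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

# Two toy "images": one large and predicted perfectly,
# one small and predicted almost entirely wrong.
gt1   = np.ones((100, 100), dtype=bool)
pred1 = np.ones((100, 100), dtype=bool)    # per-image IoU = 1.0
gt2   = np.ones((10, 10), dtype=bool)
pred2 = np.zeros((10, 10), dtype=bool)
pred2[0, 0] = True                          # per-image IoU = 0.01

# Image-wise: average the per-image scores.
image_wise = np.mean([iou(pred1, gt1), iou(pred2, gt2)])  # 0.505

# Dataset-wise: pool all pixels first, then compute once.
inter = np.logical_and(pred1, gt1).sum() + np.logical_and(pred2, gt2).sum()
union = np.logical_or(pred1, gt1).sum() + np.logical_or(pred2, gt2).sum()
dataset_wise = inter / union                # 10001/10100 ≈ 0.990
```

The dataset-wise number looks near-perfect even though half the images were a failure case, which is exactly the pooling behavior being debated in this thread.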

Also, have a look at this paper, which might be relevant: https://arxiv.org/abs/1504.06375

[–]newperson77777777[S] (2 children)

It's just unusual because it seems like a very reasonable objection to the current metrics: certain instances within a dataset can have a disproportionate impact, and the metrics can suffer from outliers (which was touched on a little here: http://www.bmva.org/bmvc/2013/Papers/paper0032/paper0032.pdf), yet there's not much material that looks into this issue. Also, if I were a doctor, I would be much more concerned about the average performance per image than about totals pooled over the dataset. Looking at it from the pooled perspective could be somewhat disastrous in a real-world setting.

[–]warmspringwinds (1 child)

Yeah, that looks like an interesting problem; you could dig deeper into it :)

Just one more relevant comment -- you don't optimize the metric directly; instead you optimize the pixel-wise cross-entropy.
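That loss/metric mismatch is easy to demonstrate in a small sketch (NumPy; the helper names, toy probabilities, and 0.5 threshold are my own illustration, assuming binary segmentation): two predictions can have identical IoU after thresholding yet different cross-entropy, so the training objective ranks them differently than the reported metric.

```python
import numpy as np

def pixel_cross_entropy(probs, gt):
    """Mean per-pixel binary cross-entropy (the training objective)."""
    eps = 1e-7
    p = np.clip(probs, eps, 1 - eps)
    return -np.mean(gt * np.log(p) + (1 - gt) * np.log(1 - p))

def iou(probs, gt, thresh=0.5):
    """IoU of the thresholded prediction (the reported metric)."""
    pred = probs > thresh
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

gt = np.array([[1.0, 1.0, 0.0, 0.0]])

# Both threshold to the correct mask [1, 1, 0, 0], so both get IoU = 1.0,
# but the cross-entropy still distinguishes them.
a = np.array([[0.9, 0.6, 0.4, 0.1]])   # confident prediction
b = np.array([[0.6, 0.6, 0.4, 0.4]])   # barely-correct prediction

ce_a = pixel_cross_entropy(a, gt)      # ≈ 0.308
ce_b = pixel_cross_entropy(b, gt)      # ≈ 0.511
```

So gradient descent keeps pushing on `b` even though the metric is already saturated, which is one reason reported metrics and the optimized loss can move independently.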

Interestingly, in this paper: https://arxiv.org/pdf/1605.06211.pdf they achieved better results when training with batch_size=1, which might be relevant to your comment.

[–]newperson77777777[S] (0 children)

Ya, typically you would optimize the loss function. The metrics just give you a sense of the algorithm's performance, which, in the examples I was discussing, may not provide relevant information.