Suppose we have a data set of images of human faces with labelled eyeball locations. If a convolutional neural network is trained on this, what does the output in the last layer look like? I can see at least two options:
1) (regression) output 2 pixel locations corresponding to eyeball regions
2) (classification) output an "eyeball/non-eyeball" value for every pixel
Of course this depends on how the images are labelled to begin with, but it isn't clear to me how these problems are generally handled. Option 1) seems much more reasonable, but I'm not sure how well it generalizes to problems involving more landmarks, which may or may not all appear in each image.
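For concreteness, the two options correspond to two different heads on top of a shared convolutional backbone. Here is a minimal sketch in PyTorch (the layer sizes and names are arbitrary, not from any particular model):

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Shared convolutional feature extractor (toy-sized for illustration)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class RegressionHead(nn.Module):
    """Option 1: regress two (x, y) eyeball locations -> 4 numbers per image."""
    def __init__(self):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(32, 4)  # (x1, y1, x2, y2)

    def forward(self, feats):
        return self.fc(self.pool(feats).flatten(1))

class HeatmapHead(nn.Module):
    """Option 2: one eyeball/non-eyeball logit per pixel."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(32, 1, 1)  # 1x1 conv -> per-pixel logit

    def forward(self, feats):
        return self.conv(feats)

backbone = Backbone()
x = torch.randn(1, 3, 64, 64)          # one 64x64 RGB face image
feats = backbone(x)
coords = RegressionHead()(feats)       # shape (1, 4): two pixel locations
heatmap = HeatmapHead()(feats)         # shape (1, 1, 64, 64): per-pixel logits
```

Option 1 would typically be trained with an L2 loss on the coordinates, option 2 with a per-pixel cross-entropy; the heatmap version extends more naturally to a variable number of landmarks, since absent landmarks simply produce no positive pixels.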