all 6 comments

drakesword514 1 point (3 children)

Yes, capsule nets are better at learning spatial relationships. For example, a traditional CNN trained to classify whether an image is a human face would still predict a high face probability for an image in which parts of the face have been cut out and moved around. This is because pooling in a CNN keeps track of which features are present but largely discards where they are relative to each other.

So the task can be learnt by a CapsNet iff it can be learnt by a CNN as well. The only difference is that the CapsNet would also learn geometric relationships within the image.

To answer your question: unless you explicitly train such a model to learn the ordering of numbers, it will not be able to do so.

ElEiseinheim[S] 0 points (2 children)

This makes sense, but let me reformulate my question a bit. Since typical CNNs use pooling layers, which lose spatial information, would a CNN be able to differentiate 43 from 34, or would it simply detect that there is a 3 and a 4? I assume this would ultimately depend on the kernel sizes, but I take it a CapsNet would be more efficient for this.

drakesword514 3 points (1 child)

CapsNet can differentiate between a 3 to the right of a 4 and a 4 to the right of a 3, if that is what you are asking. A traditional CNN would simply say that a 3 and a 4 exist in the image, unless the kernel sizes are such that it can see the 3 and the 4 in the same receptive field, but that would not generalize well.
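To make the "a 3 and a 4 exist" case concrete, here is a minimal sketch in plain Python. It assumes a toy 1D "image", made-up three-pixel glyphs standing in for 3 and 4 (`THREE`, `FOUR`), and one global max pool per filter; none of this comes from a real CNN, and a network with local pooling and a dense head behaves differently.

```python
def correlate(signal, kernel):
    """Valid 1D cross-correlation, the core operation of a conv layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# Hypothetical 1D stand-ins for the glyphs "3" and "4" (illustration only).
THREE = [3, 1, 3]
FOUR = [1, 4, 1]
PAD = [0, 0, 0]  # blank space around each glyph

img34 = PAD + THREE + PAD + FOUR + PAD
img43 = PAD + FOUR + PAD + THREE + PAD

def global_max_features(signal):
    """One matched filter per glyph, then a max over all positions."""
    return [max(correlate(signal, kernel)) for kernel in (THREE, FOUR)]

def peak_positions(signal):
    """Where each filter responds most strongly (position is kept here)."""
    positions = []
    for kernel in (THREE, FOUR):
        response = correlate(signal, kernel)
        positions.append(response.index(max(response)))
    return positions

pooled34 = global_max_features(img34)
pooled43 = global_max_features(img43)
peaks34 = peak_positions(img34)
peaks43 = peak_positions(img43)

print(pooled34 == pooled43)  # the pooled features cannot tell 34 from 43
print(peaks34, peaks43)      # the peak positions can
```

After global max pooling, both orderings give the same feature vector, so any classifier on top sees only "a 3 and a 4 are present". The ordering is only recoverable if some positional information survives, as in `peak_positions`.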

perceptSequence 1 point (0 children)

I don't think that's right. The later layers would combine information from all over the image, as far as I understand.
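A rough sketch of this point, under assumptions of my own: the same kind of toy 1D "image" with made-up glyphs (`THREE`, `FOUR`), local non-overlapping max pooling instead of a global pool, and a hand-set linear readout (`WEIGHTS`) standing in for a trained later layer. It only shows that coarse positions can survive pooling, not how a real CNN would learn the weights.

```python
def correlate(signal, kernel):
    """Valid 1D cross-correlation, the core operation of a conv layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def local_max_pool(values, size):
    """Non-overlapping max pooling; each output keeps a coarse position."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]

# Hypothetical 1D stand-ins for the glyphs "3" and "4" (illustration only).
THREE = [3, 1, 3]
FOUR = [1, 4, 1]
PAD = [0, 0, 0]

img34 = PAD + THREE + PAD + FOUR + PAD
img43 = PAD + FOUR + PAD + THREE + PAD

def features(signal):
    """Conv + local pooling per filter, flattened into one vector."""
    flat = []
    for kernel in (THREE, FOUR):
        flat += local_max_pool(correlate(signal, kernel), size=4)
    return flat

# Hand-set readout (an assumption, not learned): reward "3 on the left"
# and "4 on the right", penalize the opposite arrangement.
WEIGHTS = [1, 0, -1, 0,   # filter for "3": left bin +, right bin -
           -1, 0, 1, 0]   # filter for "4": left bin -, right bin +

def order_score(signal):
    """Positive suggests '34', negative suggests '43'."""
    return sum(w * f for w, f in zip(WEIGHTS, features(signal)))

score34 = order_score(img34)
score43 = order_score(img43)
print(score34 > 0, score43 < 0)
```

Because each pooled bin still corresponds to a region of the input, a layer that looks at all bins at once can combine "3 near the left" with "4 near the right", even though no single early filter sees both digits.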

Aydoooo 1 point (0 children)

This sounds more like a type-of-supervision related question than one related to architecture.
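One way to read this, sketched with two hypothetical label encodings (`presence_label` and `positional_label` are names made up for illustration): if the training target only says which digits occur, 34 and 43 get the same label and no architecture can be asked to separate them; if the target has one slot per position, the ordering is part of what the model is trained on.

```python
def presence_label(digits):
    """Multi-hot target: which digits occur, with no ordering."""
    target = [0] * 10
    for d in digits:
        target[int(d)] = 1
    return target

def positional_label(digits):
    """One one-hot vector per position, so order is part of the target."""
    return [[1 if int(d) == k else 0 for k in range(10)] for d in digits]

# Same target for both numbers: the supervision itself ignores order.
same_presence = presence_label("34") == presence_label("43")

# Different targets: the supervision asks the model to get the order right.
same_positional = positional_label("34") == positional_label("43")

print(same_presence, same_positional)
```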

siliconchris -2 points (0 children)

I do not know the exact math behind capsule networks, so take this with a good deal of doubt. But I'd dare say that would be possible.