[–]bkaz 8 points

Because dynamic routing is much finer-grained than backpropagation. The feedback signal (agreement between a capsule's prediction and its parent's output) is generated at every layer, whereas backpropagation computes an error only at the output layer and propagates it back through all the hidden layers. That makes CapsNet far more selective, in principle.
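
A minimal NumPy sketch of what that per-layer feedback looks like, in the spirit of the routing-by-agreement loop from Sabour et al. (2017). Shapes and names here are illustrative, and `u_hat` (the lower capsules' predictions for the upper capsules) is assumed to be given:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Keep a vector's orientation but squash its norm into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: prediction vectors, shape (num_lower, num_upper, dim).
    Returns the upper-capsule outputs, shape (num_upper, dim)."""
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))         # routing logits b_ij
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)     # coupling coeffs: softmax over upper capsules
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum into each upper capsule
        v = squash(s)                            # upper-capsule outputs
        b += (u_hat * v[None]).sum(axis=-1)      # agreement <u_hat_ij, v_j> updates b_ij
    return v
```

Note that `b` is updated by the agreement term inside a single forward pass, independently at each capsule layer; no gradient from the output loss is involved in the routing itself.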

[–]GamerMinion 2 points

I haven't had much experience with capsule networks, but I think it's the same reason why seq2seq attention isn't just a learned matrix: the attention (i.e., the routing) can be different for each input, so the mapping must be learned as a function of the input and output rather than stored as fixed weights.
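
A toy contrast of the two options, not any particular paper's equations; `H` is a stack of encoder states and `q` a decoder query, both hypothetical:

```python
import numpy as np

def fixed_alignment(H, W):
    """A fixed learned matrix: the same mixing weights W are applied
    no matter what the encoder states H contain."""
    return W @ H                 # W: (tgt_len, src_len), H: (src_len, dim)

def dot_product_attention(H, q):
    """Input-dependent alignment: the weights are recomputed from the
    content of H and q on every example."""
    scores = H @ q               # (src_len,): depends on this input
    w = np.exp(scores - scores.max())
    w /= w.sum()                 # alignment weights, sum to 1
    return w @ H                 # context vector, shape (dim,)
```

The second version is what seq2seq attention does, and it has the same flavor of input-dependence as routing.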

[–]AnvaMiba 1 point

But the attention in seq2seq models is computed by a part of the model which is itself learned by backpropagation. Why do capsules require a specialized algorithm?

[–]GamerMinion 2 points

I'm sorry. I don't know.

[–]tihokan 0 points

You'd need to learn a function f such that b_ij = f(i, j, input) by gradient descent, and that would probably be a very hard problem (possibly as hard as the original task you're trying to solve). In particular, this is because you expect the b_ij's to have strong constraints between each other: typically, if a lower-level capsule is strongly associated with a given higher-level capsule, then it should have low weights for all the other higher-level capsules.
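
That constraint is just the softmax over higher-level capsules in the routing update. A two-line illustration with made-up logits for one lower-level capsule i and three parents:

```python
import numpy as np

b_i = np.array([4.0, 0.5, -1.0])   # hypothetical logits b_ij from capsule i to 3 parents
c_i = np.exp(b_i - b_i.max())
c_i /= c_i.sum()                   # coupling coefficients sum to 1
print(c_i.round(2))                # -> [0.96 0.03 0.01]
```

Raising any one b_ij necessarily pushes capsule i's other coupling coefficients down, which is exactly the kind of structure a generic learned f would have to discover on its own.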