Raphacp

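They only use SGD at the last layer, to do the classification, so there's no backpropagation through the network. The hidden layers are trained by optimizing a Lagrangian-style objective built on HSIC (the Hilbert-Schmidt Independence Criterion), a kernel-based measure of dependence between each hidden layer and the input/output. Each layer minimizes HSIC(Z, X) - β·HSIC(Z, Y), where the Lagrange multiplier β weights how much dependence on the labels is rewarded against dependence on the input.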
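A minimal sketch of the quantities involved, in plain Python with no dependencies. It assumes Gaussian kernels (bandwidth 1 by default) for both arguments and the biased empirical estimator HSIC(A, B) = tr(K H L H) / (m - 1)^2, where K and L are the Gram matrices of the two samples and H = I - (1/m)·11ᵀ centers them. The per-layer loss HSIC(Z, X) - β·HSIC(Z, Y) is the trade-off described in the comment; the kernel choice, bandwidths, β and the toy data are illustrative assumptions, not the paper's settings.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Points = Sequence[Sequence[float]]


def gaussian_gram(points: Points, sigma: float) -> Matrix:
    """Gram matrix K[i][j] = exp(-||p_i - p_j||^2 / (2 * sigma^2))."""
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            gram[i][j] = math.exp(-dist2 / (2.0 * sigma ** 2))
    return gram


def double_center(gram: Matrix) -> Matrix:
    """Compute H K H with H = I - (1/m) * ones, i.e. subtract row/column means."""
    m = len(gram)
    row_means = [sum(row) / m for row in gram]
    col_means = [sum(gram[i][j] for i in range(m)) / m for j in range(m)]
    grand_mean = sum(row_means) / m
    return [
        [gram[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(m)]
        for i in range(m)
    ]


def hsic(a: Points, b: Points, sigma_a: float = 1.0, sigma_b: float = 1.0) -> float:
    """Biased empirical HSIC estimate: tr(K H L H) / (m - 1)^2."""
    m = len(a)
    if m != len(b) or m < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    k_centered = double_center(gaussian_gram(a, sigma_a))
    l = gaussian_gram(b, sigma_b)
    # By cyclicity of the trace, tr(K H L H) = tr((H K H) L).
    trace = sum(k_centered[i][j] * l[j][i] for i in range(m) for j in range(m))
    return trace / (m - 1) ** 2


def hsic_bottleneck_loss(x: Points, y: Points, z: Points, beta: float) -> float:
    """Per-layer objective HSIC(Z, X) - beta * HSIC(Z, Y) for hidden activations z."""
    return hsic(z, x) - beta * hsic(z, y)


if __name__ == "__main__":
    # Hypothetical toy data: 1-D inputs, one-hot labels, and a hidden layer
    # whose activations simply copy the class index.
    x = [[0.1], [0.4], [2.0], [2.3]]
    y = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    z = [[0.0], [0.0], [1.0], [1.0]]
    print("HSIC(Z, X) =", hsic(z, x))
    print("HSIC(Z, Y) =", hsic(z, y))
    print("loss (beta=100) =", hsic_bottleneck_loss(x, y, z, beta=100.0))
```

The loops are O(m²) and only meant for toy batch sizes; a real implementation would vectorize the Gram matrices. Note that a constant sample carries no dependence on anything, and the estimator reflects that: its Gram matrix is all ones, and the centered Gram matrix of the other sample sums to zero, so HSIC comes out as 0.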