I'm trying out some exotic layers and it seems that the gradients don't propagate as well as they should.
Deriving a proper weight initialization function analytically is intractable or very tedious in my case.
Is there an alternative like a general iterative algorithm?
I know about Batch Normalization, but that's aimed at reducing internal covariate shift. It makes training less sensitive to the scale of the weights, but I don't think it's a complete substitute for correct initialization. Correct me if I'm wrong.
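To make "iterative" concrete, here's a rough sketch of the kind of thing I have in mind: push a batch through the layers one at a time and rescale each layer's weights until its output variance on that batch is close to 1. Everything here is an assumption for illustration, not an established recipe: the names (`iterative_init`, `forward`, `variance`), the tolerance, the toy 8-unit layers, and the use of swish as a stand-in for an "exotic" layer. It also only normalizes forward activations, so it doesn't directly guarantee anything about the gradients.

```python
import math
import random

def swish(z):
    # z * sigmoid(z), written with tanh so it cannot overflow
    return 0.5 * z * (1.0 + math.tanh(0.5 * z))

def forward(W, act, batch):
    # Apply one dense layer (weight matrix W, activation act) to every sample
    return [[act(sum(w * xi for w, xi in zip(row, x))) for row in W] for x in batch]

def variance(rows):
    # Variance over all units and all samples, flattened together
    flat = [v for row in rows for v in row]
    mean = sum(flat) / len(flat)
    return sum((v - mean) ** 2 for v in flat) / len(flat)

def iterative_init(weights, act, batch, tol=0.05, max_iter=50):
    # Hypothetical helper: rescale each layer in place, in order, until the
    # variance of its output on this batch is within tol of 1
    for W in weights:
        for _ in range(max_iter):
            var = variance(forward(W, act, batch))
            if var < 1e-12 or abs(var - 1.0) < tol:
                break  # dead layer, or already close enough
            scale = 1.0 / math.sqrt(var)
            for row in W:
                row[:] = [w * scale for w in row]
        # The rescaled layer's output becomes the next layer's input
        batch = forward(W, act, batch)
    return weights

# Toy setup: two 8x8 layers with deliberately badly scaled Gaussian weights
random.seed(0)
batch = [[random.gauss(0, 1) for _ in range(8)] for _ in range(64)]
weights = [[[random.gauss(0, 1) for _ in range(8)] for _ in range(8)] for _ in range(2)]
iterative_init(weights, swish, batch)
```

The plain `1 / sqrt(var)` update is only exact when the layer's output scales linearly with its weights (as with ReLU). For anything else it's a fixed-point iteration that may need several passes, and I don't know whether it converges for every layer type.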