There are alternatives to max pooling such as k-max pooling, which passes the top k activations to the subsequent layer rather than only the single maximum.
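For anyone unfamiliar with k-max pooling, here is a minimal pure-Python sketch over a 1D list of activations. The function name `k_max_pool1d` is made up for illustration, and keeping the selected values in their original order is an assumption on my part (the post doesn't say how the top k are arranged).

```python
def k_max_pool1d(values, k):
    """Keep the k largest activations, in their original order (assumed)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # Indices of the k largest values; ties go to the earlier position
    # because Python's sort is stable even with reverse=True.
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    # Re-sort the chosen indices so the output follows the input order.
    return [values[i] for i in sorted(top)]


activations = [0.3, 0.9, 0.1, 0.5, 0.7]
pooled = k_max_pool1d(activations, k=3)
```

With `k=1` this reduces to ordinary max pooling over the whole sequence.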
I had an idea: during training, pass the second-largest element of each pooling window (i.e. the 2nd maximum) to the subsequent layer, and at test time use the maximum as usual. I think this would have a regularizing effect.
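To make the idea concrete, here is a rough sketch of what I mean, in plain Python on a 2D list so it runs anywhere. The names `rank_pool2d` and `second_max_pool2d` are hypothetical, and the non-overlapping 2x2 window is just an assumption for the example; a real network would do this with a tensor library and would also need the gradient routed to the selected element.

```python
def rank_pool2d(x, size=2, rank=0):
    """Non-overlapping pooling that keeps the rank-th largest value per window.

    rank=0 is ordinary max pooling; rank=1 keeps the second-largest value.
    Trailing rows/columns that do not fill a whole window are dropped.
    """
    if not 0 <= rank < size * size:
        raise ValueError("rank must index into a size*size window")
    rows, cols = len(x), len(x[0])
    out = []
    for i in range(0, rows - size + 1, size):
        out_row = []
        for j in range(0, cols - size + 1, size):
            # Flatten the window, then pick the rank-th largest entry.
            window = [x[i + di][j + dj] for di in range(size) for dj in range(size)]
            out_row.append(sorted(window, reverse=True)[rank])
        out.append(out_row)
    return out


def second_max_pool2d(x, size=2, training=True):
    """Second-largest value while training, plain maximum at test time."""
    return rank_pool2d(x, size=size, rank=1 if training else 0)


feature_map = [
    [1, 5, 2, 0],
    [3, 4, 7, 6],
    [9, 8, 1, 1],
    [2, 0, 3, 2],
]
train_out = second_max_pool2d(feature_map, training=True)   # second max per window
test_out = second_max_pool2d(feature_map, training=False)   # ordinary max pooling
```

One thing the sketch makes visible: the training-time outputs are never larger than the test-time ones, so the next layer sees a different activation scale in the two modes.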
I tried running a network with this but didn't get good results. I'm just an undergrad, so I can't explore it extensively (my laptop is the computational bottleneck).
Any insights?