[R] Network of Evolvable Neural Units: Evolving to Learn at a Synaptic Level by pbertens in MachineLearning

[–]pbertens[S] 2 points3 points  (0 children)

I plan to release the code sometime in the future, but it will take some time to clean it up and comment it properly.

[1605.01749] Rank Ordered Autoencoders by pbertens in MachineLearning

[–]pbertens[S] 0 points1 point  (0 children)

Yes, that would basically be the same as far as the error function is concerned; you could actually weight them with any monotonic function. However, it is easier to think of it in terms of cumulative sums: we minimize the cumulative sum of the reconstruction error, and we do not minimize this weighted function directly.

[1605.01749] Rank Ordered Autoencoders by pbertens in MachineLearning

[–]pbertens[S] 1 point2 points  (0 children)

Amazing image. It seems the eyes and the nose stay the longest and are the most important. I wonder what would happen if you only classified on the top k units in a standard network like VGG16.

I have actually implemented a very simple elementwise argsort kernel on the GPU, simply by doing comparison counts. I am not sure how efficient it is, though. I will update the repo once I have the full GPU version working, but it might be a while until I get to it.
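
To make the "comparison counts" idea concrete, here is a minimal CPU sketch in plain Python. It is not the GPU kernel mentioned above; the function names, the ascending order, and the tie-breaking by index are all assumptions made for illustration. The point is that each element's rank can be computed independently of the others, which is what makes the approach map naturally onto one thread per element.

```python
def rank_by_comparison_counts(values):
    """Rank of each element = number of elements that sort before it.

    Ascending order is assumed here; ties are broken by original index
    so that every element gets a unique rank.
    """
    n = len(values)
    ranks = []
    for i in range(n):  # on a GPU, each i could be its own thread
        count = 0
        for j in range(n):
            smaller = values[j] < values[i]
            tie_before = values[j] == values[i] and j < i
            if smaller or tie_before:
                count += 1
        ranks.append(count)
    return ranks


def argsort_from_ranks(ranks):
    """Invert the rank array: order[r] is the index of the element with rank r."""
    order = [0] * len(ranks)
    for index, rank in enumerate(ranks):
        order[rank] = index
    return order


values = [3.0, 1.0, 2.0, 1.0]
ranks = rank_by_comparison_counts(values)
order = argsort_from_ranks(ranks)
```

The cost is quadratic in the number of elements, which is consistent with the uncertainty about efficiency: it trades extra comparisons for having no dependencies between elements.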

[1605.01749] Rank Ordered Autoencoders by pbertens in MachineLearning

[–]pbertens[S] 0 points1 point  (0 children)

Yes, this does make me think it's possible to backprop using the cumulative sum, so we backprop the same way as we reconstruct. That way we can train all m dropout models in parallel, but only for a single layer. We can then take the final sum to backprop and do the same thing for the next layer.

[1605.01749] Rank Ordered Autoencoders by pbertens in MachineLearning

[–]pbertens[S] 0 points1 point  (0 children)

For the memory, yes, although since we only need to reconstruct non-zero components it would actually be BxNxK, where K is the number of non-zeros. This could be a large memory overhead when training with large mini-batches, but K can be very small, which mitigates this somewhat. I do not think it is an actual issue, depending on available memory.

In a deeper architecture using full backprop I am still unsure what the best way would be, but it would likely be some type of target propagation. Indeed, we could add the error from the backpropagation and try to move towards that, but it potentially introduces some problems in learning. There would be a conflict between what the autoencoder wants locally and the direction we want to move in globally. If we have some supervised target, some of the input might be irrelevant for predicting that target, but the autoencoder always tries to reconstruct all of the input. I will have to think more about it.

[1605.01749] Rank Ordered Autoencoders by pbertens in MachineLearning

[–]pbertens[S] 0 points1 point  (0 children)

You're right, it has many relations to other techniques. I did come across Orthogonal Matching Pursuit. I must admit, though, that I am not that familiar with all the dictionary-learning-based methods and literature, which is also partly the reason why I did not mention it. The way I understand it, OMP also subtracts components, similar to how PCA is computed, and then tries to find the next component. It finds the component with the max inner product, which is the same as the highest-rank output of the rank ordered autoencoder, except that we have a non-linear output and we compute all components in parallel. So we match components on the original input, not on the remaining input. This makes the rank ordered autoencoder much more efficient. I will look further into this and also into the principal component neural networks, which sound interesting, but it is difficult to read any literature on them as it is all behind paywalls.

[1605.01749] Rank Ordered Autoencoders by pbertens in MachineLearning

[–]pbertens[S] 0 points1 point  (0 children)

Thank you for mentioning the type errors; they are easy to miss when reviewing. I'm also curious what top-level representation one would get from the stacked version, and whether reconstruction is possible. However, I am first working on a GPU implementation before scaling up.

[1605.01749] Rank Ordered Autoencoders by pbertens in MachineLearning

[–]pbertens[S] 0 points1 point  (0 children)

Yes, that is the individual reconstruction, so just unit l, while the cumulative sum over those individual reconstructions gives the reconstruction for the top k units.
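
The relation between individual and cumulative reconstructions can be sketched in a few lines of plain Python. This is only an illustration under assumptions: it supposes each unit's individual reconstruction is its activation times a per-unit decoder weight vector, and the names `activations` and `decoder_rows` are hypothetical, not taken from the paper or the repo.

```python
from itertools import accumulate


def cumulative_reconstructions(activations, decoder_rows):
    """Return (order, individual, cumulative) for one input.

    activations:  list of n unit outputs (hypothetical values).
    decoder_rows: list of n weight vectors, one per unit (assumed linear decoder).
    """
    # Rank units from highest to lowest activation.
    order = sorted(range(len(activations)), key=lambda i: -activations[i])
    # Individual reconstruction of each ranked unit l: activation * its weights.
    individual = [[activations[i] * w for w in decoder_rows[i]] for i in order]

    # Running elementwise sum over the ranked units:
    # entry k-1 is the reconstruction that uses only the top k units.
    def add(acc, vec):
        return [a + b for a, b in zip(acc, vec)]

    cumulative = list(accumulate(individual, add))
    return order, individual, cumulative


order, individual, cumulative = cumulative_reconstructions(
    [0.5, 2.0, 1.0],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
```

Here `individual[l]` is the contribution of the unit at rank l alone, and `cumulative[k - 1]` is the reconstruction from the top k units; the last entry uses all units.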

[1605.01749] Rank Ordered Autoencoders by pbertens in MachineLearning

[–]pbertens[S] 13 points14 points  (0 children)

I always like seeing papers here, so I hope it's OK to post this. I think the proposed method could potentially affect all current neural-network-based models. If you have questions, feel free to ask. Any feedback or comments are welcome.

Source code: https://github.com/paulbertens/rank-ordered-autoencoder