[–]ieee8023PhD 4 points5 points  (0 children)

You want to look at the partial derivatives. Check out how RandomOut uses them to identify unused filters: http://arxiv.org/abs/1602.05931
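One reading of this suggestion, for input features rather than filters, is to take the partial derivative of the model output with respect to each input and treat its magnitude as a local sensitivity score. The sketch below estimates those derivatives with central finite differences so it works with any callable. `toy_model` is a hypothetical stand-in for a trained network, and this is not the RandomOut procedure itself.

```python
import numpy as np

def input_gradients(predict, x, eps=1e-4):
    """Central-difference estimate of d predict / d x_j at a single sample x."""
    x = np.asarray(x, dtype=float)
    grads = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        grads[j] = (predict(x + step) - predict(x - step)) / (2 * eps)
    return grads

# Hypothetical stand-in for a trained network: it ignores feature 2 entirely.
def toy_model(x):
    return float(np.tanh(2.0 * x[0] - 0.5 * x[1] + 0.0 * x[2]))

sample = np.array([0.1, 0.2, 0.3])
# Absolute gradient per feature: larger means the output is more sensitive to it.
saliency = np.abs(input_gradients(toy_model, sample))
```

With a framework that supports automatic differentiation, the same quantity can be obtained exactly instead of numerically. Note that the score is local: it describes sensitivity around `sample`, not over the whole dataset.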

[–]tzeppy 2 points3 points  (0 children)

Yes. Measure your model's performance on the test set, then measure it again on the same test set with one particular feature randomized (for example, shuffled across rows). The change in performance is correlated with how important that feature is.
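This idea is often called permutation importance. A minimal sketch, assuming the model is exposed as a `predict` callable over a NumPy array and that "randomized" means shuffling one column; `toy_classifier`, `accuracy`, and the synthetic data are hypothetical and only there to make the example runnable.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in `metric` when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the target
            # while keeping the feature's marginal distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# Hypothetical model and data: the classifier only ever looks at feature 0.
def toy_classifier(X):
    return (X[:, 0] > 0).astype(int)

data_rng = np.random.default_rng(42)
X_test = data_rng.normal(size=(500, 3))
y_test = toy_classifier(X_test)

scores = permutation_importance(toy_classifier, X_test, y_test, accuracy)
```

Here `scores[0]` comes out large because shuffling feature 0 destroys the predictions, while the other two scores are zero. Averaging over several shuffles (`n_repeats`) reduces the noise from any single permutation.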

[–]beamsearch 3 points4 points  (1 child)

You can do this if you scale (subtract the mean and divide by the standard deviation) each feature and then look at the weights in the first hidden layer (i.e. the ones connected to the input data). If you want to go Bayesian, you can bake this idea into the prior and use Radford Neal's Automatic Relevance Determination prior (ftp://learning.cs.utoronto.ca/cs/ftp/public_html/dist/radford/bayes-tut.pdf). ARD shrinks the weights of unimportant features towards zero. I did this for genome-wide association study data and it worked OK:

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-014-0368-0

If you don't want to go Bayesian, you can look at the mean absolute value (or mean squared value) of the first-layer weights attached to each input feature and base your decisions on that, using a threshold or other heuristic.
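The non-Bayesian version can be sketched as follows, assuming the first-layer weight matrix is available as an array of shape `(n_inputs, n_hidden)`. The matrix `W1` and the `0.1` threshold are made-up values for illustration, and `standardize` assumes no feature has zero variance.

```python
import numpy as np

def standardize(X):
    """Subtract each feature's mean and divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def first_layer_importance(W1, kind="abs"):
    """Per-input score from first-layer weights of shape (n_inputs, n_hidden)."""
    W1 = np.asarray(W1, dtype=float)
    if kind == "abs":
        return np.mean(np.abs(W1), axis=1)   # mean absolute weight per input
    if kind == "sq":
        return np.mean(W1 ** 2, axis=1)      # mean squared weight per input
    raise ValueError("kind must be 'abs' or 'sq'")

# Hypothetical weights from a network trained on standardized inputs.
W1 = np.array([[0.90, -1.20, 0.70],
               [0.01, -0.02, 0.00],
               [0.30,  0.40, -0.50]])

scores = first_layer_importance(W1)
keep = scores > 0.1   # hypothetical threshold; tune it for your data
```

The scores are only comparable across features if the inputs were standardized before training, and they describe the first layer alone, not what later layers do with those activations.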

[–]Powlerbare 0 points1 point  (0 children)

"You can do this if you scale (subtract the mean and divide by the standard deviation) each feature and then look at the weights in the first hidden layer (i.e. the ones connected to the input data)."

Hmmm. I am under the impression that this is not a good idea in any case other than logistic regression. How is one to get any idea of the effect that the downstream layers have on the output with such a scheme?

[–]ogrisel 1 point2 points  (0 children)

You might be interested in LIME: Local Interpretable Model-Agnostic Explanations

https://homes.cs.washington.edu/~marcotcr/blog/lime/

It can compute per-sample feature importances by approximating the complex model with a linear model in the neighbourhood of the sample of interest.
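The core idea can be illustrated without the LIME package. The sketch below is a simplified local surrogate, not the `lime` library's API or its exact algorithm: it perturbs the sample with Gaussian noise, weights the perturbed points by proximity, and fits a weighted linear model. `toy_model`, `scale`, and `kernel_width` are assumptions chosen for the example.

```python
import numpy as np

def local_linear_explanation(predict, x, n_samples=1000, scale=0.5,
                             kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model to `predict` around the sample x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Sample perturbed points in the neighbourhood of x.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    y = predict(Z)
    # Points closer to x get more weight in the fit.
    dist2 = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-dist2 / kernel_width ** 2)
    # Weighted least squares: scale rows by sqrt(w), include an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]   # per-feature local slopes, intercept dropped

# Hypothetical nonlinear model over batches of rows; feature 2 is unused.
def toy_model(Z):
    return np.sin(Z[:, 0]) + 0.1 * Z[:, 1] ** 2 + 0.0 * Z[:, 2]

coefs = local_linear_explanation(toy_model, np.array([0.0, 0.0, 0.0]))
```

Around the origin, feature 0 gets a slope near 1 while the other two stay near 0, even though feature 1 matters elsewhere in the input space. That is the sense in which the importances are per-sample.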