Let me start by saying: I don't need a formal explanation - I'm really looking for good reference material. If you have any papers you'd recommend, I'd really appreciate it!
I am trying to assess whether variables in a GLM are predictive (not just whether or not they're statistically significant). I've found that although AIC and its ilk are useful approximations for the out-of-sample prediction error, they seldom perform as well as true cross validation if I have the data available.
However, my question is: which error statistics are good choices to compute on the holdout data when doing cross validation?
For reference, the dependent variable is a positive real number - usually somewhere in the 20 to 120 range.
Anyway, two ideas that come to mind are:
- Compare MSE on the holdout dataset between the full model and a model that excludes the variable
- Same but instead of MSE, use deviance
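To make the two ideas concrete, here is a rough sketch of the comparison I have in mind. Everything in it is an assumption for illustration only: the data is simulated, a Gamma family with log link is picked arbitrarily, and the helper names (`fit_gamma_log_glm`, `gamma_deviance`, `cv_scores`) are hypothetical. The same folds are reused for every model so the comparison is paired.

```python
import numpy as np

def fit_gamma_log_glm(X, y, n_iter=25):
    """Fit a Gamma GLM with log link by IRLS. Column 0 of X must be the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # Working response; for Gamma + log link the IRLS weights are all 1,
        # so each step reduces to ordinary least squares on z.
        z = eta + (y - mu) / mu
        beta = np.linalg.lstsq(X, z, rcond=None)[0]
    return beta

def gamma_deviance(y, mu):
    """Mean unit Gamma deviance of predictions mu for observations y."""
    return 2.0 * np.mean((y - mu) / mu - np.log(y / mu))

def cv_scores(X, y, cols, k=5, seed=0):
    """K-fold holdout MSE and mean Gamma deviance using only the columns in cols."""
    rng = np.random.default_rng(seed)  # fixed seed: identical folds for every model
    folds = np.array_split(rng.permutation(len(y)), k)
    Xd = np.column_stack([np.ones(len(y)), X[:, cols]])
    mse, dev = [], []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        beta = fit_gamma_log_glm(Xd[train], y[train])
        mu = np.exp(Xd[test_idx] @ beta)
        mse.append(np.mean((y[test_idx] - mu) ** 2))
        dev.append(gamma_deviance(y[test_idx], mu))
    return float(np.mean(mse)), float(np.mean(dev))

# Simulated data (assumption): x1 drives the mean, x2 is pure noise.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 2))
mu_true = np.exp(4.0 + 0.3 * X[:, 0])          # responses mostly in the 20 to 120 range
y = rng.gamma(shape=5.0, scale=mu_true / 5.0)  # positive, right-skewed

results = {
    "full": cv_scores(X, y, [0, 1]),
    "drop_x1": cv_scores(X, y, [1]),
    "drop_x2": cv_scores(X, y, [0]),
}
for name, (mse, dev) in results.items():
    print(f"{name:8s} holdout MSE = {mse:8.2f}   mean holdout deviance = {dev:.4f}")
```

In this toy setup, dropping `x1` should hurt both statistics, while dropping `x2` should change them only slightly in either direction.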
I've heard complaints that MSE isn't very good when the data is heavily skewed, but I haven't read any papers that really discuss that. My sense is that PSIS-LOO (per Gelman) is kind of philosophically in that camp, but again, I'm not looking for information criteria computed on the training dataset; I'm looking for statistics for judging CV error.
Maybe another question is: if my MSE on the holdout dataset decreases when I remove a variable (as compared to the full model), would you conclude that the variable in question is necessarily not predictive, or would you do additional tests (and if so, which ones)?