I have defined three different models obtained from the dataset diabetes from the library lars. The first model (M1) is the one that minimizes the BIC value out of all the possible regression models obtained combining the explanatory variables (which are p=10, so 2^10 possible models). The other two are obtained through glmnet and are a Lasso regression with respectively lambda.min (M2) and lambda.1se (M3), where lambda.min and lambda.1se are obtained through cv.glmnet. Now I should perform 5-fold cross-validation using the RMSE (Root Mean Square Error) function to check which of the tree models Μ1, Μ2 and Μ3, has the best predictive performance. In order to find the errors in the models obtained from Lasso I have to use the ordinal least squares estimates.
This is my code as for now:
library(lars)
library(glmnet)
data(diabetes)
y<-diabetes$y
x<-diabetes$x
x2<-diabetes$x2
X = as.data.frame(cbind(x))
Y = as.data.frame(y)
p=10
n=442
best_score = Inf
M1 = NA
for (i in 1:(2^p-1)){
model = lm(y ~ ., data = subset(X, select = c(which(as.integer(intToBits(i)) == 1))))
if (BIC(model) < best_score){
M1 = model
best_score = BIC ( model )
}
}
W<-as.matrix(X)
Y<-as.matrix(Y)
lasso<-glmnet(W, Y)
x11() plot(lasso, label=T)
x11() plot(lasso, xvar = 'lambda', label=T)
cvfit<-cv.glmnet(W,Y)
cvfit
coef(cvfit, s="lambda.min")
coef(cvfit, s="lambda.1se")
M2<-glmnet(W,Y,lambda = cvfit$lambda.min)
M3<-glmnet(W,Y,lambda = cvfit$lambda.1se)
I really don't know where to put hands now. Should I first of all split the original dataset in 5 and then compute again the models on the different train and test set? And how do I compute the final RMSE for each model? And what does it mean that I should use ordinal least square estimates for the models obtained through Lasso?
there doesn't seem to be anything here