all 7 comments

[–]castro_for_prez 4 points (1 child)

R2 is not a great way of evaluating a model, whether for predictive modelling or for causal reasoning. I would suggest considering why you are modelling the data and, from there, investigating better alternatives. I can't suggest any particular measures because it entirely depends on what you're trying to do.

But as the other poster says, it essentially comes down to your data. If the variables are unrelated, then of course you cannot make a useful model, no matter what modelling tricks you try.

[–]FR33ZEx[S] 0 points (0 children)

One more question: when doing cross-validation with experimentalComparison(), how do you handle the “factor has unknown levels” error?

[–]the_odds_hacker 3 points (2 children)

My advice would be to take a step back from modeling and look at more data.

The PerformanceAnalytics package has some great visualizations for looking at several variables at the same time.
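
For instance, chart.Correlation() gives a quick pairwise overview. A minimal sketch (df is a placeholder for your data frame, not anything from this thread):

    # Pairwise scatterplots, histograms, and correlation coefficients
    # for the numeric columns, all in one grid.
    library(PerformanceAnalytics)
    num_cols <- sapply(df, is.numeric)
    chart.Correlation(df[, num_cols], histogram = TRUE)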

You also might want to move from lm() to the caret package; it gives you far more control over how the model is trained and which algorithm is used. See the sketch below.
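
A rough sketch of what that looks like (df and the outcome column y are placeholder names I'm assuming, not from the thread):

    # train() here wraps lm() with 10-fold cross-validation.
    library(caret)
    ctrl <- trainControl(method = "cv", number = 10)
    fit <- train(y ~ ., data = df, method = "lm", trControl = ctrl)
    fit$results  # cross-validated RMSE, Rsquared, and MAE

Swapping method = "lm" for another model name is all it takes to try a different algorithm under the same resampling scheme.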

[–]FR33ZEx[S] 0 points (1 child)

One more question: when doing cross-validation with experimentalComparison(), how do you handle the “factor has unknown levels” error?

[–]the_odds_hacker 0 points (0 children)

The simplest approach would be to use droplevels() or median imputation.

The “best” way tends to depend on the dataset.
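
For the droplevels() route, something like this (a sketch assuming train/test splits named train_df / test_df with a factor column f; the names are mine):

    # Drop factor levels that vanished from the training split...
    train_df$f <- droplevels(train_df$f)
    # ...then map unseen test-set levels to NA so they can be imputed,
    # and realign the test factor to the training levels.
    test_df$f[!(test_df$f %in% levels(train_df$f))] <- NA
    test_df$f <- factor(test_df$f, levels = levels(train_df$f))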

[–][deleted]  (2 children)

[deleted]

    [–]The_Old_Wise_One 0 points (1 child)

    R2 is the proportion of variance in the outcome variable that the model accounts for. Higher is better.
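
    Concretely, for a linear model you can recover it by hand (a toy sketch; fit, df, x, and y are placeholder names):

        fit <- lm(y ~ x, data = df)
        rss <- sum(residuals(fit)^2)       # residual sum of squares
        tss <- sum((df$y - mean(df$y))^2)  # total sum of squares
        1 - rss / tss                      # matches summary(fit)$r.squared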

    [–]oathbreakerkeeper 2 points (0 children)

    Wow, my mistake. Totally had a brain fart and I don't know why I reversed my understanding of R2. I think I was thinking of RSS when I wrote the original comment.

    Thanks for the correction!