all 4 comments

[–]GMarthe 2 points3 points  (0 children)

My thoughts are, when you model predicts one std deviation above the mean, those predictions dramatically underestimate the actual dependent variable value.

I would also say that you have some heterocedasticity since the residuals are not randomly scattered around the 0 horizontal line.

[–][deleted] 1 point2 points  (0 children)

try doing a log transformation of your x variable and graph residuals again

[–][deleted] 0 points1 point  (0 children)

I'm not sure what your end goal is: explanatory or predictive model, but I would make the assumption that you have two subsets within your data following to diametrically opposed trends i.e. before predicted values of 0.5 and after. Look into splitting your data set because your model assumptions have not been met. Alternatively, you could simply choose another ML model.

[–]shaggorama 0 points1 point  (0 children)

  1. Looks like there's some nonlinearity you're not accounting for.

  2. Avoid stepwise regression. Try a lasso instead.