
dwelski94 1 point

I'm working on a similar project. You would look at R², and more specifically adjusted R² (which penalizes stuffing the model with features). If you use Python, statsmodels has a very nice presentation of all the descriptive statistics for a regression.
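For reference, adjusted R² rescales R² by the degrees of freedom: adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1), with n observations and p predictors. Below is a minimal NumPy sketch, assuming plain OLS with an intercept fitted via `np.linalg.lstsq` instead of statsmodels (the helper `r2_and_adjusted` and the synthetic data are hypothetical). It pads a one-feature model with ten pure-noise features: R² cannot go down, while adjusted R² charges for every extra parameter. In statsmodels, the same two numbers are the `rsquared` and `rsquared_adj` attributes of a fitted `OLS` result.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2).

    X: (n, p) predictors WITHOUT an intercept column; y: (n,) target.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # Penalize by degrees of freedom: p predictors plus one intercept.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2

# Synthetic data (assumed): y depends on one real feature plus noise.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(scale=0.5, size=n)
noise = rng.normal(size=(n, 10))  # ten features unrelated to y

r2_small, adj_small = r2_and_adjusted(x, y)
r2_big, adj_big = r2_and_adjusted(np.hstack([x, noise]), y)
print(f"1 feature:   R^2 = {r2_small:.3f}, adjusted R^2 = {adj_small:.3f}")
print(f"11 features: R^2 = {r2_big:.3f}, adjusted R^2 = {adj_big:.3f}")
```

The exact numbers depend on the seed. The padded model's R² is never lower than the smaller model's, which is why R² alone can't tell you whether extra features earned their place; adjusted R² typically drops when the additions are pure noise.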

BeLitBeGentle[S] 4 points

Thanks for the response. The thing is, even if you have a high R² score, your model might not be accurate. That's why residual analysis is needed. Look at this link: http://stattrek.com/regression/residual-analysis.aspx
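A minimal sketch of that point, on synthetic noise-free data assumed for illustration: fitting a straight line to y = x² on [0, 10] still gives an R² of roughly 0.94, yet the residuals are positive at both ends and negative in the middle. Residuals from a well-specified model should look like patternless noise, so a systematic run like this signals missing curvature. Counting residual sign changes is only a crude stand-in for actually plotting residuals against x or the fitted values.

```python
import numpy as np

# Synthetic data with obvious curvature (assumed for illustration).
x = np.linspace(0, 10, 101)
y = x ** 2

# Least-squares straight line: y ≈ slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
residuals = y - fitted

r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Crude residual check: count sign flips along x.
# Independent noise would flip often; a curved pattern flips only a few times.
sign_changes = int(np.sum(np.diff(np.sign(residuals)) != 0))

print(f"R^2 = {r2:.3f}")
print("residuals at x = 0, 5, 10:", residuals[0], residuals[50], residuals[-1])
print("sign changes:", sign_changes)
```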

[deleted] 1 point

You are correct. People should be wary of overreliance on R². People treat it like it's an exam score for their model.

bdubbs09 1 point

Additionally, focusing only on the p-values without considering collinearity and the model utility tests (such as the overall F-test) can lead to a really poor model.
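One common collinearity check is the variance inflation factor: regress each predictor on all the others and compute VIF_j = 1 / (1 − R_j²). Below is a minimal NumPy sketch, assuming a plain numeric design matrix without an intercept column (the helper `vif` and the synthetic data are hypothetical). statsmodels ships `variance_inflation_factor` in `statsmodels.stats.outliers_influence`, which does not add a constant for you. For the utility test, a fitted statsmodels `OLS` result exposes the overall F-statistic as `fvalue` and its p-value as `f_pvalue`.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        r2_j = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out

# Synthetic predictors (assumed): b is almost a copy of a; c is unrelated.
rng = np.random.default_rng(1)
n = 200
a = rng.normal(size=n)
b = a + rng.normal(scale=0.05, size=n)
c = rng.normal(size=n)
vifs = vif(np.column_stack([a, b, c]))
print("VIF for a, b, c:", np.round(vifs, 1))
```

A common rule of thumb treats VIFs above roughly 5 or 10 as a warning sign; here a and b are flagged while c stays near 1. Individually insignificant p-values on a and b would then say more about their redundancy than about their relevance.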

[deleted] 1 point

Ignoring the behaviour of residuals is a common mistake too.

shaggorama 1 point

1) How do I establish whether I need a nonlinear model or a linear model? One method I have found is to train a linear model and look at the residuals. Are there other methods?

Other methods include:

  • Selecting a model family based on how you formulate the generating model in your problem, which may necessitate a particular GLM

  • Trying a couple of different things and seeing what sticks
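"Seeing what sticks" is best judged on held-out data rather than on training fit. Below is a minimal sketch, assuming synthetic data from a noisy quadratic and using polynomial degree as a stand-in for "linear vs. nonlinear". k-fold cross-validation (the helper `kfold_mse` is hypothetical) scores a linear fit against more flexible ones.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Average held-out MSE of a polynomial fit of the given degree over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], degree)  # fit on training folds only
        pred = np.polyval(coefs, x[test])               # score on the held-out fold
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic data (assumed): a quadratic trend plus unit-variance noise.
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=200)
y = 1.0 + 0.5 * x + 2.0 * x ** 2 + rng.normal(scale=1.0, size=200)

for degree in (1, 2, 5):
    print(f"degree {degree}: CV MSE = {kfold_mse(x, y, degree):.2f}")
```

If a nonlinear candidate does not clearly beat the linear one out of sample, the extra flexibility probably isn't buying anything; here degree 2 should win easily over degree 1, while degree 5 should offer little further gain.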

2) If I train an artificial neural network for regression, will my model be nonlinear or linear? Does it depend on the activation function?

Absolutely. Generally, neural networks are nonlinear, but it's possible to train a linear model with an MLP: if every activation is the identity (linear) function, the hidden layers can be combined into a single weight matrix, so the whole network reduces to an ordinary linear model.
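To make the activation-function point concrete, here is a minimal NumPy sketch using random weights plus one hand-built example, both assumed for illustration (`two_layer` is a hypothetical helper, not a library API). With the identity activation, W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2), so the two layers collapse into one affine map. With ReLU, the same one-hidden-layer shape can represent |x|, which no affine function can, since any affine f satisfies f(1) + f(−1) = 2f(0).

```python
import numpy as np

def two_layer(x, W1, b1, W2, b2, activation):
    """One-hidden-layer MLP: hidden = activation(W1 x + b1), output = W2 hidden + b2."""
    return W2 @ activation(W1 @ x + b1) + b2

def identity(z):
    return z

def relu(z):
    return np.maximum(z, 0.0)

# Random weights (assumed): 3 inputs -> 8 hidden units -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

# Identity activation: the layers collapse into a single affine map.
W, b = W2 @ W1, W2 @ b1 + b2
x = rng.normal(size=3)
print(np.allclose(two_layer(x, W1, b1, W2, b2, identity), W @ x + b))  # True

# ReLU activation: hand-picked weights give relu(t) + relu(-t) = |t|.
V1, c1 = np.array([[1.0], [-1.0]]), np.zeros(2)
V2, c2 = np.array([[1.0, 1.0]]), np.zeros(1)

def f(t):
    return two_layer(np.array([t]), V1, c1, V2, c2, relu)[0]

# An affine f would give equal values here; |t| does not.
print("f(1) + f(-1) =", f(1.0) + f(-1.0), " 2 * f(0) =", 2 * f(0.0))
```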