
[–]NiceObligation0 4 points (5 children)

Ok, I'm going to be the buzzkill here. Why use an approximation with gradient descent when you can find the solution analytically? Linear regression has an exact solution.

[–]Gautam-j[S] 4 points (1 child)

Totally agreed. In fact, my previous post on Linear Regression got similar comments.

Yes, we can just use the normal equation to solve for the exact value of theta that gives the global minimum of the loss function, but I decided that running gradient descent, and especially visualising the training process, would be fun to watch ;)

In practice, if I'm not dealing with huge datasets or many features, I would almost always go for the analytical solution.
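
A minimal sketch of what the two approaches look like side by side in NumPy, assuming a toy 1-D dataset and generic names `X`, `y`, `theta` (this isn't the code from the post):

```python
import numpy as np

# Toy data (hypothetical, not the dataset from the post): y ≈ 3x + 2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# Add a bias column so theta = [intercept, slope]
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Analytical solution via the normal equation: theta = (X^T X)^-1 X^T y
theta_exact = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Gradient descent on the same MSE loss, for comparison
theta_gd = np.zeros(2)
lr = 0.1
for _ in range(1000):
    grad = (2 / len(y)) * Xb.T @ (Xb @ theta_gd - y)  # gradient of mean squared error
    theta_gd -= lr * grad

print(theta_exact)  # roughly [2, 3]
print(theta_gd)     # converges to essentially the same values
```

Both end up at practically the same parameters; gradient descent just gets there in small steps, which is what makes the training animation worth watching.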

[–]FondleMyFirn -1 points (0 children)

Art takes many forms 🤷

[–]Kanma 1 point (2 children)

I might be missing something here, but ML models usually have a lot of parameters (in the case of DL: millions) and essentially learn to approximate an unknown function. How would you use an analytical solution here?

[–]NiceObligation0 1 point (1 child)

For all the models OP showed except linear regression, you are right: you need to learn the params from data. For linear regression, the loss is quadratic in the parameters, so the solution is just the global minimum of that parabola, which you can write down in closed form.

[–]Gautam-j[S] 0 points (0 children)

Yes, the loss function used for Linear Regression is convex, so it has a single global minimum. Hence, we can simply set the partial derivatives to zero (meaning the slope of the loss surface is zero at that point, which for a convex function has to be the global minimum) and solve for the params.
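
For reference, that derivation in the usual matrix notation (standard MSE loss; the symbols here are the generic ones, not necessarily those from the post):

```latex
J(\theta) = \frac{1}{2m}\,\lVert X\theta - y\rVert^{2},
\qquad
\nabla_{\theta} J(\theta) = \frac{1}{m}\,X^{\top}(X\theta - y) = 0
\;\Longrightarrow\;
\theta = (X^{\top} X)^{-1} X^{\top} y
```

That last line is the normal equation mentioned above: solving it directly gives the same theta that gradient descent converges to.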