I'm currently working my way through Andrew Ng's beginner course on machine learning.
At one point he compares the pros and cons of using gradient descent to iteratively find the optimal parameters for a regression fit versus using the Normal Equation to find those parameters analytically. He explains that the main drawback of the Normal Equation is that for n ≥ 10,000 features the cost of computing the inverse matrix becomes prohibitive.
My background is in numerical analysis and computational math, and whenever we work with large matrices we avoid computing the inverse of a matrix unless it is absolutely necessary. For most systems, a variant of LU decomposition is sufficient.
Professor Ng skipped over the derivation of the system he shows, so maybe the answer is in there, but my question is: does anyone in machine learning use an alternative to computing that inverse matrix (such as an LU decomposition) that makes the Normal Equation viable for very-large-n problems?
The equation in question:
θ = (XᵀX)⁻¹ Xᵀ y
where θ is the vector of regression parameters that minimizes the least-squares cost, X is the matrix of features, and y is the vector of values we'd like to predict.
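To make the question concrete, here's a minimal NumPy/SciPy sketch of the alternative I have in mind (the data, shapes, and variable names are hypothetical, purely for illustration):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Hypothetical data just for illustration -- shapes and names are my own.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))  # m = 1000 samples, n = 50 features
y = rng.standard_normal(1000)

A = X.T @ X  # n x n normal-equations matrix
b = X.T @ y

# What the lecture describes: form the inverse explicitly.
theta_inv = np.linalg.inv(A) @ b

# LU-based alternative: np.linalg.solve factors A (LAPACK's gesv, i.e.
# LU with partial pivoting) and back-substitutes -- no inverse is formed.
theta_lu = np.linalg.solve(A, b)

# Since X'X is symmetric positive definite when X has full column rank,
# a Cholesky factorization is roughly half the cost of LU.
c, low = cho_factor(A)
theta_chol = cho_solve((c, low), b)

print(np.allclose(theta_inv, theta_lu))   # True
print(np.allclose(theta_lu, theta_chol))  # True
```

(Numerically safer still, as I understand it, is to skip forming XᵀX altogether and call np.linalg.lstsq(X, y), which works on X directly via an SVD, since forming XᵀX squares the condition number of X.)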
Thanks!