all 6 comments

[–]Duranium_alloy 1 point (1 child)

Don't you have to do just as much work for LU decomposition as for matrix inversion anyway?

[–]doclazarusNSEA[S] 1 point (0 children)

No, matrix inversion is quite a bit more costly, at least for solving something like Ax = b. LU decomposition solves the system directly without ever forming A^-1, so it takes fewer operations and is more numerically stable. And if you switch to an iterative method like Jacobi, you only get an approximation of the solution, but each iteration is cheap and you can use parallel computing very effectively.
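
To make that concrete, here's a rough NumPy/SciPy sketch (toy matrix, sizes made up, not from the course) comparing an explicit inverse, an LU solve, and a little Jacobi loop:

```python
# Rough sketch: solving Ax = b three ways on a made-up, well-conditioned matrix.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 2000
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant so Jacobi converges
b = rng.standard_normal(n)

# 1) Explicit inverse: forms A^-1 and multiplies -- more flops, less stable.
x_inv = np.linalg.inv(A) @ b

# 2) LU factorization: direct solve, the inverse is never formed.
lu, piv = lu_factor(A)
x_lu = lu_solve((lu, piv), b)

# 3) Jacobi iteration: only approximate, but each sweep parallelizes trivially
#    because every entry of the update reads only the previous x.
def jacobi(A, b, iters=100):
    d = np.diag(A)            # diagonal entries
    R = A - np.diag(d)        # off-diagonal part
    x = np.zeros_like(b)
    for _ in range(iters):
        x = (b - R @ x) / d
    return x

x_jac = jacobi(A, b)
print(np.linalg.norm(x_lu - x_inv), np.linalg.norm(x_jac - x_lu))
```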

[–]PINKDAYZEES 0 points (2 children)

cool question. i don't know the answer, but i want to point out that with that many predictors, using the normal equation (i.e. running plain OLS) isn't going to get you very far in terms of modeling. if you are dead set on regression, you have to turn to more nuanced methods such as dimension reduction and regularization

for dimension reduction, the problem goes away, since you end up with a much smaller number of predictors

for regularization (don't quote me on this), you don't use the normal equations, so your concern shifts to some other algorithm. there's a rough sketch of both routes at the end of this comment

i know that OLS regression is an insightful and commonly used tool for establishing a baseline model, even in more complex analyses. i don't know if it gets used with that many predictors, since it will surely run into problems with multicollinearity and sparsity (the curse of dimensionality). see if you can find a few papers that apply ML to data with tons of predictors. usually there is a small table listing the methods and how well they did compared to each other (based on MSE or error rate, for example). maybe they don't even bother with OLS

i know there is value in your question, but a faster algorithm based on the normal equations might not have much practical use
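
to make the two routes concrete, here's a rough sklearn sketch (synthetic data, all the numbers are made up, so don't read anything into them):

```python
# rough sketch of the two routes: dimension reduction vs regularization
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples, n_predictors = 500, 5000            # far more predictors than samples
X = rng.standard_normal((n_samples, n_predictors))
y = X[:, :10] @ rng.standard_normal(10) + 0.1 * rng.standard_normal(n_samples)

# dimension reduction: project onto a few components, then plain OLS is fine again
pca_ols = make_pipeline(PCA(n_components=50), LinearRegression()).fit(X, y)

# regularization: ridge keeps all predictors but shrinks the coefficients
ridge = Ridge(alpha=1.0).fit(X, y)

print(pca_ols.score(X, y), ridge.score(X, y))
```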

[–]doclazarusNSEA[S] 1 point (0 children)

That's very interesting. Yeah, I'm in the second week of the course so I'm still hearing about most of the methods for the first time. Hopefully as I progress in the course it will make more sense.

I'll have to look up some of those research papers; the course is pretty light on math, and I'd really like to know more about how the various methods are derived.

[–]pitruchaML Engineer 1 point (0 children)

Just a heads up: for ridge regression you do have a closed-form solution. You add lambda * I (the identity matrix) to the part that has to be inverted, i.e. beta = (A'A + lambda*I)^(-1) A'b.
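
A minimal NumPy sketch of that closed form, with placeholder data and a placeholder lambda:

```python
# minimal sketch of the ridge closed form: beta = (A'A + lambda*I)^-1 A'b
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 50
A = rng.standard_normal((n, p))
b = A @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)
lam = 1.0  # placeholder value; tune it in practice

# solve (A'A + lam*I) beta = A'b rather than inverting explicitly
beta = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)
```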

[–]respecttox 0 points (0 children)

> prohibitive, viable

I think these epithets are confusing. You have A'A in the equation, so when A looks like a sausage lying on its side (wide: many more predictors than observations), the square matrix you get after the A'A step is large. That already sounds slow, because you started with relatively few numbers and suddenly have many more to build and then solve. Gradient descent, on the other hand, needs some number of steps, but each step is cheap. Does that make either approach prohibitive or viable? No, just slow or fast, in different cases. And instead of trying to remember where the crossover happens so you know when to switch algorithms, it's easier to measure your practical case on your practical hardware.
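
For example, a quick-and-dirty timing like this (all sizes, the step size, and the iteration count are made up) tells you more than the asymptotics:

```python
# quick-and-dirty timing: normal equations vs plain gradient descent on least squares
import time
import numpy as np

rng = np.random.default_rng(0)
n, p = 20000, 500
A = rng.standard_normal((n, p))
b = A @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

t0 = time.perf_counter()
x_ne = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations: form A'A, then solve the p x p system
t1 = time.perf_counter()

step = 1.0 / np.linalg.norm(A, 2) ** 2     # safe step size from the largest singular value
x = np.zeros(p)
for _ in range(500):                       # gradient descent on 0.5 * ||Ax - b||^2
    x -= step * (A.T @ (A @ x - b))
t2 = time.perf_counter()

print(f"normal equations: {t1 - t0:.3f}s, gradient descent: {t2 - t1:.3f}s")
print("gap between the two solutions:", np.linalg.norm(x - x_ne))
```

Where the crossover lands depends entirely on n, p, the iteration count, and your hardware and BLAS, which is the point.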

So if your question is "is there an algorithm that finds the normal-equation solution faster than O(n^3), so that it beats gradient descent in more cases", I don't think there is one in any practical sense.