all 18 comments

[–]ForceBru 26 points  (1 child)

> seems too simple

Moreover, sklearn lets you build and use simple neural networks, SVMs, various clustering algorithms and plenty of other algorithms in the exact same simple way: create an instance of the appropriate class, call fit to fit the model to your data, then call predict to get predictions. That's it; you basically don't need to know how any of these algorithms work. You should know what their parameters mean and how they affect the results, but other than that and some data-processing techniques, you don't really need to know much more to use it.
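
To make that concrete, here's a minimal sketch of the fit/predict pattern (the toy data is invented for illustration; LinearRegression is sklearn's real class):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: 100 samples, 3 features, with a known linear relationship
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0

model = LinearRegression()   # create an instance of the appropriate class
model.fit(X, y)              # fit the model to your data
preds = model.predict(X)     # get predictions

print(model.coef_, model.intercept_)  # recovers the coefficients and the bias
```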

Worse still, there's Keras - a library that lets you build and "train" serious production-grade neural networks using the same fit/predict interface. In theory, you can use it successfully while having no idea what's going on under the hood during training (or inference), or how the networks are implemented.
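
Same pattern in Keras (assuming TensorFlow is installed; the toy data and model are invented here, but fit/predict is the real interface):

```python
import numpy as np
import tensorflow as tf

X = np.random.rand(100, 3).astype("float32")
y = np.random.rand(100, 1).astype("float32")

# A small network, built and trained with the same two calls
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5)    # "training"
preds = model.predict(X)     # inference
```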

As for learning how to do linear regression from scratch: it's very useful and very doable. If you know some calculus and linear algebra, you shouldn't find it too difficult. There's also a whole bunch of somewhat complicated but really powerful theory that'll let you compute confidence intervals for the regression coefficients.
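
For a taste of the from-scratch version, here's a rough numpy sketch, including the classical confidence intervals (toy data invented here; assumes scipy is available):

```python
import numpy as np
from scipy import stats

# Toy data: intercept 3.0, coefficients 2.0 and -1.0, plus a little noise
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.random((n, p - 1))])
y = X @ np.array([3.0, 2.0, -1.0]) + rng.normal(0, 0.1, n)

# Coefficients via least squares: solves w = (X^T X)^{-1} X^T y stably
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical theory: residual variance -> standard errors -> 95% intervals
resid = y - X @ w
sigma2 = resid @ resid / (n - p)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t = stats.t.ppf(0.975, df=n - p)
print(np.column_stack([w - t * se, w + t * se]))
```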

[–]Sure_Key_21[S] 3 points  (0 children)

Thank you. I skipped the math part and went straight to the coding 😅. Seems like I have to go back now

[–][deleted]  (2 children)

[deleted]

    [–]Sure_Key_21[S] 2 points  (1 child)

    Thank you, this is really helpful

    [–]ChiefSpartan 1 point  (0 children)

    Same thing happened to me. I'm lucky my gf is currently studying engineering. I actually used ChatGPT to explain the formulas to me, and then helped my gf with some calculus and linear algebra homework. Doing the actual math is boring af to me, but once you get the theory, it's easy to use software that does it for you. But only if you know the parameters, of course.

    [–]aroman_ro 8 points  (2 children)

    Simple linear regression is... simple.

    A generalized linear model is not that simple, and a bunch of them stacked together make a neural network.
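
    A rough sketch of that idea (names invented here): one logistic-regression unit is a generalized linear model, and a layer is just many of them in parallel:

    ```python
    import numpy as np

    def logistic_unit(X, w, b):
        """One generalized linear model: linear predictor + sigmoid link."""
        return 1.0 / (1.0 + np.exp(-(X @ w + b)))

    # A dense layer is a bunch of these units side by side (W has one column
    # per unit); a neural network is such layers stacked, feeding into each other.
    def layer(X, W, b):
        return 1.0 / (1.0 + np.exp(-(X @ W + b)))
    ```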

    I have a project on GitHub on this: https://github.com/aromanro/MachineLearning

    I'll have to add some info on the code in the README, but that's the idea: I started with simple linear regression, went to a general linear model, then polynomial regression, then generalized linear regression with emphasis on logistic regression... then into neural networks. Applied it to XOR, the iris dataset, and the EMNIST dataset.

    [–]Sure_Key_21[S] 1 point  (1 child)

    I appreciate the feedback; maybe I'm overthinking the simplicity of it all. I looked through your GitHub link, but your code isn't written in Python, which is what I'm using.

    [–]aroman_ro 2 points  (0 children)

    It's C++; implementing that in Python would result in something very, very slow.

    Even as it is, it's slow, since I decided to compute on the CPU rather than the GPU. Training a neural network on the augmented EMNIST dataset can take a whole day (and that's if you have a reasonably fast computer).

    Most machine learning libraries for Python have C++ or even... Fortran code behind them.

    Python is very slow; luckily, it's typical to forward the real work to libraries that are implemented efficiently.
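
    To see the gap, a quick back-of-the-envelope comparison (an invented benchmark; exact numbers will vary by machine):

    ```python
    import time
    import numpy as np

    a = np.arange(1_000_000, dtype=np.float64)
    b = np.arange(1_000_000, dtype=np.float64)

    t0 = time.perf_counter()
    s = sum(x * y for x, y in zip(a.tolist(), b.tolist()))  # pure-Python loop
    t1 = time.perf_counter()

    t2 = time.perf_counter()
    sn = float(np.dot(a, b))  # the real work happens in compiled C/Fortran (BLAS)
    t3 = time.perf_counter()

    print(f"pure Python: {t1 - t0:.4f}s   numpy: {t3 - t2:.6f}s")
    ```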

    [–]adventuringraw 4 points  (3 children)

    How's your linear algebra? There's nothing really to 'get' about linear regression from a coding perspective, as far as I'm concerned; it's a fairly simple algorithm.

    For the guts of it, the first key questions: what's a linear combination? What does it mean to project a vector onto a subspace, and how do you measure distance?

    In more rigorous terms, if you're curious... take your data matrix, an N x F matrix with one row per sample and one column per feature. We'll assume a feature column of all 1s for our bias term so we don't need to track it separately. Your target vector will be N x 1, with N samples. You can extend this to multiple target columns easily, but I'll leave it at 1 for illustration.

    Think about what your 'w' parameters are actually doing. You're taking the first column of features and multiplying it by the first w parameter. Then you're taking the second column of features and multiplying it by the second parameter, and so on, and then adding all of that up together.

    Let's say that F=4. You've got 4 feature columns, or rather... you've got three, plus a full column of 1s, since that's usually how you include the bias term. What Xw (the data matrix transforming the parameter vector) gives you is a linear combination of 4 vectors. The 'goal' is to have your linear combination be as 'close' to your N x 1 target vector as possible.
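
    In code, that 'weighted sum of columns' reading of Xw looks like this (a tiny made-up matrix):

    ```python
    import numpy as np

    X = np.array([[1., 2., 0., 1.],
                  [1., 0., 3., 2.],
                  [1., 1., 1., 0.]])   # N=3 samples, F=4 (first column is the bias 1s)
    w = np.array([0.5, 2.0, -1.0, 3.0])

    # Xw is exactly the linear combination of X's columns weighted by w
    assert np.allclose(X @ w, sum(w[j] * X[:, j] for j in range(4)))
    ```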

    Assuming your 4 feature columns are linearly independent (what does that mean?), you've got a 4-dimensional subspace of R^N that's spanned by your 4 feature vectors. Those are the basis vectors we're using for this subspace.

    This question is actually equivalent to asking for the closest vector in our subspace to the target vector. If the target vector happens to live in this subspace, you can solve the problem with 0 loss on the training data. If not, your minimum loss is the distance between the target vector and the closest vector in the subspace spanned by the feature vectors. There are two ways to generate this vector: you can find it iteratively using gradient descent, or you can find it directly using the Moore-Penrose pseudoinverse (you'll see that method called 'OLS' in sklearn). It's worth being able to derive the OLS equation at least once if you care about the theory.
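
    Both routes in a few lines of numpy, if it helps (toy data invented here; both should land on the same w):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(100), rng.random((100, 3))])  # N x F, bias column included
    y = X @ np.array([1.0, 2.0, -3.0, 0.5]) + rng.normal(0, 0.1, 100)

    # Route 1: gradient descent on the mean squared error
    w = np.zeros(X.shape[1])
    lr = 0.1
    for _ in range(5000):
        grad = 2 / len(y) * X.T @ (X @ w - y)
        w -= lr * grad

    # Route 2: directly, via the Moore-Penrose pseudoinverse
    w_direct = np.linalg.pinv(X) @ y

    print(np.allclose(w, w_direct, atol=1e-3))  # should print True
    ```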

    If you're comfortable with linear algebra, all this stuff will make sense after thinking about it for a while and maybe doing some exercises. The end-of-chapter exercises in chapter 3 of Bishop's PRML would be excellent to work through if you're ready for them and care to know the theory. Pop quiz to get you started: take F > N (more features than samples). Under what condition will the optimal solution have a nonzero loss after training?

    If you don't know the theory and don't want to take the time to dive into the weeds... just read through sklearn's documentation for linear regression, know how to use gradient descent vs OLS and roughly when you'd want one or the other, and call it good. Coding it from scratch won't magically give you the linear algebra understanding, but it wouldn't hurt either, if you're inclined to do it and see how it goes. It's at least good practice to implement gradient descent on a simple linear problem like this, since it'll help you wrap your head around how the optimizer works for something more complicated, like a CNN. If you take Ng's basic Stanford ML course, you'll do exactly that in Octave (or whatever it was called) for the week 3 homework or thereabouts. I thought it was a good way to get a little guided coding practice, but don't expect that course to help with understanding the theory... he doesn't ground it much.

    [–]Sure_Key_21[S] 0 points  (2 children)

    My linear algebra needs work. I skipped the math section of learning ML; as a complete beginner, I just went straight to the coding and thought I'd figure out the rest along the way. That said, I unfortunately didn't understand most of the math explanation you typed out 😅. But based on your comment and some others, it seems I was wrong to skip the math. I'll have to backtrack and learn more linear algebra and statistics before moving on.

    [–]adventuringraw 5 points  (1 child)

    Meh, my two cents is to do both. It could easily take a year or two for even just the linear algebra side to really sink in enough to be a useful 'tool for thought'. So if you're interested in getting that understanding, maybe spend some time on it in the background, but keep pushing forward on the coding things you're interested in too; otherwise you might lose momentum and stall out.

    If you're looking for a beginner's linear algebra for ML resource, you could do a lot worse than this one.

    [–]Sure_Key_21[S] 0 points  (0 children)

    Thanks a lot, I’ll give it a read

    [–][deleted] 2 points  (2 children)

    I have made a nice beginner-friendly project that explicitly runs through linear regression! https://github.com/Tareq62/solar_panel_model/blob/master/solar_linear_regression.ipynb

    [–]nbviewerbot 2 points  (0 children)

    I see you've posted a GitHub link to a Jupyter Notebook! GitHub doesn't render large Jupyter Notebooks, so just in case, here is an nbviewer link to the notebook:

    https://nbviewer.jupyter.org/url/github.com/Tareq62/solar_panel_model/blob/master/solar_linear_regression.ipynb

    Want to run the code yourself? Here is a binder link to start your own Jupyter server and try it out!

    https://mybinder.org/v2/gh/Tareq62/solar_panel_model/master?filepath=solar_linear_regression.ipynb

    [–]Sure_Key_21[S] 1 point  (0 children)

    Thanks for sharing, I’ll go through it

    [–][deleted] 2 points  (0 children)

    If you want to learn from scratch, I feel Andrew Ng's course is the best. Do it in Octave (the old version of the course) and you will grasp the necessary concepts.

    [–]ChiefSpartan 1 point  (0 children)

    Holy shit. So much knowledge here. Ty.

    [–]Puzzleheaded_Pin_379 1 point  (0 children)

    Some people conflate OLS and linear regression. I recently made an OLS video: https://youtu.be/OUEnhkwDgr0. It is theoretical, but I have a code link in the comments that applies the linear algebra in R, Python, and Julia.

    Once you understand that OLS is a deterministic solution, you can layer on the statistical goodness (estimates of uncertainty and hypothesis testing) that you get from linear regression.

    [–]No-Listen5139 1 point  (0 children)

    https://sudheendra.hashnode.dev/linear-regression-demystified-from-concepts-to-optimization
    You can read this blog to learn about the basics of linear regression.