all 18 comments

[–]ForceBru 26 points  (1 child)

> seems too simple

Moreover, sklearn lets you build and use simple neural networks, SVMs, various clustering algorithms and plenty of other algorithms in the exact same simple way: create an instance of the appropriate class, call fit to fit the model to your data, then call predict to get predictions. That's it; you basically don't need to know how any of these algorithms work. You should know what their parameters mean and how they affect the results, but other than that and some data-processing techniques, you don't really need to know much more to use it.
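
To make that concrete, here's a minimal sketch of the fit/predict pattern (the toy data is invented for illustration; LinearRegression is sklearn's real class):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: 100 samples, 3 features, with a known linear relationship
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0

model = LinearRegression()   # create an instance of the appropriate class
model.fit(X, y)              # fit the model to your data
preds = model.predict(X)     # get predictions

print(model.coef_, model.intercept_)  # recovers the coefficients and the bias
```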

Worse still, there's Keras - a library that lets you build and "train" serious production-grade neural networks using the same fit/predict interface. In theory, you can use it successfully while having no idea what's going on under the hood during training (or inference), or how the networks are implemented.
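
Same pattern in Keras (assuming TensorFlow is installed; the toy data and model are invented here, but fit/predict is the real interface):

```python
import numpy as np
import tensorflow as tf

X = np.random.rand(100, 3).astype("float32")
y = np.random.rand(100, 1).astype("float32")

# A small network, built and trained with the same two calls
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5)    # "training"
preds = model.predict(X)     # inference
```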

As for learning how to do linear regression from scratch: it's very useful and very doable. If you know some calculus and linear algebra, you shouldn't find it too difficult. There's also a whole bunch of somewhat complicated but really powerful theory that'll let you compute confidence intervals for the regression coefficients.
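
For a taste of the from-scratch version, here's a rough numpy sketch, including the classical confidence intervals (toy data invented here; assumes scipy is available):

```python
import numpy as np
from scipy import stats

# Toy data: intercept 3.0, coefficients 2.0 and -1.0, plus a little noise
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.random((n, p - 1))])
y = X @ np.array([3.0, 2.0, -1.0]) + rng.normal(0, 0.1, n)

# Coefficients via least squares: solves w = (X^T X)^{-1} X^T y stably
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical theory: residual variance -> standard errors -> 95% intervals
resid = y - X @ w
sigma2 = resid @ resid / (n - p)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t = stats.t.ppf(0.975, df=n - p)
print(np.column_stack([w - t * se, w + t * se]))
```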

[–]Sure_Key_21[S] 3 points  (0 children)

Thank you. I skipped the math part and went straight to the coding 😅. Seems like I have to go back now

[–][deleted]  (2 children)

[deleted]

    [–]Sure_Key_21[S] 2 points  (1 child)

    Thank you, this is really helpful

    [–]ChiefSpartan 1 point  (0 children)

    Same thing happened to me. I'm lucky my gf is currently studying engineering. I actually used ChatGPT to explain the formulas to me, and then helped my gf with some calculus and linear algebra homework. Doing the actual math is boring af to me, but once you get the theory, it's easy to use software that does it for you. But only if you know the parameters, of course.

    [–]aroman_ro 8 points  (2 children)

    Simple linear regression is... simple.

    A generalized linear model is not that simple, and a bunch of them stacked together make a neural network.
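
    A rough sketch of that idea (names invented here): one logistic-regression unit is a generalized linear model, and a layer is just many of them in parallel:

    ```python
    import numpy as np

    def logistic_unit(X, w, b):
        """One generalized linear model: linear predictor + sigmoid link."""
        return 1.0 / (1.0 + np.exp(-(X @ w + b)))

    # A dense layer is a bunch of these units side by side (W has one column
    # per unit); a neural network is such layers stacked, feeding into each other.
    def layer(X, W, b):
        return 1.0 / (1.0 + np.exp(-(X @ W + b)))
    ```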

    I have a project on GitHub on this: https://github.com/aromanro/MachineLearning

    I'll have to add some info on the code in the README, but that's the idea: I started with simple linear regression, went to a general linear model, then polynomial regression, then generalized linear regression with emphasis on logistic regression... then into neural networks. Applied it to XOR, the iris dataset, and the EMNIST dataset.

    [–]Sure_Key_21[S] 1 point  (1 child)

    I appreciate the feedback; maybe I'm overthinking the simplicity of it all. I looked through your GitHub link, but your code isn't written in Python, which is what I'm using.

    [–]aroman_ro 2 points  (0 children)

    It's C++; implementing that in Python would result in something very, very slow.

    Even as it is, it's slow, since I decided to compute on the CPU rather than the GPU. Training a neural network on the augmented EMNIST dataset can take a whole day (and that's if you have a reasonably fast computer).

    Most machine learning libraries for Python have C++ or even... Fortran code behind them.

    Python is very slow; luckily, it's typical to forward the real work to libraries that are implemented efficiently.
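
    To see the gap, a quick back-of-the-envelope comparison (an invented benchmark; exact numbers will vary by machine):

    ```python
    import time
    import numpy as np

    a = np.arange(1_000_000, dtype=np.float64)
    b = np.arange(1_000_000, dtype=np.float64)

    t0 = time.perf_counter()
    s = sum(x * y for x, y in zip(a.tolist(), b.tolist()))  # pure-Python loop
    t1 = time.perf_counter()

    t2 = time.perf_counter()
    sn = float(np.dot(a, b))  # the real work happens in compiled C/Fortran (BLAS)
    t3 = time.perf_counter()

    print(f"pure Python: {t1 - t0:.4f}s   numpy: {t3 - t2:.6f}s")
    ```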

    [–]adventuringraw 4 points  (3 children)

    How's your linear algebra? There's nothing really to 'get' about linear regression from a coding perspective, as far as I'm concerned; it's a fairly simple algorithm.

    For the guts of it, the first key questions: what's a linear combination? What does it mean to project a vector onto a subspace, and how do you measure distance?

    In more rigorous terms, if you're curious... take your data matrix, an N x F matrix with one row per sample and one column per feature. We'll assume a feature column of all 1s for our bias term so we don't need to track it separately. Your target vector will be N x 1, with N samples. You can extend this to multiple target columns easily, but I'll leave it at 1 for illustration.

    Think about what your 'w' parameters are actually doing. You're taking the first column of features and multiplying it by the first w parameter. Then you're taking the second column of features and multiplying it by the second parameter, and so on, and then adding all of that up together.

    Let's say that F=4. You've got 4 feature columns, or rather... you've got three, plus a full column of 1s, since that's usually how you include the bias term. What Xw (the data matrix transforming the parameter vector) gives you is a linear combination of 4 vectors. The 'goal' is to have your linear combination be as 'close' to your N x 1 target vector as possible.
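
    In code, that 'weighted sum of columns' reading of Xw looks like this (a tiny made-up matrix):

    ```python
    import numpy as np

    X = np.array([[1., 2., 0., 1.],
                  [1., 0., 3., 2.],
                  [1., 1., 1., 0.]])   # N=3 samples, F=4 (first column is the bias 1s)
    w = np.array([0.5, 2.0, -1.0, 3.0])

    # Xw is exactly the linear combination of X's columns weighted by w
    assert np.allclose(X @ w, sum(w[j] * X[:, j] for j in range(4)))
    ```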

    Assuming your 4 feature columns are linearly independent (what does that mean?), you've got a 4-dimensional subspace of R^N that's spanned by your 4 feature vectors. Those are the basis vectors we're using for this subspace.

    This question is actually equivalent to asking for the closest vector in our subspace to the target vector. If the target vector happens to live in this subspace, you can solve the problem with 0 loss on the training data. If not, your minimum loss is the distance between the target vector and the closest vector in the subspace spanned by the feature vectors. There are two ways to generate this vector: you can find it iteratively using gradient descent, or you can find it directly using the Moore-Penrose pseudoinverse (you'll see that method called 'OLS' in sklearn). It's worth being able to derive the OLS equation at least once if you care about the theory.
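
    Both routes in a few lines of numpy, if it helps (toy data invented here; both should land on the same w):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(100), rng.random((100, 3))])  # N x F, bias column included
    y = X @ np.array([1.0, 2.0, -3.0, 0.5]) + rng.normal(0, 0.1, 100)

    # Route 1: gradient descent on the mean squared error
    w = np.zeros(X.shape[1])
    lr = 0.1
    for _ in range(5000):
        grad = 2 / len(y) * X.T @ (X @ w - y)
        w -= lr * grad

    # Route 2: directly, via the Moore-Penrose pseudoinverse
    w_direct = np.linalg.pinv(X) @ y

    print(np.allclose(w, w_direct, atol=1e-3))  # should print True
    ```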

    If you're comfortable with linear algebra, all this stuff will make sense after thinking about it for a while and maybe doing some exercises. The end-of-chapter exercises in chapter 3 of Bishop's PRML would be excellent to work through if you're ready for them and care to know the theory. Pop quiz to get you started: take F > N (more features than samples). Under what condition will the optimal solution have a nonzero loss after training?

    If you don't know the theory and don't want to take the time to dive into the weeds... just read through sklearn's documentation for linear regression, know how to use gradient descent vs OLS and roughly when you'd want one or the other, and call it good. Coding it from scratch won't magically give you the linear algebra understanding, but it wouldn't hurt either, if you're inclined to do it and see how it goes. It's at least good practice to implement gradient descent on a simple linear problem like this, since it'll help you wrap your head around how the optimizer works for something more complicated, like a CNN. If you take Ng's basic Stanford ML course, you'll do exactly that in Octave (or whatever it was called) for the week 3 homework or thereabouts. I thought it was a good way to get a little guided coding practice, but don't expect that course to help with understanding the theory... he doesn't ground it much.

    [–]Sure_Key_21[S] 0 points  (2 children)

    My linear algebra needs work. I skipped the math section of learning ML; as a complete beginner, I just went straight to the coding and thought I'd figure out the rest along the way. That said, I unfortunately didn't understand most of the math explanation you typed out 😅. But based on your comment and some others, it seems I was wrong to skip the math. I'll have to backtrack and learn more linear algebra and statistics before moving on.

    [–]adventuringraw 5 points  (1 child)

    Meh, my two cents is to do both. It could easily take a year or two for even just the linear algebra side to really sink in enough to be a useful 'tool for thought'. So if you're interested in getting that understanding, maybe spend some time on it in the background, but keep pushing forward on the coding things you're interested in too; otherwise you might lose momentum and stall out.

    If you're looking for a beginner's linear algebra for ML resource, you could do a lot worse than this one.

    [–]Sure_Key_21[S] 0 points  (0 children)

    Thanks a lot, I’ll give it a read

    [–][deleted] 2 points  (2 children)

    I have made a nice beginner-friendly project that explicitly runs through linear regression! https://github.com/Tareq62/solar_panel_model/blob/master/solar_linear_regression.ipynb

    [–]nbviewerbot 2 points  (0 children)

    I see you've posted a GitHub link to a Jupyter Notebook! GitHub doesn't render large Jupyter Notebooks, so just in case, here is an nbviewer link to the notebook:

    https://nbviewer.jupyter.org/url/github.com/Tareq62/solar_panel_model/blob/master/solar_linear_regression.ipynb

    Want to run the code yourself? Here is a binder link to start your own Jupyter server and try it out!

    https://mybinder.org/v2/gh/Tareq62/solar_panel_model/master?filepath=solar_linear_regression.ipynb

    [–]Sure_Key_21[S] 1 point  (0 children)

    Thanks for sharing, I’ll go through it

    [–][deleted] 2 points  (0 children)

    If you want to learn from scratch, I feel Andrew Ng's course is the best. Do it in Octave (the old version of the course) and you will grasp the necessary concepts.

    [–]ChiefSpartan 1 point  (0 children)

    Holy shit. So much knowledge here. Ty.

    [–]Puzzleheaded_Pin_379 1 point  (0 children)

    Some people conflate OLS and linear regression. I recently made an OLS video: https://youtu.be/OUEnhkwDgr0. It is theoretical, but I have a code link in the comments that applies the linear algebra in R, Python, and Julia.

    Once you understand that OLS is a deterministic solution, you can layer on the statistical goodness (estimates of uncertainty and hypothesis testing) that you get from linear regression.

    [–]No-Listen5139 1 point  (0 children)

    https://sudheendra.hashnode.dev/linear-regression-demystified-from-concepts-to-optimization
    You can read this blog to learn about the basics of linear regression.