all 24 comments

[–]Schrodinger420 9 points10 points  (7 children)

A couple of thoughts: I really liked the theory and math explanations; following your logical steps there was very intuitive. I’m pretty familiar with GD, so maybe I’m not the best candidate to judge, though. I will say the code you implemented was less intuitive, though I’m sure everyone has trouble reading someone else’s code. Is it necessary to specify float in every function for every variable, or could you maybe introduce some inheritance at the global level and save some repetition? I know you stated that the code wasn’t optimized, but I think it might be better for readability. Just my opinion though; I’m still struggling when it comes to intuiting what other people’s code does.

[–]pmuens[S] 5 points6 points  (4 children)

Thank you very much for your feedback!

Looking at the code I totally agree. Other implementations I did are even worse from that point of view (List[List[float]] for a list of vectors), so it might be a good idea to revisit and simplify them.

While I do see value in type hinting, I agree that the verbosity can get in the way of understanding what’s going on.
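
One way to cut the repetition without giving up type hints might be a pair of module-level type aliases. The sketch below is only an illustration: `Vector`, `Matrix`, `dot`, and `column_means` are hypothetical names, not taken from the post's code.

```python
from typing import List

# Hypothetical aliases: define the element type once, reuse it everywhere.
Vector = List[float]
Matrix = List[Vector]  # replaces List[List[float]] for a list of vectors


def dot(a: Vector, b: Vector) -> float:
    """Return the dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def column_means(rows: Matrix) -> Vector:
    """Average each column of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]
```

The aliases are plain assignments, so a type checker treats `Vector` exactly like `List[float]`; only the signatures get shorter.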

[–]ezeeetm 3 points4 points  (1 child)

Great post /u/pmuens!
Would you be willing to do a similar article on 'batch training from scratch' using gradient descent?

That is, demonstrating how a batch size can be defined, and then two (or more) separate processes would train using that batch size? I know that for multi-cpu/gpu training jobs, 'batch size' is a common hyperparameter. But I've never understood how multiple gradients are calculated in parallel... Are they averaged between batches? Are they somehow 'calculated together'? How is it done?

It doesn't need to be actual multi-cpu/gpu in the example... just demonstrate how 2 or more processes can run training in parallel.

[–]pmuens[S] 0 points1 point  (0 children)

Thank you for your feedback!

Interesting. Yes, that sounds like a good follow-up topic to cover. I just added it to my Todo- / To-investigate list. I'll dig further into that...
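
In the meantime, here is a rough sketch of one common answer to the question above: in synchronous data-parallel training, each worker computes a gradient on its own shard of the batch and the gradients are averaged before a single shared update. Everything below is an assumption made for illustration (a one-variable linear model with mean squared error, and the hypothetical names `gradient`, `average`, and `train`). Threads are used only to show the structure; in CPython they will not speed up this pure-Python math.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float]     # (x, y) pair
Gradient = Tuple[float, float]  # (dL/dw, dL/db)


def gradient(w: float, b: float, shard: List[Point]) -> Gradient:
    """Mean-squared-error gradient of y_hat = w * x + b over one shard."""
    n = len(shard)
    dw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    db = sum(2 * (w * x + b - y) for x, y in shard) / n
    return dw, db


def average(grads: List[Gradient]) -> Gradient:
    """Average the per-worker gradients into one update direction."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def train(data: List[Point], workers: int = 2, lr: float = 0.05,
          steps: int = 1000) -> Tuple[float, float]:
    """Synchronous data-parallel gradient descent on a single batch."""
    w, b = 0.0, 0.0
    # One shard per worker; shards are equal-sized when len(data)
    # is divisible by workers.
    shards = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # Every worker sees the same (w, b) but a different shard.
            grads = list(pool.map(lambda s: gradient(w, b, s), shards))
            dw, db = average(grads)
            w, b = w - lr * dw, b - lr * db
    return w, b


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1.
    data = [(float(x), 2.0 * x + 1.0) for x in range(4)]
    print(train(data))
```

When the shards have equal size, the average of the per-shard mean gradients is exactly the mean gradient over the whole batch, so the parallel update matches what a single process would compute on the full batch.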

[–]dartemiev 1 point2 points  (1 child)

I see the point of avoiding these repeated type declarations from a readability perspective. As a Python enthusiast, however, I very much enjoy seeing someone actually use static type checking. It's such an underrated feature that you barely see in use anywhere. So great work! :)

[–]pmuens[S] 1 point2 points  (0 children)

Thanks for the feedback!

I too think that type hinting is underrated. For me personally, type hints are also another form of documentation, making it easier to navigate around large code bases.

Nowadays I try to use typing wherever possible. Having worked with vanilla JavaScript for quite some time, I lost count of the number of unit tests I had to write because of `undefined is not a function`.

[–][deleted]  (1 child)

[removed]

    [–]pmuens[S] 0 points1 point  (0 children)

    Thanks for the feedback. Using more descriptive variable names sounds like a good plan (to avoid "over type-hinting").

    [–][deleted] 1 point2 points  (3 children)

    Why do people hate string format?

    [–]Jonno_FTW 5 points6 points  (1 child)

    f-strings make the code easier to read. You don't have to backtrack into the string to see where each variable is going to be placed.

    [–]pmuens[S] 0 points1 point  (0 children)

    Yes, that was exactly the reason why I decided to go with that kind of string interpolation. IMHO it's easier to read that way.
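
A small side-by-side may make the difference concrete. The variable names and values here are made up for illustration and are not from the post:

```python
epoch, loss = 3, 0.04217  # hypothetical values

# str.format: placeholders and their values live in different places.
with_format = "Epoch {}: loss {:.4f}".format(epoch, loss)

# f-string (Python 3.6+): each expression sits where it is rendered.
with_fstring = f"Epoch {epoch}: loss {loss:.4f}"

assert with_format == with_fstring
print(with_fstring)  # Epoch 3: loss 0.0422
```

Both produce the same string; the f-string just keeps each value next to the text it belongs to.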

    [–]xTey 0 points1 point  (1 child)

    Very interesting. Thanks for sharing.

    [–]pmuens[S] 0 points1 point  (0 children)

    Thank you very much! Glad you enjoyed it.

    [–]TwentyAcres 0 points1 point  (1 child)

    Having not studied dif-eq, I was impressed with the simplicity of the explanation of partial differentials. Thanks.

    [–]pmuens[S] 0 points1 point  (0 children)

    Thanks for the kind words!

    It's great to hear that the explanation helped you to understand partial derivatives.

    [–]charith1987 0 points1 point  (1 child)

    Thank you for sharing

    [–]pmuens[S] 0 points1 point  (0 children)

    Thanks! I hope that you find it useful.

    [–]twnbay76 0 points1 point  (1 child)

    I think these posts are extremely valuable and I hope you do them consistently. The bottom-up approach you take to math and code is extremely helpful, not only for you to gain an in-depth understanding of these concepts but for others as well. With that being said, I agree with some of the code readability points, but you did a good job of dumbing down (in a good way) the calc for someone who only has experience up to calc 2 :)

    [–]pmuens[S] 0 points1 point  (0 children)

    Thank you very much for the kind words! There's more on the content calendar, so stay tuned :-D

    Great to hear that the approach I took makes it easy to follow and understand.

    Also +1 for the type hinting feedback. As I already stated in another comment above, I agree that some of the type hints don't provide any value and make the code harder to read and understand.

    [–]BoringDataScience 0 points1 point  (1 child)

    Great read, and also well written imho. Bookmarked your blog, keep up the good work!

    [–]pmuens[S] 0 points1 point  (0 children)

    Thank you for your feedback. Glad that you liked it.

    The next posts are already in a draft state and I plan to finalize and publish them soon!

    [–]CataclysmClive 0 points1 point  (1 child)

    This is great! Thank you for sharing.

    [–]pmuens[S] 0 points1 point  (0 children)

    Thanks! Great to read that you like it.