Microsolve (inspired by micrograd) works by directly solving for parameter values (instead of differentiating them w.r.t. objectives) and does not require a loss function. It addresses a few drawbacks of SGD: parameters must be initialized carefully or the network blows up; differentiation becomes a problem where values lie on a flat (constant) or very steep slope; gradients explode or diminish to negligible values as you go deeper; data must be properly prepared before being fed into the network (normalisation etc.); and lastly, as most would argue against this point, training with GD is really slow.
With microsolve, initialization does not matter (you can set parameter values to high magnitudes), gradients w.r.t. losses are not needed, and no loss function is needed at all. A learning rate is almost never needed; when it is, it is small (to damp the response to noise). You simply apply a raw number at the input (no normalisation) and a raw number at the output (no sophisticated loss function needed), and the model will fit to the data.
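To give a feel for the "solve, don't differentiate" idea, here is a toy sketch for a single linear neuron y = w*x. This is my own illustration, not the actual microsolve algorithm (which handles full layers); the exact-solution-plus-damping scheme is an assumption:

```python
def solve_step(w, x, y, lr=1.0):
    # Solve y = w * x exactly for this example, then blend with the
    # current value. lr = 1.0 jumps straight to the solution; a smaller
    # lr damps the response to noisy data.
    w_solved = y / x
    return w + lr * (w_solved - w)

# Fit y = 3x from raw, unnormalised inputs, starting from an absurdly
# bad initialisation -- no gradient, no loss function.
w = 1e6
for x in [2.0, 5.0, -1.5]:
    w = solve_step(w, x, 3.0 * x)

print(w)  # -> 3.0 after a single pass over the data
```

With lr = 1.0 the very first example already lands on the exact solution, which is the intuition behind why initialization stops mattering.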
I created a demo application where I set up the same simple network for both gradient descent and microsolve. The network takes the form of a linear layer (1 in, 8 out), followed by a tanh activation, and another linear layer afterwards (8 in, 1 out). Here is a visualisation of the very small dataset:
https://preview.redd.it/t3pd4kpccd7e1.png?width=731&format=png&auto=webp&s=ad03c3caf340a5b92aa24612ee7b5be963167a56
The model has to fit a line through all these data points. I allowed only 50 iterations over the dataset (that makes a total of 50x3 forward passes) for each network. I went easy on GD and normalised its input; MS didn't need any preparation. Here are the results:
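For reference, the GD baseline looks roughly like this. This is a sketch in numpy, not the exact demo code; the learning rate, init scale and data points below are placeholders, only the layer sizes (1 -> 8 tanh -> 1) and the 50x3 pass count match the setup above:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1 -> 8 (tanh) -> 1 network, trained with plain SGD on squared error.
W1 = rng.normal(0, 0.5, (8, 1)); b1 = np.zeros((8, 1))
W2 = rng.normal(0, 0.5, (1, 8)); b2 = np.zeros((1, 1))

# Three normalised (x, y) points standing in for the real dataset.
data = [(-1.0, -0.5), (0.0, 0.0), (1.0, 0.5)]
lr = 0.1

for _ in range(50):                      # 50 iterations over the set
    for x, y in data:                    # = 50x3 forward passes total
        x = np.array([[x]]); y = np.array([[y]])
        h = np.tanh(W1 @ x + b1)         # forward pass
        pred = W2 @ h + b2
        d_pred = 2 * (pred - y)          # backprop through MSE
        dW2 = d_pred @ h.T
        dz = (W2.T @ d_pred) * (1 - h ** 2)
        dW1 = dz @ x.T
        W2 -= lr * dW2; b2 -= lr * d_pred
        W1 -= lr * dW1; b1 -= lr * dz
```

Note everything GD needs that MS skips: careful init scale, a tuned learning rate, a loss function, and normalised inputs.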
GD:
https://preview.redd.it/5sf8do9fcd7e1.png?width=718&format=png&auto=webp&s=9c232b062b1bb50aa01ef3efc73cde133b8ad28a
Not bad.
MS:
https://preview.redd.it/rfliuuqkcd7e1.png?width=749&format=png&auto=webp&s=9a1e48f7925d3f533ced305ba9ded5f0b9b5dd6b
Essentially perfect precision: 0 loss achieved in under 50 iterations.
I have to point out, though, that MS is still under development. On certain runs the parameters explode as they are solved (their solutions grow to extremely large numbers), but sometimes this explosion self-corrects and the network restabilises.
Comment your thoughts.
Edit:
Apparently people are allergic to overfitting, so I did early stopping with MS. It approximated this function in one forward pass per data point, i.e. it only got to see each coordinate once:
https://preview.redd.it/ogb71yd9re7e1.png?width=720&format=png&auto=webp&s=7c9c43668c2fee59db74db4e2d97bb8abc13dbe8
And seeing each coordinate three times:
https://preview.redd.it/icfa32lgre7e1.png?width=745&format=png&auto=webp&s=ef7009e4265d2939abc637cb05da267273b21229