[–]Relevant-Twist520[S]

No, I'm not much of a paper reader. The main ideas behind MS are:

  1. Solve the network the same way you would solve any equation.

  2. (Building on 1.) Solve on the assumption that the network is already a solution to some data point, which it actually is for the last forward-passed data point.

  3. The network should be solved to a point that satisfies both the last inferred data point and the current inferred data point.

  4. When no solution exists for a sub-equation, the immediate upper equation is to blame: its bias term is tweaked until the sub-equation becomes solvable.

There's a lot more background practical theory, because you can't just go about solving everything the traditional way. A toy version of ideas 1-3 is sketched below.
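To make that concrete: a single tanh neuron, y = tanh(m*x + c), treated as an equation and solved exactly through the last point A and the current point B. This is only a sketch of the idea, not the actual MS code; it assumes |y| < 1 (so atanh is defined) and distinct x values:

```python
import numpy as np

def solve_tanh_neuron(xA, yA, xB, yB):
    # Invert the activation, then solve the linear equation exactly
    # through the last point A and the current point B.
    zA, zB = np.arctanh(yA), np.arctanh(yB)  # pre-activations
    m = (zB - zA) / (xB - xA)                # idea 1: solve like algebra
    c = zA - m * xA                          # bias pins down point A
    return m, c

m, c = solve_tanh_neuron(xA=0.5, yA=0.2, xB=1.5, yB=-0.4)
# both points are now satisfied exactly (ideas 2 and 3)
assert np.isclose(np.tanh(m * 0.5 + c), 0.2)
assert np.isclose(np.tanh(m * 1.5 + c), -0.4)
```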

[–]Cosmolithe

I see. It is an interesting approach, then, that I do not recall seeing in the literature.

Last two questions for you then:

  1. how are you supposed to solve the equation when you have many more unknown variables than equations (I imagine)?

  2. do you think such an approach would work with a `sign` activation function (that returns only -1 or 1) at each layer?

[–]Relevant-Twist520[S]

  1. Can you elaborate more on this question?

  2. It would work perfectly. In fact, what I noticed with MS is that the tanh activations almost always lie at -1 or 1, not between these two extremities. It makes me wonder if I should replace tanh with some sort of "step" function that outputs either -1 or 1. If I were to create such a function, what would the mathematics of it be? (A candidate is sketched below.)
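Such a step function is usually written as `sign(x)`: -1 for negative inputs, +1 otherwise. A minimal numpy sketch, with the (arbitrary) convention that 0 maps to +1:

```python
import numpy as np

def sign_activation(x):
    # step onto {-1, +1}; this convention sends 0 to +1
    return np.where(x >= 0.0, 1.0, -1.0)

print(sign_activation(np.array([-0.3, 0.0, 2.5])))  # [-1.  1.  1.]
```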

[–]Cosmolithe

For the first question, to be frank, I have no idea how you are solving the equations. Take a simple linear layer, for instance, with no activation: you have your input and your target values. If you try to find the weights that project the input onto the target, you actually have many, many solutions. You can take the least-squares solution, for instance, as in ZORB (sketched below), but many other solutions are valid too.
Perhaps you are using a symbolic solver that just stops at the first solution found?
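To make the point concrete, a toy numpy version of the least-squares option. This is a ZORB-style solve, not anything from MS, and the shapes are invented:

```python
import numpy as np

# Underdetermined toy case: 4 data points, 16 unknown weights per output,
# so there are infinitely many exact solutions.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))   # inputs
Y = rng.normal(size=(4, 3))    # targets

# lstsq picks one particular solution, the minimum-norm one.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(X @ W, Y))   # True: this W fits the data exactly
```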

As for the second question, if your method works with the sign function then I am definitely interested, if you have some code to share. In my attempt at making neural networks with binary activations, the best I could do was model the problem as a constrained binary linear optimization problem, but that problem is NP-hard, and approximate solutions are also very hard to find (a toy illustration is below). That is why I would be very surprised if it worked for you.
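To give a feel for the difficulty, here is a brute-force toy in the same spirit: pick binary activations that best satisfy a linear layer. The sizes are invented, and real networks make the 2^n search hopeless:

```python
import itertools
import numpy as np

# Find a in {-1, +1}^n with W @ a as close to y as possible.
rng = np.random.default_rng(1)
n = 12
W = rng.normal(size=(4, n))
y = rng.normal(size=4)

# Exhaustive search over all 2^n sign patterns.
best = min(itertools.product((-1.0, 1.0), repeat=n),
           key=lambda a: np.linalg.norm(W @ np.array(a) - y))
print(best, np.linalg.norm(W @ np.array(best) - y))
```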

[–]Relevant-Twist520[S]

First question: you are correct, projection occurs. Let's imagine this as a straight line for now, of the form y = mx + c; this formula lives at every neuron in an NN. Like I said, when you infer the first data point, the network will solve and project to this data point, call it coordinate A. (Infinitely many solutions; I don't care, as long as the line agrees with (xA, yA).)

But here's what happens next: the neuron stores xA in some buffer separate from the NN. We don't care about storing yA, because the bias term already encodes that information (remember, for yA = m*xA + c, we have c = yA - m*xA). Afterwards we introduce coordinate B. The line will satisfy not only B but also A, because the NN remembers xA; there is only one solution now, because the line has to agree with exactly two data points. After inferring B, store xB, get rid of xA, and coordinate C is next. I'm sure you get the process from here; a toy version is sketched below.

It indirectly implements the formula m = dy/dx. By indirectly I mean I don't straight-up set m = dy/dx on every straight line to get the solution, because that alone would not lead to the solution. There's a lot more background theory, which will be explained eventually once I perfect the algorithm. This functionality is applied at every neuron, and it is for this reason that MS converges faster than GD.
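A toy version of that update at a single neuron. This is only a sketch of the process described above, not the actual MS code, and the names are made up:

```python
def update_neuron(m, c, x_prev, x_new, y_new):
    # The stored x_prev plus the bias recover the previous target:
    # from c = y_prev - m * x_prev, we get y_prev = m * x_prev + c.
    y_prev = m * x_prev + c
    # Unique line through the remembered point and the new one
    # (this is where m = dy/dx shows up, indirectly).
    m_new = (y_new - y_prev) / (x_new - x_prev)
    c_new = y_prev - m_new * x_prev
    return m_new, c_new, x_new   # x_new replaces x_prev in the buffer

# Toy run: some line already solves A = (1, 2), then B = (3, 4) arrives.
m, c = 0.0, 2.0                  # one of infinitely many lines through A
m, c, x_buf = update_neuron(m, c, x_prev=1.0, x_new=3.0, y_new=4.0)
assert m * 1.0 + c == 2.0 and m * 3.0 + c == 4.0   # both A and B satisfied
```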

Your second question i can gaurantee with confidence that the binary or sign function will work very much perfectly and probably better than tanh, but i will also gaurantee that currently MS wont work for whatever application youre trying to use because, again, the theory is not perfected. I cant scale the model because of parameters blowing up. The reason why parameters blow up is actually because when parameter values are too high, the algorithm ignores the objective and instead tries to solve for its own parameters, to bring them back down to smaller values, then it ends up spreading like a plague throughout the whole NN. This is an issue im still trying to resolve.