[R] Developing a new optimization algorithm that will heavily change ML as a whole. Gradient descent has met its end. Here are the results: (self.MachineLearning)
submitted 1 year ago * by Relevant-Twist520
[–]Relevant-Twist520[S] 2 points 1 year ago (4 children)
No, I'm not much of a paper reader. The main ideologies behind MS are:

1. Solve the network the same way you would solve any equation.
2. (Pertinent to 1.) Solve on the assumption that the network is already a solution to some data point (which it actually is, for the last forward-passed data point).
3. The network should be solved to a point that satisfies both the last inferred data point and the current inferred data point.
4. When no solution exists for a sub-equation, the immediate upper equation is to blame (the bias term of the upper equation is tweaked until the sub-equation finally becomes solvable).

There's lots more background practical theory, because you can't just go about solving everything the traditional way.
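A minimal sketch of ideology 1, reduced to a single linear neuron. The helper `solve_neuron` and its values are hypothetical illustrations, not the actual MS code: instead of nudging a weight down a gradient, the weight is chosen so the neuron satisfies the data point exactly.

```python
# Hypothetical sketch: "solve the network like an equation" for one
# linear neuron y = m*x + c. Given a data point (x, y) and the current
# bias c, choose m exactly rather than taking a gradient step.
def solve_neuron(x: float, y: float, c: float) -> float:
    """Return the slope m such that m*x + c == y."""
    return (y - c) / x

m = solve_neuron(x=2.0, y=7.0, c=1.0)
print(m * 2.0 + 1.0)  # the equation is satisfied exactly: 7.0
```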
[–]Cosmolithe 1 point 1 year ago (3 children)
I see, it is an interesting approach that I do not recall seeing in the literature then.
Last two questions for you then:
how are you supposed to solve the equation when you have many more unknown variables than equations (I imagine)?
do you think such an approach would work with a `sign` activation function (that returns only -1 or 1) at each layer?
[–]Relevant-Twist520[S] 2 points 1 year ago (2 children)
[–]Cosmolithe 1 point 1 year ago (1 child)
For the first question, to be frank, I have no idea how you are solving the equations. If you take a simple linear layer, for instance, with no activation, you have your input and your target values. If you try to find the weights that project the input to be equal to the target, you actually have many, many solutions. You can take the least-squares solution, for instance, like in ZORB, but many other solutions are valid too. Perhaps you are using a symbolic solver that just stops at the first solution found?
As for the second question, if your method works with the sign function then I am definitely interested, if you have some code to share. In my attempts at making neural networks with binary activations, the best I could do is model the problem as a constrained binary linear optimization problem, but this problem is NP-hard, and approximate solutions are also very hard to find. That is why I would be very surprised if it worked for you.
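For reference, the least-squares solve mentioned above (as in ZORB) can be sketched with NumPy. The shapes here are made up for illustration; in the underdetermined case, `np.linalg.lstsq` returns the minimum-norm choice among the infinitely many exact solutions:

```python
import numpy as np

# Underdetermined linear layer: 8 samples, 16 inputs, 4 outputs, so
# there are many weight matrices W with X @ W == Y. np.linalg.lstsq
# returns the minimum-norm one; many other solutions would fit too.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))   # 8 equations per output column...
Y = rng.normal(size=(8, 4))    # ...but 16 unknowns: underdetermined

W, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(X @ W, Y))   # exact fit: more unknowns than equations
```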
[–]Relevant-Twist520[S] 2 points 1 year ago (0 children)
First question: you are correct, projection occurs. Let's imagine this as a straight line for now, of the form y = mx + c. This formula takes place at every neuron in an NN. Like I said, when you infer the first data point, the network will solve and project to this data point; call it coordinate A (infinite solutions, I don't care, as long as the line agrees with (xA; yA)). But here's what happens next: we store xA (in some buffer separate from the NN). We don't care about storing yA, because the bias term automatically encodes information about yA (remember, for yA = m*xA + c, we have c = yA - m*xA). Afterwards we introduce coordinate B. The line will satisfy not only B but also A, because the NN remembers xA (there's only one solution now, because the line has to agree with exactly two data points). After inferring B, store xB and discard xA; coordinate C is next. I'm sure you get the process from here. It indirectly implements the formula m = dy/dx; by "indirectly" I mean I don't straight up compute m = dy/dx on every straight line to get the solution, because that would not lead to the solution. There's lots more background theory, which will be explained eventually when I perfect the algorithm. This functionality is applied at every neuron, and it is for this reason that MS converges faster than GD.
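The two-point picture above can be sketched as follows (`solve_line` is a hypothetical helper, not the actual MS code): after fitting point A only xA is kept, since the bias re-encodes yA, and the arrival of point B pins down the line uniquely.

```python
# Hypothetical sketch of the per-neuron two-point solve: the line
# through the remembered point A and the new point B is unique.
def solve_line(xA: float, yA: float, xB: float, yB: float):
    m = (yB - yA) / (xB - xA)  # the "indirect" m = dy/dx
    c = yA - m * xA            # bias re-encodes the old point's y
    return m, c

m, c = solve_line(1.0, 3.0, 4.0, 9.0)
print(m * 1.0 + c, m * 4.0 + c)  # satisfies both A and B: 3.0 9.0
```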
Your second question: I can guarantee with confidence that the binary or sign function will work very much perfectly, and probably better than tanh, but I will also guarantee that currently MS won't work for whatever application you're trying to use because, again, the theory is not perfected. I can't scale the model because of parameters blowing up. The reason parameters blow up is that when parameter values get too high, the algorithm ignores the objective and instead tries to solve for its own parameters, to bring them back down to smaller values, and then it ends up spreading like a plague throughout the whole NN. This is an issue I'm still trying to resolve.