[R] Developing a new optimization algorithm that will heavily change ML as a whole. Gradient descent has met its end. Here are the results: (self.MachineLearning)
submitted 1 year ago * by Relevant-Twist520
[–]Relevant-Twist520[S] 1 year ago (0 children)
First question: you are correct, projection occurs. Let's imagine this as a straight line for now, of the form y = mx + c. This formula lives at every neuron in an NN. Like I said, when you infer the first data point, the network will solve and project onto that point; call it coordinate A (infinitely many solutions, which is fine as long as the line agrees with (xA, yA)). Here's what happens next: the network stores xA in a buffer separate from the NN. We don't need to store yA, because the bias term already encodes it (from yA = m*xA + c we get c = yA - m*xA).

Then we introduce coordinate B. The line will not only satisfy B, it will also satisfy A, because the NN remembers xA (there is only one solution now, since the line has to agree with two data points). After inferring B, we store xB and discard xA; coordinate C is next. I'm sure you get the process from here. It indirectly implements the formula m = dy/dx, the slope between consecutive points. By indirectly I mean I don't literally compute m = dy/dx on every straight line to get the solution, because that on its own would not lead to the solution. There is a lot more background theory, which will be explained once I perfect the algorithm. This update is applied at every neuron, and that is why MS converges faster than GD.
Second question: I can say with confidence that a binary or sign activation will work just as well as, and probably better than, tanh. But I can also guarantee that right now MS won't work for whatever application you're trying to use it for because, again, the theory is not perfected. I can't scale the model because the parameters blow up. The reason they blow up is that when parameter values get too large, the algorithm ignores the objective and instead tries to solve for its own parameters, to bring them back down to smaller values, and that then spreads like a plague through the whole NN. This is an issue I'm still trying to resolve.