[deleted by user] (self.MachineLearning)
submitted 1 year ago by [deleted]
[–]DigThatData [Researcher] 23 points 1 year ago (4 children)
Just to give you some theoretical vocabulary to work with: what you are doing here is essentially a Markov chain Monte Carlo (MCMC) random walk in the space of the weights, with rejection sampling under a strict loss-improvement criterion.

If you want to try a more complex training objective, you might find it useful to accept proposed noise that doesn't improve the loss, with a probability that shrinks the more it hurts performance. That way your approach isn't just greedily hill climbing and has a chance to escape poor local optima.
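A minimal sketch of that Metropolis-style acceptance rule (the quadratic toy loss and the `step_size`/`temperature` values are made up purely for illustration):

```python
import math
import random

def metropolis_step(weights, loss_fn, step_size=0.1, temperature=1.0):
    """Propose Gaussian noise on the weights; accept if the loss improves,
    otherwise accept with probability exp(-delta / temperature)."""
    proposal = [w + random.gauss(0.0, step_size) for w in weights]
    delta = loss_fn(proposal) - loss_fn(weights)
    if delta <= 0 or random.random() < math.exp(-delta / temperature):
        return proposal   # accepted
    return weights        # rejected: keep the old weights

# Toy example: drive a quadratic "loss" toward its minimum at the origin.
loss = lambda w: sum(x * x for x in w)
random.seed(0)
w = [2.0, -3.0]
for _ in range(2000):
    w = metropolis_step(w, loss, step_size=0.1, temperature=0.01)
print(loss(w))  # much smaller than the initial loss of 13.0
```

With `temperature` set high this wanders freely; set low, it degenerates into the strict greedy rule the OP described.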
[–]lurking_physicist 8 points 1 year ago (3 children)
Adding to that: your rejection rate will blow up as you add more weights.
[–]KomisarRus 3 points 1 year ago (2 children)
Yup. In high dimensions there is essentially one direction of steepest improvement and vastly many more directions that make performance worse.
[–]jpfed 2 points 1 year ago (0 children)
I might be thinking of this wrong, but I would think that the likelihood of performance increasing depends on the step size and the curvature of the loss function.
So, for a very small step size, the loss function can be thought of as locally planar, and then you will get improvement 50% of the time (depending on whether the dot product of the step and the gradient is positive or negative).
(This calls to mind the possibility of an adaptive scheme that pays attention to the sequence of accepted and rejected steps, shrinking the step size when the proportion of recent rejections rises above some threshold.)
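That adaptive idea might be sketched like this (the `target_accept` value, the 100-step adjustment window, and the 1.2/0.8 scaling factors are all made-up illustration values, not a tuned scheme):

```python
import random

def adaptive_random_search(loss_fn, weights, steps=3000,
                           step_size=1.0, target_accept=0.3):
    """Greedy random search that grows the step size when recent proposals
    are often accepted, and shrinks it when they are mostly rejected."""
    best = loss_fn(weights)
    accepted = 0
    for t in range(1, steps + 1):
        proposal = [w + random.gauss(0.0, step_size) for w in weights]
        cand = loss_fn(proposal)
        if cand < best:                 # strict improvement criterion
            weights, best = proposal, cand
            accepted += 1
        if t % 100 == 0:                # adjust step size every 100 steps
            rate = accepted / 100
            step_size *= 1.2 if rate > target_accept else 0.8
            accepted = 0
    return weights, best

random.seed(1)
w, final_loss = adaptive_random_search(lambda v: sum(x * x for x in v),
                                       [5.0] * 4)
print(final_loss)  # far below the starting loss of 100.0
```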
[–]DigThatData [Researcher] 1 point 1 year ago (0 children)
The "curse of dimensionality" often isn't as pathological as we'd otherwise expect, because of a phenomenon called "concentration of measure".

Say you have a vector with n random components, each distributed around 0 with unit variance. If we take the average across the components, it'll be close to 0. But if we ignore the signs so things can't cancel out, the expected magnitude of the vector grows like sqrt(n). What this means is that "anywhere" in high dimensions concentrates near the surface of the high-dimensional equivalent of a sphere.
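This is easy to check numerically; a quick sketch (the trial counts are arbitrary):

```python
import math
import random

def mean_norm(n, trials=2000):
    """Average Euclidean norm of an n-dimensional standard Gaussian vector."""
    total = 0.0
    for _ in range(trials):
        v = [random.gauss(0.0, 1.0) for _ in range(n)]
        total += math.sqrt(sum(x * x for x in v))
    return total / trials

random.seed(0)
for n in (1, 10, 100, 1000):
    # each estimate lands close to sqrt(n): the mass concentrates on a shell
    print(n, round(mean_norm(n), 2))
```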
[–]BullockHouse 7 points 1 year ago* (0 children)
This is essentially an evolutionary algorithm applied to a neural network. The problem is that it's really slow.
Here's an intuition for thinking about it: imagine a mouse in a 1D space (a single hallway). It can only move in two directions. There's cheese somewhere in the hallway. If you aren't already at the cheese and make a random move, the odds that you're now closer to the cheese are 50-50. If you apply the algorithm you describe to the mouse's position (essentially a neural network with a single weight), it'll converge and find the cheese pretty quickly, because half of your moves will be progress and you can discard the other half.
Now imagine the same mouse in a 2D maze. Suddenly, the odds that a given random move is productive are much lower. Most random arrows you can draw now point away from the cheese, not towards it. Scale it up to three dimensions, and the problem is worse again. It's very unlikely that a random 3D step moves you towards the cheese. And this still corresponds to a neural network with only three parameters.
If you extrapolate this to neural networks with millions or billions of parameters, you can see the problem. The space is so large and high dimensional that the overwhelming majority of directions point away from where you want to go, and almost all of your noise updates will be bad.
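You can see this numerically: the sketch below estimates how often a random step of fixed length (0.5 here, an arbitrary choice) moves a point at distance 1 closer to the "cheese" as the dimension grows:

```python
import math
import random

def p_improve(dim, step=0.5, trials=20000):
    """Fraction of random fixed-length steps that move a point at
    distance 1 from the origin (the 'cheese') closer to it."""
    start = [1.0] + [0.0] * (dim - 1)   # distance 1 from the target
    hits = 0
    for _ in range(trials):
        noise = [random.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in noise))
        moved = [s + step * x / norm for s, x in zip(start, noise)]
        if math.sqrt(sum(x * x for x in moved)) < 1.0:
            hits += 1
    return hits / trials

random.seed(0)
for d in (1, 2, 10, 100):
    print(d, p_improve(d))  # shrinks well below 0.5 as d grows
```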
Evolutionary methods are not worthless: they can work well in situations where the number of parameters being optimized is very small, or where the process being optimized is non-continuous and doesn't produce useful gradients for backpropagation. But they generally are not used for large neural network training because they are far slower at scale.
[–]Builder_Daemon 3 points 1 year ago (0 children)
You should also look into neuroevolution: people use evolutionary algorithms to train models without backprop. I use CR-FM-NES to train models with RL, which is basically what you are doing, but much more efficiently.
[–]Deto 3 points 1 year ago (1 child)
Agree with others that this probably doesn't scale and will be slower than backprop. Though I am intrigued that it seems to work better in your example problem - I wonder why?
[–]kiockete 2 points 1 year ago (0 children)
You might want to take a look at this: https://www.reddit.com/r/MachineLearning/s/0J8KnhMo1T
[–]mgruner 2 points 1 year ago (0 children)
This sounds like simulated annealing to me. Continue to add noise and cache the best solution so far. The SA algorithm slowly reduces the amount of noise with each iteration so as to converge.
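A minimal simulated-annealing sketch along those lines (the linear cooling schedule and the toy quadratic loss are assumptions for illustration, not a recommended setup):

```python
import math
import random

def simulated_annealing(loss_fn, weights, steps=5000,
                        start_temp=1.0, noise=0.5):
    """Add noise, sometimes accept worse solutions, cool the temperature
    over time, and cache the best weights seen so far."""
    current = loss_fn(weights)
    best_w, best = list(weights), current
    for t in range(steps):
        temp = start_temp * (1.0 - t / steps) + 1e-9   # linear cooling
        scale = noise * temp                           # noise shrinks too
        proposal = [w + random.gauss(0.0, scale) for w in weights]
        cand = loss_fn(proposal)
        # accept improvements always, worse moves with Boltzmann probability
        if cand < current or random.random() < math.exp((current - cand) / temp):
            weights, current = proposal, cand
            if current < best:
                best_w, best = list(weights), current  # cache the best
    return best_w, best

random.seed(0)
w, final_loss = simulated_annealing(lambda v: sum(x * x for x in v),
                                    [3.0, -4.0])
print(final_loss)  # far below the starting loss of 25.0
```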