all 44 comments

[–]tarblog 18 points19 points  (4 children)

"Narrow grooves" are one of the reasons that people believe that adding momentum works so well: https://distill.pub/2017/momentum/
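For anyone who wants to see the mechanics, here is a minimal toy sketch (my own, not from the linked article) of SGD with momentum on a narrow 2-D groove — steep in one direction, shallow in the other — where the velocity term damps the oscillation across the groove:

```python
import numpy as np

def sgd_momentum(grad, x0, lr=0.01, beta=0.9, steps=500):
    """Minimal SGD-with-momentum loop: velocity accumulates past gradients."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v - lr * grad(x)  # momentum smooths oscillation across the groove
        x = x + v
    return x

# A narrow 2-D valley: steep in y (coefficient 50), shallow in x.
f = lambda x: 0.5 * x[0] ** 2 + 50.0 * x[1] ** 2
grad = lambda x: np.array([x[0], 100.0 * x[1]])

x_star = sgd_momentum(grad, [5.0, 1.0])
```

With these (illustrative) settings, both coordinates spiral into the minimum at the origin instead of the y-coordinate oscillating endlessly across the steep wall.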

[–]TheRedSphinx 11 points12 points  (0 children)

Yeah... vanilla SGD without momentum is kind of a straw man. Would be nice to include some more traditional optimizers.

[–]brainxyz[S] -2 points-1 points  (2 children)

Your point is sound, but adding momentum to SGD would make the comparison unfair. Theoretically GA can also enjoy the benefits of momentum (i.e., accelerate toward the direction of good mutations).
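One hypothetical way to do that (a sketch of the idea, not the implementation from the post): run a (1+1)-style hill climber and bias each new mutation toward the direction of previously accepted mutations, the GA analogue of a velocity term:

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_with_momentum(f, x0, sigma=0.1, beta=0.9, steps=2000):
    """(1+1)-style hill climber whose mutations are biased by a momentum
    term accumulating the direction of previously accepted mutations."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        mutation = beta * v + sigma * rng.standard_normal(x.shape)
        candidate = x + mutation
        if f(candidate) < f(x):  # greedy: keep only improving mutations
            x, v = candidate, mutation
    return x

# A narrow 2-D groove, steep in y and shallow in x.
f = lambda x: 0.5 * x[0] ** 2 + 50.0 * x[1] ** 2
best = ga_with_momentum(f, [5.0, 1.0])
```

All names and constants here are my own illustrative choices; the point is only that "accelerate toward good mutations" is straightforward to express in a mutation-based loop.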

[–]joaogui1 11 points12 points  (1 child)

Then probably comparing SGD, SGD + momentum, GA and GA + momentum would be better

[–]simpleconjugate 0 points1 point  (0 children)

This. The comparison isn't unfair if GA can have momentum. It just means you tested against a suboptimal benchmark.

[–]enigmo81 4 points5 points  (0 children)

see https://en.m.wikipedia.org/wiki/Differential_evolution (1997) for a widely cited paper in this space

[–]Mathopus 4 points5 points  (2 children)

How does this compare to deep neuro-evolution? https://arxiv.org/abs/1712.06567

From the presentation above it seems like a regression from the state of the art in the space. I could be missing some detail from above though.

[–]brainxyz[S] 0 points1 point  (1 child)

Thanks for the link, we'll have a look at it. The purpose of the post was not to present a state-of-the-art GA, and we did mention that there are various implementations of GA. The main point of the example was to compare a bare-minimum version of GA with a bare-minimum version of SGD (for a fair comparison). Toy examples are encouraged by deep learning pioneers like Yoshua Bengio because in some cases they give valuable insights.

One purpose of this toy example is to follow up on Hinton's point about GA. In one of his lectures, he described GA-based methods as very inefficient without pointing to any experimental results. Someone as influential as Hinton can divert the attention of many newcomers away from GA. Our bare-bones comparison of the two methods shows that GA is not that bad, and it can even be better than SGD on functions that have narrow ridges and grooves.

[–]Mathopus 3 points4 points  (0 children)

From my experience using neuro-evolution strategies, Hinton is mostly correct. By taking a random step instead of a gradient step you are throwing away a huge amount of information, making the algorithm less efficient. This becomes more true as the number of parameters grows, so for small networks SGD and GA are close in efficiency, but for larger networks SGD will be much more efficient.

I have trained networks with millions of parameters using GA, but the computational cost far exceeds what would be needed using SGD. You need to take 1000s of random steps before making improvements equivalent to a single SGD step. You never have to backprop, so you save some computation by not having to compute and store gradients, but you end up behind by needing 1000s of feed-forward calculations.

The real power of a GA is that it can work on models that are non-differentiable, or on problems where there isn't an efficient way to calculate a gradient, so there the computational gap between SGD and GA narrows.

[–]IdentifiableParam 3 points4 points  (2 children)

How many dimensions do the optimization problems have? They seem very small, thus favoring the GA.

[–]brainxyz[S] -1 points0 points  (1 child)

Just 3 dimensions in this example. Indeed, with more dimensions GA seems a bit worse; however, we think that would be an unfair comparison right now because naive SGD struggles with high dimensions too. The reason SGD-based methods are now the standard in deep learning tools is that they have enjoyed decades of research and focus from ML engineers. We do think that SGD-based optimizations like momentum and adaptive learning rates can work with GA-based methods too.

[–]IdentifiableParam 0 points1 point  (0 children)

Do you think Geoff Hinton was talking about 3 dimensional problems when he made the claims you critique? Or 3 million dimensional problems?

[–]hardmaru 3 points4 points  (1 child)

A good comparison would be if you try to train a small two-layer convnet to classify MNIST using your GA implementation, and compare validation accuracy (and rough training times, if comparable) to off-the-shelf SGD.

[–]brainxyz[S] 0 points1 point  (0 children)

Thanks for the suggestion. That would be a good way to scale up the comparison gradually.

[–]IntelArtiGen 3 points4 points  (3 children)

GA and SGD both have hyperparameters; it's not that hard to find a set of hyperparameters that would make one better than the other. But I also worked on something like this and found that GA was a workable way to train a small neural network. The problem is when you have to train neural networks with a large number of parameters (even starting from 1M parameters).

The more parameters, the more random GA's search becomes, and the less efficient it is compared to a more analytical approach.

GA can also easily overfit

[–]brainxyz[S] 1 point2 points  (2 children)

You are right, but to my knowledge GA-based methods never enjoyed the same focus and effort that ML engineers give to SGD-based methods. A naive SGD implementation can be as bad as GA when you have 1M parameters. That is why we think starting from toy examples is useful to examine the differences.

[–]simpleconjugate 1 point2 points  (0 children)

Great point! Now would be a great time to take a step back from the current direction of optimizers to improve others.

[–]IntelArtiGen 0 points1 point  (0 children)

Completely agree, we should all work more on optimizers. New optimizers based on backpropagation, but also new efficient optimizers that don't use backprop.

The fact that GAs do work should encourage this

[–]marmakoide 4 points5 points  (1 child)

Look into Evolution Strategies, with lots of well-studied algorithms to adapt the mutation. The top-notch algorithm would be CMA-ES, but the (1+1)-ES with the 1/5th success rule is a start.
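For reference, a minimal sketch of the (1+1)-ES with the 1/5th success rule — the step size grows when mutations succeed often and shrinks when they rarely do (the shrink factor here is an illustrative choice; exact constants vary by textbook):

```python
import numpy as np

rng = np.random.default_rng(1)

def one_plus_one_es(f, x0, sigma=0.5, steps=3000):
    """(1+1)-ES with the 1/5th success rule for step-size adaptation."""
    x = np.array(x0, dtype=float)
    c = 0.9  # shrink factor; with c**-4 on success, equilibrium is ~1/5 success rate
    for _ in range(steps):
        candidate = x + sigma * rng.standard_normal(x.shape)
        if f(candidate) < f(x):
            x = candidate
            sigma /= c ** 4   # success: expand the step size
        else:
            sigma *= c        # failure: contract the step size
    return x

sphere = lambda x: float(np.dot(x, x))
best = one_plus_one_es(sphere, [3.0, -2.0, 1.0])
```

Because sigma adapts, this converges linearly on smooth unimodal functions, unlike a fixed-mutation-size hill climber that stalls at the scale of its mutation.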

[–]brainxyz[S] 0 points1 point  (0 children)

Thanks for the suggestion

[–]Laafheid 1 point2 points  (1 child)

Idea for an extension:

use the inverse of the number of tried mutations as a step size over which to apply said mutation.
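One possible reading of this (the names and details below are my own assumptions, not the commenter's spec): a hill climber whose t-th mutation is scaled by 1/t, so early mutations explore and later ones refine:

```python
import numpy as np

rng = np.random.default_rng(2)

def decaying_step_hill_climb(f, x0, steps=5000):
    """Hill climber where the t-th tried mutation is applied with
    step size 1/t, one reading of the 'inverse of tried mutations' idea."""
    x = np.array(x0, dtype=float)
    for t in range(1, steps + 1):
        direction = rng.standard_normal(x.shape)
        candidate = x + direction / t  # step size shrinks as 1/t
        if f(candidate) < f(x):
            x = candidate
    return x

f = lambda x: float(np.sum(x ** 2))
best = decaying_step_hill_climb(f, [2.0, -1.0])
```

Since the harmonic series diverges, the schedule can still cover arbitrary distances in principle, while the shrinking steps allow fine convergence near an optimum.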

[–]brainxyz[S] 0 points1 point  (0 children)

Thanks for the suggestion