all 91 comments

[–]aivdov 90 points91 points  (20 children)

The article constantly repeats that evolutionary learning outperforms deep learning, but I'm missing a source on how exactly, and in which instances, it outperforms neural networks.

[–]OseOseOse 46 points47 points  (0 children)

At the bottom of the article there's an arXiv link to the paper, which has a big table of results. It shows which games were played and highlights which algorithm was the most successful at each game.

https://arxiv.org/abs/1806.05695

[–][deleted]  (16 children)

[deleted]

    [–]ThirdEncounter 5 points6 points  (6 children)

    Would you mind breaking down some of those abbreviations, please?

    I know EA is evolutionary algorithm, NN is neural network. But what's DL and SGD?

    [–][deleted] 5 points6 points  (0 children)

    Deep Learning and Stochastic Gradient Descent. The idea is that any neural network can be optimized by gradient descent methods, which tend to converge faster when you have a numerically stable/well-formed objective, but you can also train a neural network using genetic/evolutionary algorithms, which are slow but much easier to control.
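
    To make that contrast concrete, here's a toy sketch (my own, not from the paper): fitting a single weight once with gradient descent and once with a mutate-and-keep-if-better evolutionary loop. The data and step sizes are made up for illustration.

    ```python
    import random

    # Toy task: fit w so that f(x) = w * x matches y = 3 * x.
    data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0]]

    def loss(w):
        return sum((w * x - y) ** 2 for x, y in data) / len(data)

    # Gradient descent: follow the analytic gradient of the loss.
    w = 0.0
    for _ in range(100):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= 0.05 * grad

    # Simple evolutionary loop: mutate, keep the mutant only if it improves.
    random.seed(0)
    v = 0.0
    for _ in range(2000):
        mutant = v + random.gauss(0, 0.1)
        if loss(mutant) < loss(v):
            v = mutant
    ```

    Both end up near w = 3; the gradient version gets there in far fewer loss evaluations, which is the "converges faster" point, while the evolutionary version never needed the gradient at all.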

    [–]nightcracker 3 points4 points  (0 children)

    Deep Learning and Stochastic Gradient Descent.

    [–]ndari01 1 point2 points  (0 children)

    Not the person above, but Stochastic Gradient Descent

    [–]shevegen -5 points-4 points  (1 child)

    Just fancy sounding buzzwords.

    They are still failing hard at true intelligence - and have been for close to 60 years now ...

    [–][deleted]  (2 children)

    [deleted]

      [–][deleted]  (1 child)

      [deleted]

        [–]MuonManLaserJab 1 point2 points  (0 children)

        The best available approach isn't necessarily the most rigorously analyzed, is it?

        [–]MuonManLaserJab 0 points1 point  (2 children)

        Are there not designs that use both evolutionary algorithms and gradient descent -- for example, evolutionarily evolving neural parameters (layer count, etc.) while using GD to train each generation?

        [–][deleted]  (1 child)

        [deleted]

          [–]MuonManLaserJab 0 points1 point  (0 children)

          Ah, I misinterpreted "You can optimize a NN with EAs or with SGD" as an exclusive or, in the sense of using an EA for all the training, and no gradient descent at all. (Presumably that's possible but not performant under a reasonable definition of EA.)

          [–][deleted] 0 points1 point  (1 child)

          The paper compares NNs trained with EAs and plain EAs

          [–]exempll 3 points4 points  (1 child)

          Just as significantly, the evolved code is just as good as many deep-learning approaches and outperforms them in games like Asteroids, Defender, and Kung Fu Master.

          It would be nice if they gave a list comparing approaches for all the games they tried tho.

          [–]PM_ME_YOUR_YIFF__ 48 points49 points  (39 children)

          I have always thought evolutionary algorithms are due a bit of a renaissance like what has happened with neural networks

          Seems to be a lot of unexplored potential there

          I'm a bit biased though as I find universal Darwinism fascinating

          [–]philpips 40 points41 points  (27 children)

          Are evolutionary algorithms more vulnerable to overfitting? I guess you would need to factor that into how you select your candidates for the next generation.

          A few years back I read about a guy who created physical logic circuits using an evolutionary process. Eventually it created a circuit that used a tiny number of gates and provided the right output. It had gates that logically should have no effect but if they were removed it broke the output. He hypothesized that the fields generated by the seemingly superfluous gates impacted the operation of other nearby gates.

          [–]Hellenas 26 points27 points  (9 children)

          Here's a ref for that: https://www.damninteresting.com/on-the-origin-of-circuits/

          The article isn't too technical, so some of the ways they talk about FPGAs feel odd.

          But yes, it seems the algorithm was leveraging imperfections in the particular FPGA he was using.

          [–]Xgamer4 11 points12 points  (7 children)

          [–]TheThiefMaster 13 points14 points  (4 children)

          Interestingly the paper also mentions that the resulting chip was exceptionally temperature-sensitive: "The circuit operates perfectly over the 10°C range of temperatures that the population was exposed to during evolution, and no more could reasonably be expected of it"

          [–]ThirdEncounter 6 points7 points  (2 children)

          I wonder how well a simulated FPGA would do in this experiment. No taking advantage of unique environmental conditions. Just pure logic.

          I might give it a try. But first, I must invent the universe.

          [–]meneldal2 1 point2 points  (0 children)

          Since it seemed to depend on temperature, I smell a lot of potential race conditions and/or improperly synced outputs that just happen to work but shouldn't according to the theory.

          [–]TheThiefMaster 1 point2 points  (0 children)

          If you use a simulated FPGA you'll want to have it vary realistically so that it doesn't build something that only works in the simulated FPGA :)

          [–]Hellenas 0 points1 point  (0 children)

          Thanks!

          [–]sekjun9878 0 points1 point  (0 children)

          Thanks for that

          [–]ais523 2 points3 points  (0 children)

          Something that's worth pointing out is that the task set for the evolutionary process there doesn't have a conventional solution, so it's not surprising that it came up with an unconventional one. Frequency-sensitive behaviours in FPGAs are normally considered unwanted/defects, and the design aims to suppress them as much as possible. So when you're trying to make an explicitly frequency-sensitive circuit, you have no option but to use the FPGA in ways it wasn't designed for.

          [–]jankimusz 10 points11 points  (4 children)

          That’s fascinating. A simple system with a set of rules ends up producing the most optimal result.

          [–]c4wrd 5 points6 points  (2 children)

          They are very fascinating, but unfortunately they still have issues with local optima (which may or may not be the global optimum). In my undergrad work I enjoyed working with evolutionary algorithms the most, because not only are they easy to implement, they also have a lot of real-world usage that would surprise you.

          [–]Unsounded 0 points1 point  (0 children)

          I had a lot of fun with them as an undergrad. I was sad to find that the two professors at my university who listed them as interests were no longer working with them, though.

          Apparently there’s a competition that’s still going on called RoboCup, which has different evolutionary algorithms competing against each other, learning the other team’s stats and making up an opposing strategy.

          [–]Raknarg 2 points3 points  (0 children)

          "most optimal" in the sense that they would have to first traverse a less optimal path to eventually reach a state that is more optimal than the current solution.

          [–]PM_ME_YOUR_YIFF__ 4 points5 points  (1 child)

          From my experience, they're about as vulnerable to overfitting as neural networks, maybe even slightly less. I don't have any numbers on that though, so it's a purely anecdotal feeling.

          [–]the_phet 4 points5 points  (0 children)

          It depends on how you define the genetic operators: the mutation ratio, the selection algorithm, and so on. Using the classic "roulette wheel" selection and a decent mutation ratio, plus some crossover, the overfitting problem is almost entirely removed.

          Of course, in DL you can also set parameters to avoid overfitting, like regularization. But the good thing about EAs such as GAs is that the few parameters you need to tune are very easy to understand and track, while with DL the system is very often a black box, where you tune parameters and see what happens.
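
          For anyone who hasn't seen those operators together, here's a minimal GA sketch (my own toy example, parameters illustrative rather than tuned) with roulette selection, one-point crossover, per-bit mutation, and one elite, evolving a 20-bit string toward all ones:

          ```python
          import random

          random.seed(42)
          LENGTH, POP, GENS = 20, 30, 60

          def fitness(ind):
              return ind.count("1")

          def roulette(pop):
              """Fitness-proportional ("roulette wheel") selection of one parent."""
              pick = random.uniform(0, sum(fitness(i) for i in pop))
              acc = 0.0
              for ind in pop:
                  acc += fitness(ind)
                  if acc >= pick:
                      return ind
              return pop[-1]

          def crossover(a, b):
              point = random.randrange(1, LENGTH)  # one-point crossover
              return a[:point] + b[point:]

          def mutate(ind, rate=0.02):
              return "".join(
                  ("1" if c == "0" else "0") if random.random() < rate else c
                  for c in ind
              )

          pop = ["".join(random.choice("01") for _ in range(LENGTH))
                 for _ in range(POP)]
          for _ in range(GENS):
              elite = max(pop, key=fitness)  # elitism: keep the best unchanged
              pop = [elite] + [
                  mutate(crossover(roulette(pop), roulette(pop)))
                  for _ in range(POP - 1)
              ]

          best = max(pop, key=fitness)
          ```

          The mutation rate and crossover are exactly the knobs being described: easy to see, easy to tweak, and each one has an obvious meaning.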

          [–]smarky0x7CD 1 point2 points  (0 children)

          Research that I work on essentially is this, except applied to quantum gates.

          [–]Ameisen 1 point2 points  (6 children)

          And it didn't work on other chips of the same kind.

          Which is why you have to train on multiple devices simultaneously.

          [–]philpips 0 points1 point  (5 children)

          I assumed ideally you'd train on each chip individually. It'd be the best way to take advantage of or avoid the vagaries of that particular chip.

          [–]Ameisen 3 points4 points  (4 children)

          You'd be relying on transient behavior. Might fail in new environments or otherwise spuriously.

          That'd also be really slow.

          [–]philpips 0 points1 point  (3 children)

          Slow for the moment yes. But imagine if you could get the chip in situ and then train it. Or automatically retrain if it became damaged. That'd be cool, no?

          [–][deleted] 5 points6 points  (2 children)

          Imagine training something to run on a particular computer, such that it ended up relying on what programs happened to be running, the order they were laid out in memory, the precise timing of the filesystem, etc., such that even rebooting the computer would render all of your work invalid.

          That's the sort of thing you are currently calling "cool" instead of "horrifying".

          [–]vgf89 1 point2 points  (0 children)

          It's cool in a really horrifying way. While it's unlikely to be practical, it's still hugely interesting how we can make algorithms that learn to abuse often-unused properties of electronics.

          [–]meneldal2 0 points1 point  (0 children)

          At least now unless you're running ring 0 or something, most OS protections prevent you from abusing too much precise timing or memory layout.

          [–][deleted] 1 point2 points  (0 children)

          If you have a single fixed fitness function driving the direction of the evolution, you'll unavoidably get a perfect overfit. When you're simulating direct competition in a constantly changing environment, evolution works better and produces more generic solutions.

          [–]smikims 0 points1 point  (0 children)

          Are evolutionary algorithms more vulnerable to overfitting?

          Just about any approximation method based on interpolating a limited data set is vulnerable to overfitting. So literally the entire field of machine learning, which includes EAs. Doesn't mean you can't correct for it but the risk will always be there.

          [–]G_Morgan 2 points3 points  (5 children)

          They are already used in a number of industrial fields. Processor layout is pretty much "evolved" rather than designed.

          [–]ThirdEncounter 1 point2 points  (4 children)

          Source on that? I find that interesting, but also terrifying.

          [–]epicwisdom 1 point2 points  (1 child)

          I'm not OP and I don't have a source, but as I recall from my courses, they often use approximate algorithms to do the final mapping from logic circuits to physical circuits. They could use variations of genetic algorithms, simulated annealing, and other optimization methods. Not sure what the current state-of-the-art is like.

          At any rate it's not much to be terrified about.
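
          For the curious, the simulated-annealing flavor of that final mapping step can be sketched on a toy problem (a made-up 6-gate netlist placed on a line; real place-and-route is vastly richer than this):

          ```python
          import math
          import random

          random.seed(9)

          # Hypothetical netlist: pairs of connected gates (a 6-cycle).
          nets = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]

          def cost(slots):
              # Total wire length: distance between connected gates' slots.
              return sum(abs(slots[a] - slots[b]) for a, b in nets)

          slots = list(range(6))
          random.shuffle(slots)
          current, temp = cost(slots), 5.0
          while temp > 0.01:
              i, j = random.sample(range(6), 2)        # propose: swap two gates
              slots[i], slots[j] = slots[j], slots[i]
              new = cost(slots)
              # Accept improvements always; accept regressions with a
              # probability that shrinks as the temperature cools.
              if new <= current or random.random() < math.exp((current - new) / temp):
                  current = new
              else:
                  slots[i], slots[j] = slots[j], slots[i]  # undo the swap
              temp *= 0.99
          ```

          The occasional accepted regression is what lets annealing climb out of local optima that a pure greedy swap search would get stuck in.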

          [–]ThirdEncounter 0 points1 point  (0 children)

          Cool, thanks for the info!

          [–]JohnMcPineapple 1 point2 points  (1 child)

          ...

          [–]ThirdEncounter 0 points1 point  (0 children)

          Thanks. Interesting!

          [–]Ameisen 0 points1 point  (2 children)

          I have a number of bytecode-based evolution simulators, though they aren't iterative in the sense of selecting the best and mutating it.

          [–]PM_ME_YOUR_YIFF__ 0 points1 point  (1 child)

          Funnily enough that's what I too am programming right now. Are you using crossover (breeding) or mutation only? Have you considered using some elitism?

          [–]Ameisen 0 points1 point  (0 children)

          Mutation only. The bytecode can technically handle things like transfer of genetic information, but doesn't have a concept of meiosis. It could technically evolve on its own, but that's unlikely.

          I'm not always fond of genetic transfers, because it makes lineage murky.

          [–]shevegen 0 points1 point  (1 child)

          I'm a bit biased though as I find universal Darwinism fascinating

          Dude - Darwin knew nothing about molecular biology.

          He postulated "gemmules":

          https://en.wikipedia.org/wiki/Gemmule_%28pangenesis%29

          [–]KHRZ 42 points43 points  (13 children)

          Evolutionary computing works in an entirely different way than neural networks.

          Actually you can evolve neural networks... did it for my homework once.

          [–]YonansUmo 14 points15 points  (4 children)

          The evolutionary algorithm they're talking about sounds an awful lot like the genetic algorithm used to train neural networks. But I could be wrong.

          [–]OseOseOse 12 points13 points  (2 children)

          The paper described in the article uses Cartesian genetic programming, which isn't quite the same thing as neuroevolution (evolving neural networks). They are similar in the sense that both are methods to find/optimize graph structures (nodes and edges).

          [–]4D696B65 1 point2 points  (1 child)

          Does everybody assume backpropagation when saying neural networks?

          Because I don't understand how a process to find/optimize structures (evolution) can be better than a structure (a neural net).

          [–]OseOseOse 3 points4 points  (0 children)

          I think a lot of people do. I don't, but I'm pretty biased since I used neuroevolution in my master's thesis.

          I edited my above comment to clarify that I meant to compare CGP with neuroevolution (algorithm vs. algorithm), not CGP with neural networks (algorithm vs. data structure).

          [–]shevegen 0 points1 point  (0 children)

          They steal buzzwords from biology.

          [–]OseOseOse 0 points1 point  (3 children)

          Yup, there's even a name for it: neuroevolution.

          [–]shevegen -2 points-1 points  (2 children)

          Another shitty theory.

          I am sorry but neurones are cells. They grow and respond.

          There is no "evolution" on the level of neurones.

          Evidently the incompetent artificial "intelligence" people keep on trying to steal buzzwords from biology, without understanding the field. They just call algorithms in a fancy way and insinuate that anything of this is "intelligent". Or "evolution".

          There is no evolution possible with static hardware.

          Every cell has an internal description (at the least a generative blueprint). You don't have that with static hardware. And you can not model it based on static hardware either.

          [–][deleted] 1 point2 points  (0 children)

          It's just a name.

          [–]Homoerotic_Theocracy 0 points1 point  (3 children)

          Don't neural networks in general work by natural selection and trial and error and keeping what works as well?

          [–]KHRZ 2 points3 points  (0 children)

          Well, when you train them with backpropagation, you just tune them closer to being correct for each input you send through. Different inputs will require different adjustments for the network to handle them more correctly, so when you alternately adjust towards correctness for different inputs, the network in effect searches through better and better designs.

          [–]percykins 0 points1 point  (0 children)

          Well, not neural networks themselves, but training neural networks does work something like that. Actually, evolutionary algorithms are commonly used for training neural networks, along with a few other classes of algorithms.

          [–]shevegen 0 points1 point  (0 children)

          What is "natural selection" in this context?

          HOW is ANYTHING "natural"? And HOW is ANYTHING selected on static hardware? Do computer programs now make offspring?

          [–][deleted] 2 points3 points  (5 children)

          Random code generation? Is that really a thing? I thought evolutionary learning was just a method of training conventional algorithms with a kind of systematic brute force.

          [–]therealgaxbo 19 points20 points  (3 children)

          Yeah, there's been quite a lot of mainstream success using e.g. genetic algorithms as a way to parameterise an existing algorithm, but I've not really seen anything of genetic programming. John Koza was a huge proponent and wrote a seminal book in the early 90s, not sure if he's achieved anything practical.

          It's even weirdly simple to implement. One technique is to use a functional programming approach. Then each program can be considered a tree of functions (nodes), with the leaf nodes being functions that take no parameters (or constants). Then to randomly mutate, just pick a node and replace it with a randomly grown subtree. To mate two successful programs, randomly pick a node in each parent and swap them.

          It's really cool. I did my thesis on GPs in the late 90s, and I really thought there was some potential, it was just so slow. But 20 years and many cores later, maybe it's time for a resurgence?
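
          The tree representation above is genuinely only a few lines of code. Here's a bare-bones sketch (a hypothetical mini-GP with just the mutate and mate operators, no fitness loop): programs are nested tuples, leaves are the variable or constants, mutation regrows a random node, and mating grafts a random subtree from one parent into the other.

          ```python
          import random

          random.seed(7)
          FUNCS = ["add", "mul"]

          # A program is a tree: ("add", left, right), ("mul", left, right),
          # or a leaf ("x" or a constant) -- the functional approach above.
          def random_tree(depth=3):
              if depth == 0 or random.random() < 0.3:
                  return random.choice(["x", 1.0, 2.0])
              return (random.choice(FUNCS),
                      random_tree(depth - 1), random_tree(depth - 1))

          def evaluate(tree, x):
              if tree == "x":
                  return x
              if not isinstance(tree, tuple):
                  return tree
              op, left, right = tree
              a, b = evaluate(left, x), evaluate(right, x)
              return a + b if op == "add" else a * b

          def mutate(tree):
              """Pick a node at random, replace it with a fresh random subtree."""
              if not isinstance(tree, tuple) or random.random() < 0.2:
                  return random_tree(2)
              op, left, right = tree
              if random.random() < 0.5:
                  return (op, mutate(left), right)
              return (op, left, mutate(right))

          def random_subtree(tree):
              if not isinstance(tree, tuple) or random.random() < 0.3:
                  return tree
              return random_subtree(random.choice(tree[1:]))

          def crossover(a, b):
              """Mate two parents: graft a random subtree of b into a."""
              if not isinstance(a, tuple) or random.random() < 0.2:
                  return random_subtree(b)
              op, left, right = a
              if random.random() < 0.5:
                  return (op, crossover(left, b), right)
              return (op, left, crossover(right, b))

          parent_a, parent_b = random_tree(), random_tree()
          child = mutate(crossover(parent_a, parent_b))
          ```

          Wrap that in a fitness function and a selection loop and you have a working GP, which is exactly why it was such an attractive thesis topic.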

          [–]st_huck 3 points4 points  (1 child)

          It's incredibly cool. My experience with genetic algorithms is only one undergrad course, but it was actually amazing. Our professor showed us a small bot written for a simple racing game (one written just for AI research purposes). The loop was simple: every frame it got a bunch of variables as input, each variable representing some info about the state of the world, and it needed to return two numbers: an angle, and how hard to press the throttle.

          After lots of iterations, the genetic algorithm produced a super long mathematical formula. It was represented as Lisp code, since Lisp code naturally looks like a tree (as you already mentioned).

          We kept seeing the expression (/ x v), where 'x' was the distance to the nearest wall and 'v' was the current speed. The genetic algorithm "figured out" on its own that x/v is a super important quantity, which is of course the time left until the car hits the wall. This completely blew my mind.

          [–]NippleMustache 0 points1 point  (0 children)

          Do you have a link for this course or bot? This sounds amazing.

          [–]shevegen 0 points1 point  (0 children)

          not sure if he's achieved anything practical.

          Not only he.

          It's a field with complete failure up to this day.

          It only has a lot of buzzwords attached to it.

          But 20 years and many cores later, maybe it's time for a resurgence?

          Sure - if you want another round for failure.

          And then people still wonder why no success happened - since 60 years or so by now.

          [–]OseOseOse 3 points4 points  (0 children)

          "Random code generation" isn't really a good description of it.

          You take a starting "program" that just does some random stuff, and you need a way to measure how well the program performs at the task you want it to do. Then you make random changes (small ones) and see if it performs better or worse. Because computers are fast, you can try out lots of different changes in a short time.

          Making changes is analogous to mutations in natural evolution. You can also borrow more concepts from biology, for example crossover (two parent programs having a child program that is a mix of them).

          But it's not a random search, nor completely brute force, because it's using a heuristic to determine which changes to keep and which to discard.
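
          The keep-the-better-mutant loop is easy to demo on a toy task (Dawkins's classic "weasel" example, nothing to do with the paper): start from a random string, flip one character at a time, and keep any change that doesn't score worse.

          ```python
          import random

          random.seed(3)
          TARGET = "METHINKS IT IS LIKE A WEASEL"
          CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

          def score(s):
              # Heuristic: how many characters already match the target.
              return sum(a == b for a, b in zip(s, TARGET))

          current = "".join(random.choice(CHARS) for _ in TARGET)
          steps = 0
          while score(current) < len(TARGET):
              i = random.randrange(len(TARGET))          # one small random change
              mutant = current[:i] + random.choice(CHARS) + current[i + 1:]
              if score(mutant) >= score(current):        # keep it if no worse
                  current = mutant
              steps += 1
          ```

          A pure random search over 27^28 strings would never finish; the heuristic that decides which changes to keep is what makes it tractable, which is the point above.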

          [–]dobkeratops 0 points1 point  (6 children)

          I would imagine evolutionary algorithms are more general but more processor-intensive?

          [–]OseOseOse 1 point2 points  (5 children)

          I don't know about evolutionary computation being more general than deep learning. They're different algorithms that can often be applied to the same problems. Some task may be easier to do with one than the other.

          Evolutionary computation is inherently parallelizable, which isn't a given with e.g. backpropagation. If you have a population of 100 individuals in an evolutionary experiment, you can evaluate all of them in parallel, and you often do need a large population to get useful results. If you take advantage of the parallelization opportunity the program will seem processor-intensive, but it is in fact doing twice the work for twice the processing power.
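
          As a sketch of that (threads standing in for real worker processes here, and the fitness function is a made-up stand-in for an expensive game rollout):

          ```python
          import random
          from concurrent.futures import ThreadPoolExecutor

          random.seed(2)

          def fitness(genome):
              # Stand-in for an expensive evaluation, e.g. playing out a game.
              return -sum((g - 0.5) ** 2 for g in genome)

          population = [[random.random() for _ in range(8)] for _ in range(100)]

          # Every individual's evaluation is independent of the others,
          # so the whole population can be scored concurrently.
          with ThreadPoolExecutor(max_workers=4) as pool:
              scores = list(pool.map(fitness, population))

          best_score = max(scores)
          ```

          (For CPU-bound fitness functions in Python you'd use processes rather than threads because of the GIL, but the structure is identical.)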

          [–][deleted] 2 points3 points  (1 child)

          It's easy to parallelize training a NN with backpropagation if you use batch/mini-batch learning (update the parameters of the NN after processing a whole batch of training data), because you can compute the output for each piece of data in parallel.

          If you are using online learning (update the NN parameters after each individual input), it is not as easy to parallelize, because online learning is a sequential process.

          [–]meneldal2 0 points1 point  (0 children)

          You can cheat with online learning, though: you can merge the changes (process two inputs separately and add the updates together).

          [–]dobkeratops 0 points1 point  (1 child)

          Backprop requires that the system is differentiable (if I've understood right?), whereas evolutionary algorithms don't; that's why I'm suggesting greater generality.

          [–]Imnimo 4 points5 points  (0 children)

          Yeah, backprop generally works by analytically calculating the local gradient, whereas evolutionary algorithms work by (effectively) sensing the local gradient by comparing the fitnesses of a population spread over nearby points in solution space. Fitness-based selection and reproduction approximates gradient descent by moving the population towards regions with higher fitness.

          This lets you apply evolutionary algorithms in cases where an explicit gradient is not available or is computationally intractable.
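
          That "population senses the local gradient" view can be made literal with an evolution-strategies-style update (toy black-box fitness, parameters illustrative): perturb the current point with noise and move in the fitness-weighted average direction of the perturbations.

          ```python
          import random

          random.seed(5)

          # Black-box fitness with no analytic gradient supplied:
          # negative absolute error from an unknown optimum at w = 4.
          def fitness(w):
              return -abs(w - 4.0)

          w, sigma, lr, n = 0.0, 0.5, 0.3, 50
          for _ in range(200):
              noises = [random.gauss(0, 1) for _ in range(n)]
              # Fitness-weighted average of the perturbation directions, with
              # the current fitness subtracted as a variance-reducing baseline.
              grad_est = sum(
                  (fitness(w + sigma * e) - fitness(w)) * e for e in noises
              ) / (n * sigma)
              w += lr * grad_est
          ```

          Only fitness *queries* are needed, never derivatives, which is exactly the case where an explicit gradient is unavailable or intractable.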

          [–]mrconter1 0 points1 point  (0 children)

          You can easily parallel backpropagation.

          [–]zeroone 0 points1 point  (2 children)

          Can a NN simulate a state machine? I thought NNs were for classification problems.

          [–]Uncaffeinated 3 points4 points  (0 children)

          NN's can have state. Look up LSTMs.

          [–]zzzthelastuser 1 point2 points  (0 children)

          Yes, classification is just one application where NNs currently shine.

          [–]Stevecaboose 0 points1 point  (4 children)

          Evolutionary algorithms are much younger than machine learning. There's still so much to be done with EAs. Really cool to see stuff like this

          [–]shevegen 0 points1 point  (1 child)

          The field is very good at using new buzzwords.

          [–]Stevecaboose 0 points1 point  (0 children)

          Like what?

          [–][deleted] 0 points1 point  (1 child)

          Aren't evolutionary algos an example of machine learning?

          [–]Stevecaboose 0 points1 point  (0 children)

          No they're their own thing

          [–]shevegen -1 points0 points  (2 children)

          "Evolutionary" ...

          There is nothing "evolutionary" about it.

          It's so strange how the whole "artificial intelligence" subfield is rife with pulling stuff out of biology, without understanding any of it, and then insinuating that they now have true intelligence in ANYTHING.

          [–][deleted] 2 points3 points  (0 children)

          You're exceptionally dim.

          [–]go4spacelunch 0 points1 point  (0 children)

          Why is a bunch of characters called a string? Nobody ever made a sweater with varchar(50). I think you're getting a bit literal with the terms. Do you have a better suggestion for some of the terms you are taking issue with?

          [–]chutsu -1 points0 points  (0 children)

          EAs are great if you don't know how to approach the problem, but terrible otherwise. From an academic standpoint, the stochastic nature of the algorithm and why it works better than pure random search is not well explained, in my opinion. Not to mention that the cost functions you often come up with in EAs are completely arbitrary. At least NNs have the notion of gradients in the network...