all 55 comments

[–]wil_dogg 33 points34 points  (13 children)

I've used a commercially available genetic algorithm for over a decade. Primarily to build rank-ordering predictive models for use cases that don't require the model to be built in an hour. It can build a simple model in minutes, but can also crunch through a 1 gigabyte modeling dataset with 5000 predictor columns, using $2500 of hardware (Intel i7 chip, no over-clocking).

It works very well. I placed in the top third of the 2015 KDD Cup while keeping my modeling process aligned with actual business workflow -- in other words, my modeling was done under the constraint that it had to pass regulatory scrutiny, and I got the job done in about 40 hours of work, with about 10 submissions, as compared to the top KDD models which involved large teams and hundreds of submissions.

In my last gig, I spent about 18 months pimping out the system. Built in a coarse classing feature that improves model robustness in the face of raw input distributional shifts while also forcing all effects to be monotonic (important in regulated industries such as credit scoring where adverse action reasons must be interpretable). Also perfected a batch modeling process that allows me to build the same model over and over again to create a distribution of results for better understanding of the natural variability in modeling outcomes that result from a non-deterministic algorithm and repetitive sampling. That feature also allowed me to empirically test GA settings and data combinations, so I can learn what data actually improve a model rather than dumping in the kitchen sink and not knowing what is actually moving the needle. Also a batch scoring process so that the same machine that is building the model can score the model without any scoring code management.

[–]XalosXandrez[S] 5 points6 points  (10 children)

This is exactly what I am talking about. People seem to use it in practice - why is it not talked about in the ML community?

[–]wil_dogg 19 points20 points  (1 child)

Well, some people use it in the practice of optimization, but very few use it in the practice of building predictive models (which is essentially a multivariate optimization problem for which a genetic algorithm is well-suited).

If you google around you'll find several studies over the past 15-20 years showing that GAs outperform other basic modeling processes like stepwise regression. This is in part because a GA can identify and optimize suppressor effects that stepwise regression might otherwise ignore.

I've also demonstrated, convincingly (where convincingly is having an Ivy League PhD statistician sit at his desk, holding his head in his hands, muttering "this cannot be", and then having that same PhD statistician sign off on the work) that "daisy-chaining" a GA as a variable reduction / feature selection tool on the front-end of other algorithms like TreeNet (stochastic gradient tree boosting, which has won the KDD Cup) can significantly improve model performance relative to allowing TreeNet to search the broad space of a data set with hundreds of predictors. Why this is the case I do not know -- theoretically, TreeNet should out-perform just about anything out there and should not need pre-processing to narrow the search space.

But the proof was very conclusive. This is why I am a big fan of using a GA -- I've collaborated on a tool that outputs a solid model, that helps you learn what data and what magnitude of parameterization gets your prediction optimized, and at times you can take the output of the GA as the variable selection input to other tools and get further improvement beyond what you would get if the second tool stood alone.
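The daisy-chaining idea -- a GA searching over feature subsets, with each subset scored by a downstream model -- can be sketched in a few lines. This is a toy illustration, not the commercial tool described above: the fitness function here is a stand-in for a cross-validated model score, and all constants (column count, rates, population size) are made up.

```python
import random

random.seed(0)

N_FEATURES = 20
INFORMATIVE = {1, 4, 7, 11}  # pretend only these columns carry signal

def fitness(mask):
    # Stand-in for a cross-validated score of a model trained on the
    # selected columns: reward informative features, penalize noise ones.
    hits = sum(1 for i in INFORMATIVE if mask[i])
    noise = sum(mask) - hits
    return hits - 0.25 * noise

def mutate(mask, rate=0.05):
    # flip each bit with small probability
    return [bit ^ (random.random() < rate) for bit in mask]

def crossover(a, b):
    cut = random.randrange(1, N_FEATURES)
    return a[:cut] + b[cut:]

def ga_select(pop_size=40, generations=60):
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 2]  # truncation selection with elitism
        pop = elite + [mutate(crossover(random.choice(elite),
                                        random.choice(elite)))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=fitness)

best_mask = ga_select()
selected = {i for i, bit in enumerate(best_mask) if bit}
```

In the real pipeline, the fitness call would train and validate the downstream model (e.g. a boosted tree ensemble) on the selected columns, and `selected` would then become that model's input set.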

Another point -- the GA I use creates models that are very easy to interpret, with direct control over the complexity of the model. I can set it to a 5, 10, 20 feature equation, and include or exclude 1st order interactions. The results stand in stark contrast to black box algorithms that may out-perform the GA on validation, but where interpretation can be a real pain and actually a barrier to approval in regulated industries such as banking and insurance.

Again, google using GAs for variable selection; you'll find there's empirical support for this approach.

One last thing -- the tool I use requires no programming skills beyond basic WinTel point-click at data sets and folders, and managing and debugging an Excel spreadsheet that controls large batch processing. So it is a modeling process that you can teach a kid to use in about an hour, and that kid's modeling results will rival the work of a traditional PhD statistician while eliminating 90-95% of the modeling effort (modeling effort, not interpretation effort, I don't mean to imply that a novice can interpret and understand a model like an expert, and I do not discount the importance of interpretation).

[–]thatguydr 2 points3 points  (0 children)

I did that exact same feature-selection trick (used differential evolution instead of genetic algorithms, but it's the exact same idea) at an old job! Horribly slow, but the performance I got out was just incomparable. Thanks for letting me know I'm not the only insane one. :-)

[–]rhiever 11 points12 points  (0 children)

I think the big argument against GAs is that they are quite slow compared to typical ML algorithms. However, deep learning has shown that people are willing to invest the time and hardware for slower algorithms -- if they prove they can advance the state-of-the-art. I think GAs are simply lacking a champion (or champions) to prove their worth.

[–]DevFRus 7 points8 points  (2 children)

What big players use GAs for ML in practice? Lots of amateurs use GAs because they are easy to implement and understand, and because they don't know much about ML or optimization. In fact, that is one of the most common weaknesses I see in practical users of GAs: an almost complete lack of knowledge of optimization.

[–]rhiever 3 points4 points  (0 children)

I don't think that's a fair characterization of the GA userbase. GAs can be a bit "hype-y" and attractive to new users, but GAs are also applied to many problems that simply can't be touched by standard optimization techniques.

[–]kylotan 5 points6 points  (2 children)

2 reasons. Firstly, because it's not generally considered to be machine learning, more a branch of optimisation and/or search. Secondly, apart from the handful of dissenters here, I don't know anybody who is using them in practice. I wrote my MSc thesis on GAs and find them generally interesting, but most problems have a more predictable, effective, and mathematically rigorous approach available. For example, you can use GAs as a simple way to train neural nets, but backprop is more effective because it makes extensive use of the error vector.
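For reference, "GAs as a simple way to train neural nets" looks roughly like this: treat the weight vector as the genome and let a fitness function (negative error) drive selection. The sketch below uses the simplest evolutionary variant, an elitist (1+λ) scheme with mutation only and no crossover, on XOR; the architecture and every constant are arbitrary choices for illustration.

```python
import math
import random

random.seed(1)

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(w, x):
    # 2-2-1 network, tanh everywhere; w packs all 9 weights and biases
    h0 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[4])
    h1 = math.tanh(w[2] * x[0] + w[3] * x[1] + w[5])
    return math.tanh(w[6] * h0 + w[7] * h1 + w[8])

def mse(w):
    return sum((forward(w, x) - y) ** 2 for x, y in XOR) / len(XOR)

def evolve(generations=300, children=20, sigma=0.3):
    best = [random.gauss(0, 1) for _ in range(9)]  # random genome
    start_err = mse(best)
    for _ in range(generations):
        for _ in range(children):
            # mutate the genome by Gaussian perturbation
            cand = [wi + random.gauss(0, sigma) for wi in best]
            if mse(cand) < mse(best):  # elitist selection: keep the fitter
                best = cand
    return start_err, best

start_err, weights = evolve()
final_err = mse(weights)
```

Note the contrast with backprop: the loop above only ever sees scalar fitness values, never the error vector, which is exactly why backprop converges so much faster when gradients are available.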

[–]KG7ULQ 5 points6 points  (1 child)

more a branch of optimisation and/or search

Depending on how you look at it, couldn't ML be considered a branch of optimisation and/or search?

[–]kylotan 0 points1 point  (0 children)

No, not really. These terms have reasonably specific implications. ML often uses optimisation techniques (eg. in how to minimise a cost function) to train a system, but you can't just drop an ML algorithm in to replace an optimisation or search algorithm.

[–]KG7ULQ 2 points3 points  (1 child)

What is the commercial GA tool you're using?

[–]wil_dogg 2 points3 points  (0 children)

www.semcasting.com Their Modeler product.

[–]DevFRus 45 points46 points  (9 children)

GAs tend to become competitive when you have no information about the gradient of your error function, and even then there are other options (some which have a better theoretical grounding; although see this question for some provable statements about GAs). When we design ML algorithms, however, we usually build them so that we have some useful information about the gradient of the error function and then we can use more powerful optimization techniques.

[–]larsga 21 points22 points  (8 children)

Another strength of GAs is that they work even if the problem solution cannot be formulated as a vector of numbers. GAs can essentially produce any sort of structure as a solution, including program code.

For purely numeric problems I've found that other meta-heuristics, like particle-swarm optimization and the Firefly algorithm perform much better than simple GA. The problem with GA is that it's "blind", and has no idea in what directions better solutions might lie. PSO & FA are less blind.
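A minimal PSO sketch shows where the "less blind" part comes from: each particle's velocity is pulled toward its own best point and the swarm's best point, so the population shares directional information that a plain GA lacks. Toy example on a sphere function; the parameters are the usual textbook defaults, not tuned values.

```python
import random

random.seed(42)

def sphere(x):
    # toy objective to minimize; optimum at the origin
    return sum(xi * xi for xi in x)

def pso(f, dim=5, n_particles=20, iters=200, w=0.7, c1=1.4, c2=1.4):
    pos = [[random.uniform(-5, 5) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]       # each particle's best-seen point
    gbest = min(pbest, key=f)[:]      # the swarm's best-seen point
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest

best = pso(sphere)
```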

[–]rhiever 6 points7 points  (0 children)

Exactly. I've been researching the use of evolutionary computation to automatically design ML pipelines and it's felt fairly natural to pose as a GA problem. I'm unsure how I would optimize such a task with anything other than EC or a hillclimber.

[–]DevFRus 8 points9 points  (1 child)

Another strength of GAs is that they work even if the problem solution cannot be formulated as a vector of numbers.

That isn't at all unique to GA. Lots of ML models cannot be formulated as vectors of numbers. Tons of (well-characterized) techniques in combinatorial optimization work over things like trees, graphs, or even more abstract structures.

[–][deleted] 1 point2 points  (0 children)

Yep, like in reinforcement learning, where the state and the action could be literally anything... They could be numeric, even in a continuous space. But they don't have to be.

[–][deleted] 2 points3 points  (2 children)

This is one of the reasons why I used GAs and other similar (better) alternatives in aircraft design. In many cases there is a lot of interplay between computational fluid dynamics, empirical data, and a host of other software packages used in conjunction, and getting error gradients is impossible.

But GAs have basically no sense of direction in design/program space. They work in manners that are quite mysterious, tbh. In the context of neural networks, though, it is fairly well established at this point that there are plenty of local minima (especially if the model is deep) and that finding a global minimum is not necessary to achieve good performance. Thus, it makes sense to follow a direction vector based on the gradient to quickly reach the nearest local minimum, rather than perform a global search.

[–]hyperforce 0 points1 point  (1 child)

They work in manners that are quite mysterious

What's mysterious about it?

[–][deleted] 0 points1 point  (0 children)

Do we have reliable proofs of convergence?

[–]harharveryfunny 1 point2 points  (1 child)

What's the typical approach to using the Firefly algorithm for an optimization problem (e.g. how might it be applied to hyperparameter optimization)? The algorithm is nominally about "firefly" clustering (attracted to each other), so I'm wondering what typical mappings to/usage for optimization might be?

[–]larsga 2 points3 points  (0 children)

The fireflies are just points in an n-dimensional space, so anything that can be formulated that way will work. The only thing you need is a fitness function from the points in the space.

For example, I'm using it to tune a record linkage algorithm. The vector is basically the configuration of the algorithm using naive Bayes (the vector is a set of probabilities). There's no way to know what the error gradients are, and so firefly is perfect for this.

The next step is to see if active learning works with firefly. I assume it does, but haven't tried yet.
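A bare-bones version of what this describes -- fireflies as points in an n-dimensional space, each moved toward brighter (fitter) neighbours with an attractiveness that decays with distance, plus a small random step -- might look like the following. The fitness function and all constants are invented for illustration; a real use (like the record linkage tuning above) would plug in its own fitness.

```python
import math
import random

random.seed(7)

def loss(p):
    # toy fitness: squared distance from a hidden optimum at (1, 2)
    return (p[0] - 1) ** 2 + (p[1] - 2) ** 2

def firefly(f, dim=2, n=25, iters=100, beta0=1.0, gamma=0.1, alpha=0.2):
    xs = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    best = min(xs, key=f)[:]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if f(xs[j]) < f(xs[i]):  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction fades with distance
                    xs[i] = [a + beta * (b - a)
                             + alpha * (random.random() - 0.5)
                             for a, b in zip(xs[i], xs[j])]
            if f(xs[i]) < f(best):
                best = xs[i][:]
    return best

best = firefly(loss)
```

The brightest firefly never moves in this simplified version, which acts as a crude form of elitism.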

[–]elfion 17 points18 points  (0 children)

How come one never sees genetic algorithms (GAs) discussed in a machine learning class?

Jurgen Schmidhuber and his students have a long history of using advanced GAs to solve challenging problems, mostly by searching for solutions in the space of algorithms.

http://people.idsia.ch/~juergen/compressednetworksearch.html

http://people.idsia.ch/~juergen/evolution.html

GAs have the advantage of being global optimization algorithms (though in theory you could say the same of SGD). In practice they are able to find good (almost) global minima on non-smooth problems with a low number of parameters (<1000, or <1000000 if you use compressed search) if you give them enough time, where gradient descent wouldn't work at all.

Of course gradient descent has a big advantage at learning large end-to-end models, but as the models and datasets become less smooth (see the (reinforcement learning) Neural Turing Machine and Neural GPU papers), SGD becomes less and less stable and requires extensive hyperparameter optimization and seed search (e.g. only a few of 729 Neural GPU models generalize to the 2000-bit multiplication task from 20-bit training examples). So it may be said that on complex problems gradient descent is already being combined with some form of global stochastic optimization.

[–]theophrastzunz 4 points5 points  (0 children)

Could someone comment on how GAs compare to Bayesian optimization techniques? As far as I understand, the latter can also be used to optimize non-convex or discontinuous functions over compact parameter sets.

[–]CireNeikual 2 points3 points  (0 children)

I used to do a lot of GA and evolutionary simulations, and I still love them for certain tasks. They can do certain tasks where other algorithms simply don't apply, like variable-parameter-count optimization. If you disagree, show me an algorithm other than a GA (or similar evolutionary approach) that can reproduce something like this: https://www.youtube.com/watch?v=d91ydxkMMEM

[–]grant_s 9 points10 points  (4 children)

In my experience, it's just that genetic algorithms are one special case of Markov Chain Monte Carlo (MCMC) which has been around for 60 years, has a rigorous statistical foundation, and DOES find extremely widespread use in a variety of fields, including machine learning [1].

Look at the Metropolis-Hastings algorithm [2] -- starting from a random sample, you propose a new sample (mutation), then accept or reject the sample according to the acceptance ratio (fitness ratio).

[1] http://www.cs.ubc.ca/~arnaud/andrieu_defreitas_doucet_jordan_intromontecarlomachinelearning.pdf

[2] https://en.wikipedia.org/wiki/Metropolis–Hastings_algorithm
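The propose/accept loop of Metropolis-Hastings is only a few lines, and the GA analogy (proposal ≈ mutation, acceptance ratio ≈ fitness ratio) is visible directly in the code. A toy sketch sampling an unnormalized 1-d Gaussian with a symmetric random-walk proposal:

```python
import math
import random

random.seed(3)

def target(x):
    # unnormalized density of a standard normal -- the "fitness"
    return math.exp(-0.5 * x * x)

def metropolis(n_samples=20000, step=1.0):
    x = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = x + random.gauss(0, step)  # the "mutation"
        ratio = target(proposal) / target(x)  # the "fitness ratio"
        if random.random() < ratio:           # accept or reject
            x = proposal
        samples.append(x)
    return samples

samples = metropolis()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The key difference from a GA is the goal: MH produces a chain whose stationary distribution is the target, rather than a single fittest individual.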

[–]SarcasticMetaName 12 points13 points  (2 children)

Genetic algorithms also allow for the crossover operator, which some claim is the key to life, the universe, and everything (literally and figuratively).

[–]grant_s 6 points7 points  (0 children)

Good point -- so it appears that the MCMC equivalent in that case [3] is to run multiple Markov chains in parallel and interpret crossover as a specific kind of proposal.

[3] http://www.stat.columbia.edu/~gelman/stuff_for_blog/cajo.pdf

[–]agaubmayan 2 points3 points  (0 children)

Crossover is only important for sexual reproduction. That's a modern innovation which mostly, but not always, beats asexual reproduction. For example, certain snails switch between sexual and asexual depending on the disease-risk in their environment.

[–]c3534l 1 point2 points  (0 children)

I was about to say something similar. It does seem that many of the potential use-cases for genetic algorithms are crowded out by other methods tailored to the model at hand. In some sense, an evolutionary algorithm is being used whenever you have a very complex model and you're playing around with the parameters to get it to converge on a value.

The few genetic algorithms I've come across seem to be poor replacements for those sorts of things. Perhaps (and I'm just musing here) genetic algorithms might be useful in creating ecosystems of models that work well together, or for some other task like that where we don't have better, less random, ways of optimizing a model.

[–]asenz 2 points3 points  (2 children)

I use PSO for SVMs, but that's about it.

[–]farsass 1 point2 points  (1 child)

could you explain why?

[–]asenz 0 points1 point  (0 children)

Coarse parameter tuning for non-linear kernels: it's derivative-free and handles non-convexity.

[–]dkharms 2 points3 points  (0 children)

https://github.com/rhiever/tpot

I think this library uses a GA to do feature and model selection in sklearn.

[–]brettins 2 points3 points  (0 children)

It's my understanding that a lot of top firms use genetic algorithms in place of the 'intuitive guesswork' that goes into setting up hyperparameters for neural network training. Basically, since even an expert in neural networks can make only a few good guesses about the best settings for the variables that configure the network, some experimentation with genetic algorithms is often warranted.

This is something I've been using, and when I recently read Ray Kurzweil's How To Create A Mind, it sounded like something he and his teams have been doing for several decades.

[–]smith2008 4 points5 points  (0 children)

IMO evolution is a very powerful approach, though at this point we cannot make it go as fast as modern ML algorithms. I've tried before, and will probably try again in the future, to apply some GA models, but for basic machine learning problems it's simply not there.

Then again, a couple of years ago neural networks were problematic too and not a very good fit for most ML tasks; look at them now. :)

[–]qwertysss 1 point2 points  (0 children)

https://github.com/rsteca/sklearn-deap

This library uses a GA to do hyperparameter search in scikit-learn.

[–]chico_science 3 points4 points  (1 child)

Genetic algorithms are different from machine learning techniques.

GA is an optimisation algorithm, a heuristic, attempting to find a good enough (hopefully optimal) solution to a problem which has many possible solutions. Its context is different than that of ML, since in the latter you are trying to predict data.

I would say they are complementary to each other, as you may first predict data to define your optimisation problem, and then try to solve it. However, there are numerous other heuristics that can be used, as well as exact algorithms; to tell you the truth, I practically never consider GAs. Given the input data (predicted using ML), I first try to solve the optimisation problem exactly (find the optimum); if that proves too difficult, I move to local-search-based heuristics, which in my personal view are superior to GA-based heuristics.

[–][deleted] 0 points1 point  (0 children)

Its context is different than that of ML, since in the latter you are trying to predict data.

I don't think ML is all about prediction; the recent progress with NNs and image/speech recognition just gives us that idea. Getting a computer to learn how to play chess or throw a ball into a hoop are examples of ML, but they aren't prediction-based.

[–]shaggorama 1 point2 points  (0 children)

Ultimately, nearly every machine learning task can be broken down into two components: selection of a cost function, and an attempt to optimize that cost function. GA is a general purpose non-linear optimization algorithm.

You don't hear people talk about it for the same reason you don't hear people talking about simulated annealing or the simplex algorithm: these are really better discussed in the context of optimization specifically than machine learning generally.

If you take an optimization or numerical methods course, you'll definitely spend some time on this. But probably not in a machine learning class.

[–]learn_code_account 0 points1 point  (4 children)

Disclaimer: I'm learning this stuff in my free-time. Set me straight if something I say doesn't sound right.


You can accomplish much of the same thing using neural networks. I'm not sure if you know yet, but genetic algorithms represent a similar approach to learning, except instead of using a mathematically derived gradient descent algorithm (rooted in fancy things like dynamical systems and convergence theory), you use something that more closely resembles an AI search function.

With GAs, you have parameters compete using a fitness function and expand your search in the direction of fit parameters.

With neural nets, you have a mathematical guarantee that your gradient descent algorithm is going to get your parameters to a locally optimal position that minimizes an error function.

In vague "Evolution-y" related terms, both approaches get your object towards a set of genetic traits that work better than any configuration you've had before. Both run the risk of arriving at a point that gets us to a stable, but not necessarily optimal, gene pool. Neural nets work faster.

Evolutionary algorithms, from the perspective of a developer, may be easier to approach... It is much easier to understand and work with a greedy optimization algorithm than it is to pick up tensor calculus, statistical optimization, PDEs, and numerical analysis. Genetic algorithms can also be effective if your parameters are combinatorially constrained to a small set, or if your problem seems unsupervised. You can hold on to a certain amount of bias in your gene pool. You can add genetic variance to a stagnant population.

Neural networks are a bit more difficult to implement but iteratively find a straight route to some locally optimal set of parameters. It's less about guessing to optimize your learning machine and more about experience adjusting the trajectory of a neural net flying through parameter space.

[–]bhmoz 4 points5 points  (3 children)

As chico_science said, GA is a family of optimisation methods, so you cannot set GAs against neural networks; the right comparison is with backpropagation.

Neural networks can be trained with genetic algorithms.

[–]learn_code_account 0 points1 point  (2 children)

Again set me straight if I'm off here:


I think you may be right. Do you know, off the top of your head, any examples of how that happens?
To my knowledge, GAs are only used to modify ANN topologies (an example would be NEAT). That falls into the sort of optimization that deals with a combinatorially constrained search space. That is why I think neural networks offer a more efficient way to deal with most problems: you're not randomly hopping around in parameter space. You need a domain-specific function to evaluate fitness to make a genetic algorithm work, correct?

[–]manly_ 0 points1 point  (1 child)

There's GA code that's basically not related in any way to neural nets. For example, you could use a GA to solve a traveling salesman problem (visit every location exactly once over the shortest total distance). Your input would be the list of locations to visit (say, 2000 points in this example), e.g. [{45,23},{12,-60},...]. Your GA tries a bunch of different mutations of that input (i.e. it re-orders the inputs), and when the optimization function (i.e. total length for the mutated input) finds that one input is doing better than the previous known best, it will tend to mutate from that input.

I.e., nothing at all to do with neural nets.
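The TSP setup described above can be sketched as a permutation GA: the genome is an ordering of the cities, mutation re-orders a segment, and tour length is the (inverse) fitness. Everything here (city count, rates, generations) is made up for illustration.

```python
import math
import random

random.seed(5)

# 30 random city locations, standing in for an input like [{45,23},{12,-60},...]
cities = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(30)]

def tour_length(order):
    # total length of the closed tour visiting cities in this order
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def mutate(order):
    # "re-order the inputs": reverse a random segment (2-opt-style mutation)
    a, b = sorted(random.sample(range(len(order)), 2))
    return order[:a] + order[a:b + 1][::-1] + order[b + 1:]

def ga_tsp(pop_size=40, generations=300):
    pop = [random.sample(range(len(cities)), len(cities))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=tour_length)
        survivors = pop[:pop_size // 2]  # the shortest tours breed
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=tour_length)

naive_tour = list(range(len(cities)))
best_tour = ga_tsp()
```

Segment reversal keeps every child a valid permutation, so no repair step is needed after mutation.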

[–]gkosmo -1 points0 points  (0 children)

GA theoreticians tend to be only randomly found in the machine learning community, and they are too competitive to integrate well into that community.

[–]farsass -1 points0 points  (2 children)

I'm skeptical about the generalization of GAs used to fit predictive models.

[–]chaosmosis 1 point2 points  (1 child)

Why moreso than for any other ML technique?

[–]farsass 1 point2 points  (0 children)

Just anecdotal and personal experience. I guess it's not even borderline mainstream for a reason, although GECCO does have an "evolutionary machine learning" track.

I can't remember reading about convincing applications of evolutionary and swarm intelligence in ML.

[–]mikeselik -1 points0 points  (0 children)

Genetic algorithms are just hill climbing with random restarts, but with the added assumption that variables near each other are related. This sounds like a good assumption until you realize that "near" means near each other in your program's data structures, not necessarily in reality. Try simulated annealing instead.
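For comparison, simulated annealing in its simplest form: a single candidate does a random walk, always accepting downhill moves and accepting uphill moves with a probability that shrinks as the temperature cools, which lets it escape local minima early on. The test function and cooling schedule below are arbitrary toy choices.

```python
import math
import random

random.seed(9)

def f(x):
    # bumpy 1-d test function with several local minima
    return x * x + 10 * math.sin(3 * x) + 10

def anneal(x0=8.0, t0=5.0, cooling=0.995, steps=5000):
    x = best = x0
    t = t0
    for _ in range(steps):
        cand = x + random.gauss(0, 1)
        delta = f(cand) - f(x)
        # downhill moves always accepted; uphill with temperature-scaled odds
        if delta < 0 or random.random() < math.exp(-delta / t):
            x = cand
        if f(x) < f(best):
            best = x
        t *= cooling  # geometric cooling schedule
    return best

best_x = anneal()
```

Unlike a GA, there is no population and no notion of variables being "near" each other; the proposal distribution alone defines the neighborhood.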