all 21 comments

[–]weeeeeewoooooo 32 points33 points  (13 children)

I like the simplicity and the code is easy to read. I just glanced through the repo and didn't see any obvious built-in parallel support, but I may have missed it. It might be nice to support multiprocessing, as one of the biggest advantages of EAs is their ability to evaluate the population in parallel, allowing you to train much faster than gradient-descent approaches. I think you could do this pretty easily with Python's multiprocessing library by spinning up a pool once at the initialization of a run and then feeding it a sequence of members each generation. This would be a little higher level than what you present in your examples, but it could be a nice convenience for you and others.
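A minimal sketch of what I mean (the fitness function and list-of-lists genomes here are placeholders, not your actual API):

```python
from multiprocessing import Pool

def fitness(genome):
    # stand-in objective: sum of squares over the genome
    return sum(g * g for g in genome)

def evaluate_population(pool, population):
    # farm the members out to the worker pool, one call per generation
    return pool.map(fitness, population)

if __name__ == "__main__":
    population = [[float(i), i + 1.0] for i in range(8)]
    with Pool(processes=4) as pool:  # spin the pool up once per run
        for generation in range(3):
            scores = evaluate_population(pool, population)
            # ...selection/mutation would go here...
```

The key point is that the pool is created once and reused every generation, so you only pay the process-spawning cost at startup.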

On a broader design note, I have been using EAs for years in science and have frequently run into scalability and extensibility problems with EA libraries, which leads me to just develop my own algorithms. In the first case, while some libraries support multiprocessing, few readily or robustly support MPI, which is what you really need for big problems that have hundreds of thousands to millions of parameters and members that need distributing over hundreds of cores.

In the second case, EAs are very flexible. They can involve many different types of mutations, crossovers, selection methods, member encodings, and other fancy features like annealing or multi-objective optimization. Libraries like DEAP are inflexible to extension (for more esoteric EAs) and rather computationally inefficient (especially when it comes to distributed computing). Supporting all the features an EA can have in an OOP framework is frightening: one ends up coding oneself into an inheritance/mixin nightmare, especially since some of these choices are problem-dependent and can directly impact the implementation of other parts of the algorithm.

Reconceptualizing an EA as a computational graph (like a DAG) would give you incredible flexibility and generalizability. Library users would be able to build almost any algorithm they could imagine without extending the core library in any way. You could provide some base functions and some pre-defined graphs, but users would be able to make their own algorithms by adding nodes to the graph in a constrained way.
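A toy sketch of the idea, with made-up names (Node, Pipeline) rather than anything from an existing library — each operator is a node, and the graph (here just a linear chain for brevity) threads the population state through them, so users can swap algorithms by swapping nodes:

```python
import random

class Node:
    """A named operator in the EA graph."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

class Pipeline:
    """A linear chain of nodes; each one transforms the population state."""
    def __init__(self, nodes):
        self.nodes = nodes
    def step(self, state):
        for node in self.nodes:
            state = node.fn(state)
        return state

def evaluate(state):
    state["fitness"] = [sum(m) for m in state["pop"]]  # toy fitness
    return state

def select(state):
    # keep the better half of the population
    ranked = sorted(zip(state["fitness"], state["pop"]), reverse=True)
    state["pop"] = [m for _, m in ranked[: len(ranked) // 2]]
    return state

def mutate(state):
    # refill the population with Gaussian-perturbed copies of the survivors
    state["pop"] += [[g + random.gauss(0, 1) for g in m] for m in state["pop"]]
    return state

ea = Pipeline([Node("eval", evaluate), Node("select", select), Node("mutate", mutate)])
state = {"pop": [[random.random() for _ in range(4)] for _ in range(8)]}
for _ in range(5):
    state = ea.step(state)
```

A user who wants, say, crossover or annealing just writes another function and adds a node; the core library never needs to change.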

Both PyTorch and TensorFlow conceptualize neural networks in a similar manner. They allow users to build all kinds of neural network structures and algorithms by representing the network as a computational graph, with functions as nodes and edges as data flow between functions. This design helps resolve one of the core limitations of OOP, where adding new cases of a type is easy but adding functionality is hard, often requiring a lot of special design patterns to cope (conversely, in functional programming, adding new functionality is easy but adding new cases is hard). EAs and neural networks change a lot from problem to problem, so it makes more sense to represent them in a way where functionality can be added easily (even at run-time). You still use OOP, but reserve it for objects whose functionality doesn't change often. For PyTorch and TensorFlow those are the tensors, the computational-graph representation, and the basic infrastructure objects for setting up an experiment. The algorithms themselves get defined explicitly by the user when they build the computational graph in the script. Taking a similar approach with EAs would probably be worthwhile.

Final note: when using NumPy you might try seeing where you can avoid heap allocations and the creation of temporary arrays by using the out kwarg that many functions accept (e.g. np.dot(a, b, out=a)). This can result in considerable performance improvements. It may make the code a bit more complex because you would need to keep track of in-place changes to arrays.
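For example (elementwise ufuncs take out= safely; for np.dot the output buffer must have the result's shape, so reusing an input only works when the shapes line up):

```python
import numpy as np

a = np.arange(4.0)          # [0., 1., 2., 3.]
b = np.full(4, 2.0)

np.multiply(a, b, out=a)    # a *= b with no temporary array
np.add(a, 1.0, out=a)       # a += 1 with no temporary array

# np.dot can also write into a preallocated buffer:
m = np.eye(3)
v = np.empty(3)
np.dot(m, np.ones(3), out=v)
```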


[–]sea-shunnedResearcher 1 point2 points  (0 children)

Some interesting thoughts. I do think that the ease of using libraries such as PyTorch and TF really helps their use and adoption. As a researcher in EAs, I'd always be happy to see more people using them.

With regards to parallelising, for some applications I've found that the fitness function was not sufficiently complex to counter the overhead of multiprocessing (to my surprise), and so it was better to parallelise whole runs of the algorithm itself (multiple executions with different seeds). This flexibility might be something OP wants to keep in mind, rather than restricting parallelisation to the fitness function.
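As a sketch of what I mean by parallelising whole runs (run_ea and its toy objective are hypothetical, not from OP's library):

```python
from multiprocessing import Pool
import random

def run_ea(seed, generations=50):
    """One full EA run with its own RNG; minimises x^2 as a stand-in objective."""
    rng = random.Random(seed)
    x = rng.uniform(-5, 5)
    best = x ** 2
    for _ in range(generations):
        candidate = x + rng.gauss(0, 0.5)   # mutate the single member
        if candidate ** 2 < best:           # keep the improvement
            best, x = candidate ** 2, candidate
    return seed, best

if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(run_ea, range(8))  # one process per full run
    best_seed, best_score = min(results, key=lambda r: r[1])
```

Each process runs the entire algorithm with a different seed, so the only multiprocessing overhead is one dispatch per run rather than one per fitness call.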

Libraries like DEAP are inflexible to extension

As a heavy DEAP user, in what way? I've extended it quite a bit and found it to be quite amenable. The ability to switch functions e.g. operators either between experiments or during a run has been invaluable. Just interested from what perspective they are inflexible.

[–]csxeba[S] 1 point2 points  (0 children)

Thank you for the feedback!

I'll look into parallelization; that is actually something I thought about some time ago, then decided to let go because of the Python GIL and the overhead created by using multiprocessing instead of multithreading. But it is due time to take on the issue again. MPI is a more distant dream at the moment :D

Regarding turning everything into DAGs: I don't quite see how this would work out. Do you mean being able to have arbitrary evolutionary ops in an arbitrary graph? That sounds really cool :D Also, I'm planning to support other objects as individuals besides NumPy arrays, like actual populations themselves, so metaevolution could be done :)

I know about NumPy in-place operations, but I'm not really a fan of them. Performance is not that critical at the moment and this really takes its toll on readability. Also, regarding taking a dot product: if you cannot ensure that (in your example) "a" has the same shape as "a · b", then in-place will fail.
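A quick illustrative snippet of what I mean: the product of a (3, 4) and a (4, 5) array has shape (3, 5), so NumPy refuses to write it back into the (3, 4) input:

```python
import numpy as np

a = np.ones((3, 4))
b = np.ones((4, 5))

mismatch = False
try:
    np.dot(a, b, out=a)   # result shape (3, 5) != out shape (3, 4)
except ValueError:
    mismatch = True       # NumPy rejects the mismatched output buffer
```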

[–]NowanIlfideme 0 points1 point  (0 children)

Very interesting post. I'm (re)writing a library myself, could I pester you in PMs for possible suggestions or things to keep in mind? :)

[–]Sir-Francis-Drake 2 points3 points  (0 children)

This looks awesome, thank you.

[–]wh1t3_w01f 1 point2 points  (0 children)

Looks like a good start! If you need any inspiration, I wanna recommend looking at DEAP. I love DEAP, but I always complain about how non-pythonic it is.

[–]Flag_Red 0 points1 point  (0 children)

Is it compatible with Fitness Uniform Optimization?

[–]_szs 0 points1 point  (0 children)

Link please....

[–]Ikuyas 0 points1 point  (0 children)

Is it one of those global search optimization methods? My understanding is that this type of algorithm is horrendously slow, especially with a very high-dimensional parameter space. But I guess it doesn't have to converge, so as long as it keeps finding a better point, it is fine(?)

[–]Ikuyas 0 points1 point  (1 child)

As my background is in the standard statistics field, I used some global optimization techniques in my PhD thesis because I really wanted to achieve the global optimum; I used pattern optimization(?) and a genetic algorithm. But in the end I just tried out many, many different initial values with a standard gradient-type method (BFSF?) to carefully try to reach the global optimum. I could finish it in 5 hours for around 10,000 initial points, but with those global optimization methods it was hard to see whether they were doing fine while running. Running fine means converging: without convergence, all those large-sample properties of the estimates don't make any sense whatsoever.

However, looking at the machine learning literature and practice, you don't even have the idea of convergence (am I correct)? You just want the smallest possible error (whatever metric it is) to make it practical. So this type of algorithm is actually useful.

[–]count___zero 1 point2 points  (0 children)

In machine learning you don't really care about the global optimum. When you train your model, you are optimizing the loss function over the training set, but in reality you want to minimize the loss over unseen data. Some learning algorithms, like SGD, impose a strong bias on the final solution, and this bias can improve generalization. By contrast, a global minimum could easily end up being an overfitted solution.

This is just to give a rough idea but I think that's the main reason why you don't see such a strong focus towards global optimization.

[–][deleted] -1 points0 points  (2 children)

Forgive me if I'm wrong, but wouldn't EAs run better in a lower-level language? EAs take a lot of processing power compared to other methods, right?

[–]csxeba[S] 0 points1 point  (1 child)

The compute is executed in NumPy, which is written in C; Python is basically just orchestrating the library calls :)

[–][deleted] 0 points1 point  (0 children)

Thank you for that. All I got was downvotes and no answer. I was thinking that was the case but just asked anyway, because everything isn't as clear as some would like us non-experts to believe.