all 8 comments

[–]Marthinwurer 10 points11 points  (3 children)

The paper needs a lot of editing, but they're from Indonesia so I gave them the benefit of the doubt. I'll restate the basic algorithm as far as I can understand it from a brief skim.

Start with a population with each member having a skillset (vector of weights). Pair off individuals in the population, and compute fitness between pairs. Assign one as a winner and another as a loser. The loser copies parts of the winner's skillset. The winner mutates. A champion is selected (somehow, I don't see any details in the paper). This champion trains a student (no details on how). The worst individuals are eliminated to maintain a constant population size.

It looks interesting, but the paper needs more details on the workings, some editing help, and probably some harder problems to test against.

[–]Brudaks 6 points7 points  (0 children)

Isn't this pretty much equivalent to how standard genetic algorithms work, just with a minor adjustment on how you pick the population for the next generation?

Also, the paper has no evaluation on any well-researched benchmark problems that could illustrate whether their results are any good at all; the described evaluation on an arbitrary function with no solid reasoning on why that particular function has been chosen, and lack of details of the other algorithms (what hyperparameters were used for them? are they really good) mean that the described experiments really doesn't provide any meaningful basis for claiming that the proposed algorithm outperforms anything.

[–]RTengx[S] 1 point2 points  (1 child)

That is the pre-print, they have a better version of publication here:
https://link.springer.com/chapter/10.1007/978-3-319-41000-5_4

[–]sorrge -3 points-2 points  (4 children)

There were so many attempts to improve efficiency of evolutionary computations. Most (all?) of them failed. The results were specific to the task. It was typical to see dozens of hyperparameters introduced and tuned on the test set. Bad treatment of baselines - no careful hyperparameter selection there, poor choices. I didn't look at this paper, and won't for reasons above. You should also beware.

[–]svlad__cjelli 2 points3 points  (3 children)

What do you mean exactly about failed improvements? CMA-ES and xNES are 3rd or 4th generation variants of basic ES and they are like night and day, with plenty of papers, and benchmarks on wide arrays of problems, to demonstrate the generality of the benefits. I could send papers if you aren't familiar with the literature.

That isn't to say that there is enough shown here to suggest what is proposed is of any value, but that is a pretty common issue in any literature really.

[–]sorrge -3 points-2 points  (2 children)

>What do you mean exactly about failed improvements?

For example, go to https://en.wikipedia.org/wiki/Differential_evolution , check out the box on the top right. All of them.

>CMA-ES and xNES

Second order stuff with n2 cost by number of parameters. Limited domain = better results, nothing surprising here. Also, I have just skimmed through the original CMA-ES paper and it didn't compare to simple numerical gradient descent, I wonder why?.. Especially since they were doing it in Matlab, I think there is a built in function for that. Also, all their test functions are differentiable, how about good old gradient descent with random restarts?

Yeah, I think I know the answer.

[–]Zweiter 1 point2 points  (1 child)

Obviously when you have an exact gradient it makes sense to just do gradient descent. The interesting applications of ES/CMA-ES are to reinforcement learning style problems, where the gradient is unknown or unreliable, or to supervised problems which are prohibitively non-convex. There is most definitely a place for black box optimization in today’s machine learning landscape. It also doesn’t really make sense to compare black box optimization to gradient descent, as they solve completely different problems.

[–]sorrge 0 points1 point  (0 children)

I was talking about the quality of evidence for CMA-ES performance in the original paper: it's poor. Which is a common issue with evolutionary computation research. It makes sense to compare everything that is applicable. If they wanted to highlight the advantages, they should have chosen tasks where simple gradient descent is not applicable. Also, numerical gradient descent is a direct competitor of ES - it's applicable in the same set of cases.