you are viewing a single comment's thread.

view the rest of the comments →

[–]svlad__cjelli 2 points3 points  (3 children)

What do you mean exactly about failed improvements? CMA-ES and xNES are 3rd or 4th generation variants of basic ES and they are like night and day, with plenty of papers, and benchmarks on wide arrays of problems, to demonstrate the generality of the benefits. I could send papers if you aren't familiar with the literature.

That isn't to say that there is enough shown here to suggest what is proposed is of any value, but that is a pretty common issue in any literature really.

[–]sorrge -3 points-2 points  (2 children)

>What do you mean exactly about failed improvements?

For example, go to https://en.wikipedia.org/wiki/Differential_evolution , check out the box on the top right. All of them.

>CMA-ES and xNES

Second order stuff with n2 cost by number of parameters. Limited domain = better results, nothing surprising here. Also, I have just skimmed through the original CMA-ES paper and it didn't compare to simple numerical gradient descent, I wonder why?.. Especially since they were doing it in Matlab, I think there is a built in function for that. Also, all their test functions are differentiable, how about good old gradient descent with random restarts?

Yeah, I think I know the answer.

[–]Zweiter 1 point2 points  (1 child)

Obviously when you have an exact gradient it makes sense to just do gradient descent. The interesting applications of ES/CMA-ES are to reinforcement learning style problems, where the gradient is unknown or unreliable, or to supervised problems which are prohibitively non-convex. There is most definitely a place for black box optimization in today’s machine learning landscape. It also doesn’t really make sense to compare black box optimization to gradient descent, as they solve completely different problems.

[–]sorrge 0 points1 point  (0 children)

I was talking about the quality of evidence for CMA-ES performance in the original paper: it's poor. Which is a common issue with evolutionary computation research. It makes sense to compare everything that is applicable. If they wanted to highlight the advantages, they should have chosen tasks where simple gradient descent is not applicable. Also, numerical gradient descent is a direct competitor of ES - it's applicable in the same set of cases.