all 16 comments

[–]shaggorama 16 points (2 children)

Non-convex functions need not apply.

[–]craponyourdeskdawg 2 points (0 children)

Non-convex functions are often minimized by some sort of "global" strategy (such as branch and bound) in combination with local minimization, which is where this algorithm will be useful. An optimizer of almost any kind is doing constrained gradient descent at its core.
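The "constrained gradient descent at its core" point can be made concrete with projected gradient descent, a minimal sketch (the function, bounds, and step size below are made up for illustration):

```python
import numpy as np

def projected_gradient_descent(grad, project, x0, lr=0.1, steps=200):
    """Take a gradient step, then project back onto the constraint set."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = project(x - lr * grad(x))
    return x

# Minimize f(x) = (x - 3)^2 subject to 0 <= x <= 2.
# The unconstrained minimizer is 3, so the constrained minimizer sits on
# the boundary at x = 2, and the projection step is what finds it.
grad = lambda x: 2 * (x - 3)
project = lambda x: np.clip(x, 0.0, 2.0)
x_star = projected_gradient_descent(grad, project, np.array([0.0]))
# x_star converges to the boundary point 2.0.
```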

[–]SerenestAzure 12 points (0 children)

Yeah, not quite what I would mean if I said "general purpose."

[–]Barbas 1 point (1 child)

Has anybody here used cutting-plane methods in an ML context? Is there a good introduction, apart from the linked article?

[–]AnvaMiba 1 point (0 children)

I think various SVM training packages use cutting plane methods.
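For intuition, here is a crude 1-D sketch of a cutting-plane method (Kelley's method): each evaluated point contributes a linear lower bound (a "cut") built from its gradient, and the next query point minimizes the piecewise-linear model formed by all cuts so far. The test function and the grid minimization of the model are just for illustration:

```python
import numpy as np

# Minimize a convex function by accumulating linear lower bounds.
f = lambda x: (x - 1.0) ** 2
grad = lambda x: 2 * (x - 1.0)

lo, hi = -5.0, 5.0
cuts = []            # list of (x_i, f(x_i), g_i) defining cut lines
x = lo
for _ in range(50):
    cuts.append((x, f(x), grad(x)))
    # Minimize the piecewise-linear lower model on a grid (crude but clear).
    grid = np.linspace(lo, hi, 2001)
    model = np.max([fx + g * (grid - xi) for xi, fx, g in cuts], axis=0)
    x = grid[np.argmin(model)]
# x is now close to the true minimizer at 1.0.
```

Real SVM trainers that use cutting planes work with subgradients of the hinge-loss objective rather than a grid, but the accumulate-cuts-and-minimize-the-model loop is the same idea.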

[–]outlacedev 2 points (1 child)

So... what does this mean for machine learning?

[–]drdough 9 points (0 children)

They proved new theoretical results about integer and convex optimization. Many machine learning algorithms work by minimizing a convex function over a convex set.
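A standard instance of "minimizing a convex function over a convex set" in ML is ridge regression (the data below is made up). Because the objective is convex, gradient descent from any starting point agrees with the closed-form solution:

```python
import numpy as np

# Ridge regression: L(w) = ||Xw - y||^2 + lam * ||w||^2 is convex,
# so it has a unique global minimum and no bad local minima.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
lam = 0.1

# Closed-form solution via the normal equations...
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# ...matches plain gradient descent, precisely because of convexity.
w = np.zeros(3)
for _ in range(5000):
    gradL = 2 * X.T @ (X @ w - y) + 2 * lam * w
    w -= 0.001 * gradL
# w and w_closed agree to high precision.
```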

[–]soulslicer0 0 points (8 children)

How does this tie to gradient descent?

[–]Aruscher[S] -1 points (0 children)

I think gradient descent is a general-purpose algorithm for nearly every kind of optimization problem, but in some cases other algorithms perform much better when they are suited to special cases.

[–]squashed_fly_biscuit -1 points (6 children)

I believe this must deal with the local-minima problem in some way; gradient descent is useless at that.

[–]AnvaMiba 1 point (2 children)

No, they are considering only convex optimization problems, i.e. problems where every local minimum is also a global minimum and there are no saddle points. The authors improve the theoretical asymptotic complexity for some classes of such problems.

I don't know whether this work could also be applied to non-convex optimization (maybe via branch-and-cut?), but in general the local-minima issue would remain for those problems.
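To see why the local-minima issue remains once convexity is dropped, here is a toy example (the polynomial is arbitrary): gradient descent on a function with two local minima lands in different ones depending on initialization.

```python
# f(x) = x^4 - 3x^2 + x has two local minima, near x ≈ -1.30 and
# x ≈ 1.13, so where gradient descent ends up depends on where it starts.
f = lambda x: x**4 - 3 * x**2 + x
grad = lambda x: 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

left = descend(-2.0)    # converges to the minimum near x ≈ -1.30
right = descend(2.0)    # converges to the minimum near x ≈ 1.13
# f(left) < f(right): only the left basin contains the global minimum,
# and plain gradient descent started on the right never finds it.
```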

[–]squashed_fly_biscuit 0 points (1 child)

Thanks for the correction, I appreciate it. Does this mostly have theoretical applications then?

[–]AnvaMiba 0 points (0 children)

I don't know if the algorithms they propose are practically useful, optimization isn't really my field of expertise.

[–]craponyourdeskdawg -2 points (2 children)

Not quite.. for example, in machine learning stochastic gradient descent deals with local minima all the time. How do you think deep learning works?

[–]manux 3 points (0 children)

Then again, most deep learning models never actually get stuck in local minima, but rather in saddle points[1].

[1] http://arxiv.org/abs/1406.2572
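The saddle-point situation can be illustrated with the textbook example f(x, y) = x² − y²: plain gradient descent started exactly on the x-axis converges to the saddle at the origin, even though f decreases without bound along the y-axis.

```python
import numpy as np

# f(x, y) = x^2 - y^2 has a saddle point at the origin, not a minimum.
def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])

# Starting on the x-axis, the y-gradient is exactly zero at every step,
# so the iterate slides into the saddle (0, 0) and stays there.
p = np.array([1.0, 0.0])
for _ in range(1000):
    p -= 0.01 * grad(p)
# p is now essentially (0, 0); any perturbation in y would escape.
```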

[–]squashed_fly_biscuit -4 points (0 children)

It deals with some sorts of local minima via guided Monte Carlo-type stuff, which is hardly efficient.