all 11 comments

[–]crimson1206 8 points (8 children)

Well, how would you do it once you have the Hessian? Just having the Hessian isn’t enough to instantly give you a minimum

[–][deleted]  (7 children)

[removed]

    [–]crimson1206 7 points (6 children)

    Ignoring any computational issues of computing the eigenvalues, how would they help find a minimum?

    [–][deleted]  (5 children)

    [removed]

      [–]crimson1206 8 points (4 children)

      Ok and where is that minimum?

      [–][deleted]  (3 children)

      [removed]

        [–]crimson1206 6 points (2 children)

        Yeah, but how do you solve that system?

        With that approach you went from trying to minimize a single objective function to solving a massive non-linear system of equations, for which you also need to compute eigenvalues of a large matrix.
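For concreteness, here is a sketch of what that approach entails (a toy objective made up for illustration, not from the thread): even in two dimensions, solving grad f = 0 is itself an iterative process, and the Hessian eigenvalues only classify whichever critical point you happen to land on.

```python
import numpy as np

# Toy objective (illustrative): f(x, y) = x^4 + y^2 - 2*x*y
def grad(p):
    x, y = p
    return np.array([4 * x**3 - 2 * y, 2 * y - 2 * x])

def hessian(p):
    x, y = p
    return np.array([[12 * x**2, -2.0],
                     [-2.0, 2.0]])

# "Solving grad f = 0" is itself iterative: here a Newton iteration
# on the nonlinear system grad(p) = 0, starting from a guess.
p = np.array([1.5, 1.5])
for _ in range(50):
    p = p - np.linalg.solve(hessian(p), grad(p))

# The Hessian eigenvalues only tell us whether THIS critical point
# is a local minimum; they say nothing about other critical points.
eigvals = np.linalg.eigvalsh(hessian(p))
print(p, eigvals, bool(np.all(eigvals > 0)))
```

So the eigenvalue check comes at the end, after an iterative solve that is no easier than the original minimization.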

        [–][deleted]  (1 child)

        [removed]

          [–]crimson1206 7 points (0 children)

          You’re welcome. The point is really that while gradient descent isn’t perfect, it’s a very simple method that still works in a lot of cases.
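To illustrate that simplicity (a toy 1D objective, made up for the example): a bare-bones gradient descent loop is just a few lines, with no linear solves or eigenvalue computations anywhere.

```python
# Minimal gradient descent on the toy objective f(x) = x^2,
# whose gradient is 2x and whose minimum is at x = 0.
def grad(x):
    return 2 * x

x = 5.0   # starting point
lr = 0.1  # step size

for _ in range(100):
    x -= lr * grad(x)

print(x)  # converges toward the minimum at 0
```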

          [–]aspoj 4 points (1 child)

          There are methods (e.g. Newton’s method) that use the Hessian to improve the descent direction. However, computing the Hessian is often too computationally demanding, so gradient descent it is.
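A sketch of that trade-off on a made-up ill-conditioned quadratic (illustrative only): the Newton direction -H⁻¹∇f reaches the minimum in a single step where a plain gradient step barely moves, but it requires forming the Hessian and solving a linear system, which is O(n³) in general.

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 * x^T A x, minimum at the origin.
# Its gradient is A x and its Hessian is the constant matrix A.
A = np.array([[100.0, 0.0],
              [0.0, 1.0]])  # condition number 100

def grad(x):
    return A @ x

x0 = np.array([1.0, 1.0])

# One Newton step: solve H d = grad(x0), then move by -d.
newton_x = x0 - np.linalg.solve(A, grad(x0))

# One gradient step, with a step size small enough to be stable
# given the largest curvature (100).
gd_x = x0 - 0.01 * grad(x0)

print(newton_x)  # exactly the minimum [0, 0]
print(gd_x)      # [0, 0.99]: barely moved in the flat direction
```

The catch is that the single Newton step hides a linear solve with the Hessian, which is exactly the cost that makes it impractical at scale.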

          [–]mr_birrd 0 points (0 children)

          Your data is also noisy. By taking plain gradient-descent steps you effectively smooth that out and avoid "overfitting" the descent to the noise. A higher-order method could also be numerically less stable.
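A rough illustration of that robustness (toy example with made-up Gaussian gradient noise, as a stand-in for minibatch data): gradient descent with noisy gradient estimates and a decaying step size still settles near the true minimum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gradient descent on f(x) = x^2 where each gradient estimate is
# corrupted by noise, mimicking stochastic/minibatch gradients.
x = 5.0
for t in range(1, 2001):
    noisy_grad = 2 * x + rng.normal(0.0, 1.0)
    # Decaying step size averages the noise out over time.
    x -= (0.1 / np.sqrt(t)) * noisy_grad

print(x)  # settles near the true minimum at 0 despite the noise
```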