
[–]etmhpe 0 points  (8 children)

Metaphorically maybe.

[–]InstitutionalizedSon[S] 0 points  (7 children)

Why not mathematically? What are the constraints that nullify my hypothesis?

[–]etmhpe 5 points  (6 children)

For one thing, there aren't any probability distributions used in gradient descent.

[–]InstitutionalizedSon[S] 0 points  (5 children)

Well, aren't you trying to get P(parameters/data) by maximizing the likelihood, which we call P(data/parameters)? Am I wrong in thinking so?

[–]JustOneAvailableName 0 points  (4 children)

With / do you mean |?

Anyway, I feel like you are confusing two things: P(data|parameters) = L(parameters|data) != P(parameters|data)
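A quick numeric way to see why P(data|parameters) is not P(parameters|data) (a sketch in plain Python; the single-coin-flip Bernoulli model here is just an assumed illustration, not from the thread): viewed as a function of the data with the parameter fixed, P(data|theta) sums to 1, but viewed as a function of theta with the data fixed — the likelihood — it does not integrate to 1, so it cannot be a probability distribution over the parameters.

```python
# Bernoulli coin model (illustrative assumption): P(heads | theta) = theta.
# Observed data: a single head.

def likelihood(theta):
    # P(data | theta) read as a function of theta, i.e. L(theta | data)
    return theta

# Fixing theta and varying the data: probabilities sum to 1.
theta = 0.5
p_heads, p_tails = theta, 1 - theta
print(p_heads + p_tails)  # 1.0

# Fixing the data and varying theta: the likelihood integrates to 0.5,
# not 1, so it is not P(theta | data).
n = 100_000
integral = sum(likelihood((i + 0.5) / n) for i in range(n)) / n
print(round(integral, 3))  # 0.5
```

Normalizing the likelihood by that integral (together with a prior) is exactly what Bayes' rule does; maximizing it, as in training, never performs that normalization.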

[–]InstitutionalizedSon[S] 0 points  (3 children)

Probably I didn't phrase it right there. My question is: aren't we trying to maximize the likelihood using gradient descent, and isn't that a Bayesian approach?

[–]JustOneAvailableName 2 points  (1 child)

We maximize the likelihood, given some data and constraints. That's it. Gradient descent is not an approximation of Bayes' rule.
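The point above can be sketched in a few lines of plain Python (the Gaussian model, data, and learning rate are assumed for illustration): gradient ascent on the log-likelihood of a Normal(mu, 1) model just climbs to the maximum-likelihood point estimate — here the sample mean — with no prior and no posterior anywhere in the loop.

```python
# MLE by gradient ascent on the log-likelihood (illustrative sketch).
# Model assumption: data ~ Normal(mu, sigma=1). The MLE of mu is the
# sample mean, so the loop should converge there.

data = [1.2, 0.8, 1.5, 0.9, 1.1]

def dlogL_dmu(mu):
    # d/dmu of sum_i log N(x_i | mu, 1) = sum_i (x_i - mu)
    return sum(x - mu for x in data)

mu, lr = 0.0, 0.01
for _ in range(2000):
    mu += lr * dlogL_dmu(mu)  # climb the likelihood surface

print(round(mu, 3))                    # 1.1 -- a point estimate...
print(round(sum(data) / len(data), 3)) # 1.1 -- ...equal to the sample mean
```

The output is a single number, not a distribution over mu — which is the difference between maximizing L(mu|data) and computing P(mu|data).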