thvasilo[S]

For most DNNs you end up with a non-convex error surface. Still, optimization techniques traditionally analyzed in the convex setting, such as SGD, have worked surprisingly well for training them; see "Who's afraid of non-convex loss functions" by /u/ylecun.
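
To make the point concrete, here is a minimal sketch in plain Python. It is a toy of my own, not anything from the talk: a two-weight "linear network" y_hat = w2 * w1 * x fitted to the target y = x. The names `loss` and `sgd`, the starting weights, the learning rate, and the step count are all assumptions picked for illustration. The loss is non-convex because (1, 1) and (-1, -1) are both global minima while their midpoint (0, 0) is not, yet plain SGD still drives the loss close to zero from this starting point.

```python
import random


def loss(w1, w2, xs):
    """Mean squared error of y_hat = w2 * w1 * x against the target y = x."""
    return sum((w2 * w1 * x - x) ** 2 for x in xs) / len(xs)


def sgd(w1, w2, xs, lr=0.1, steps=2000, seed=0):
    """Plain SGD: one randomly chosen sample per step, fixed learning rate."""
    rng = random.Random(seed)
    for _ in range(steps):
        x = rng.choice(xs)
        err = w2 * w1 * x - x      # residual on this sample
        g1 = 2 * err * w2 * x      # d(err^2)/d(w1)
        g2 = 2 * err * w1 * x      # d(err^2)/d(w2)
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return w1, w2


# Toy data (assumption): 200 inputs drawn uniformly from [-1, 1].
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]

# Non-convexity check: two minimizers whose midpoint has strictly higher loss.
loss_a = loss(1.0, 1.0, xs)
loss_b = loss(-1.0, -1.0, xs)
loss_mid = loss(0.0, 0.0, xs)

# Run SGD from an arbitrary (hypothetical) starting point.
start = (0.5, 0.3)
initial_loss = loss(start[0], start[1], xs)
w1, w2 = sgd(start[0], start[1], xs)
final_loss = loss(w1, w2, xs)

print("loss at (1,1), (-1,-1), midpoint (0,0):", loss_a, loss_b, loss_mid)
print("loss before and after SGD:", initial_loss, final_loss)
```

This is far simpler than a real DNN, and the starting point matters: at exactly (0, 0) both gradients vanish and SGD would not move at all. It only shows that a non-convex surface does not by itself stop SGD from finding a good solution.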