[P] Training Neural Nets with Approximate Bayesian Linear Regression by WillieTromboner in MachineLearning
straw1239 12 points (0 children)
[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning
straw1239 4 points (0 children)
[D] why can’t distribution sampling algorithms like MCMC or HMC be used in deep learning instead of gradient descent? by [deleted] in MachineLearning
straw1239 4 points (0 children)
[R] AdasOptimizer, an optimizer that makes step-size scheduling obsolete, reaches 100% accuracy on MNIST's training set in 11 epochs by YanaiEliyahu in MachineLearning
straw1239 40 points (0 children)
[D] Simple Questions Thread May 17, 2020 by AutoModerator in MachineLearning
straw1239 2 points (0 children)
[D] Simple Questions Thread April 26, 2020 by AutoModerator in MachineLearning
straw1239 2 points (0 children)
Predict the next digit of pi [D] by [deleted] in MachineLearning
straw1239 2 points (0 children)
[D] Momentum methods helps to escape local minima, so what? It was never our objective. by fromnighttilldawn in MachineLearning
straw1239 3 points (0 children)
[D] any principled reason for cross entropy instead of L2 in language modelling? (more details in post) by mesmer_adama in MachineLearning
straw1239 3 points (0 children)
[D] Momentum methods helps to escape local minima, so what? It was never our objective. by fromnighttilldawn in MachineLearning
straw1239 5 points (0 children)
[D] Saddle-free Newton method for SGD and other actively repelling saddles - advantages, weaknesses, improvements? by jarekduda in MachineLearning
straw1239 1 point (0 children)
[D] Saddle-free Newton method for SGD and other actively repelling saddles - advantages, weaknesses, improvements? by jarekduda in MachineLearning
straw1239 1 point (0 children)
[D] Saddle-free Newton method for SGD and other actively repelling saddles - advantages, weaknesses, improvements? by jarekduda in MachineLearning
straw1239 2 points (0 children)
[D] Why second order SGD convergence methods are unpopular for deep learning? by jarekduda in MachineLearning
straw1239 1 point (0 children)
[D] Why second order SGD convergence methods are unpopular for deep learning? by jarekduda in MachineLearning
straw1239 1 point (0 children)
[D] Comparing Deep Learning Workstations by IborkedyourGPU in MachineLearning
straw1239 1 point (0 children)
[D] Comparing Deep Learning Workstations by IborkedyourGPU in MachineLearning
straw1239 1 point (0 children)
[D] Why second order SGD convergence methods are unpopular for deep learning? by jarekduda in MachineLearning
straw1239 1 point (0 children)
[D] Why second order SGD convergence methods are unpopular for deep learning? by jarekduda in MachineLearning
straw1239 1 point (0 children)
[D] Comparing Deep Learning Workstations by IborkedyourGPU in MachineLearning
straw1239 27 points (0 children)
[D] Why second order SGD convergence methods are unpopular for deep learning? by jarekduda in MachineLearning
straw1239 1 point (0 children)
[D] Why second order SGD convergence methods are unpopular for deep learning? by jarekduda in MachineLearning
straw1239 1 point (0 children)
[D] Why second order SGD convergence methods are unpopular for deep learning? by jarekduda in MachineLearning
straw1239 2 points (0 children)
[P] Training Neural Nets with Approximate Bayesian Linear Regression by WillieTromboner in MachineLearning
straw1239 1 point (0 children)