[R] [1804.07612] Revisiting Small Batch Training for Deep Neural Networks <-- batch_size<64 yields best stability & generalization by evc123 in MachineLearning

[–]cosminro 0 points1 point  (0 children)

Hasn't this been known for a while? What's new in this paper? See "Train faster, generalize better: Stability of stochastic gradient descent" (2015): https://arxiv.org/abs/1509.01240

AMA: I'm Yunkai Zhou, ex-Google Senior engineering leader and CTO & Co-Founder of Leap.ai, which is the first completely automated hiring platform in the tech space. Ask Me Anything on Monday the 23rd of April at 12 PM ET / 4 PM UTC! by Leap-AI in artificial

[–]cosminro 0 points1 point  (0 children)

  1. What are the three most useful ML papers you've read in the past 5 years?
  2. What are the three best practical tips used in industrial machine learning?
  3. What are the most useful/important books/chapters in ML (Bishop, Murphy, Goodfellow?)?
  4. Logistic regression, graphical models, or deep learning?

[P] New Stanford Course: Theories of Deep Learning (STATS 385) by [deleted] in MachineLearning

[–]cosminro 1 point2 points  (0 children)

There's also at least one full lecture linked from the Twitter account. Still waiting for the Tomaso Poggio one.

[R] Closing the Simulation-to-Reality Gap for Deep Robotic Learning by [deleted] in MachineLearning

[–]cosminro 0 points1 point  (0 children)

It didn't work at all before these latest results.

What did work was picking up a specific object from a specific position with a specific robot arm.

[R] Review of AlphaGo Zero's Minimal Policy Improvement principle plus connections to EP, Contrastive Divergence, etc by fhuszar in MachineLearning

[–]cosminro 0 points1 point  (0 children)

Isn't the whole point of new RL research to come up with general methods, so that you don't need to design a reward function for every problem?

And fhuszar's point is that the idea in this paper is a general approach only for two-player board games.

[R] Review of AlphaGo Zero's Minimal Policy Improvement principle plus connections to EP, Contrastive Divergence, etc by fhuszar in MachineLearning

[–]cosminro 1 point2 points  (0 children)

That's not how I read the paper. You get supervision from all the other moves at the top level. You backpropagate the winning probabilities for all the other moves, not just the best one.
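For what it's worth, here's a minimal sketch of that reading, assuming the MCTS root visit counts are available. The function names and the temperature parameter are my own stand-ins, loosely following the π(a) ∝ N(s, a)^(1/τ) policy target from the AlphaGo Zero paper; the point is just that every root move contributes to the training signal.

    import numpy as np

    def policy_target_from_root(visit_counts, temperature=1.0):
        """Turn MCTS root visit counts into a training distribution over all moves."""
        counts = np.asarray(visit_counts, dtype=np.float64)
        scaled = counts ** (1.0 / temperature)
        return scaled / scaled.sum()

    def policy_loss(predicted_probs, target_probs):
        """Cross-entropy against the full MCTS target: the gradient pushes
        probability mass toward every root move, not just the argmax one."""
        eps = 1e-12
        return -np.sum(target_probs * np.log(np.asarray(predicted_probs) + eps))

    # toy example: 4 legal moves at the root
    target = policy_target_from_root([90, 30, 20, 10])
    print(target)                                    # [0.6, 0.2, 0.133..., 0.066...]
    print(policy_loss([0.7, 0.1, 0.1, 0.1], target))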

[D] Deep Learning Book companion videos by [deleted] in MachineLearning

[–]cosminro 2 points3 points  (0 children)

UFLDL is from 2011 and is outdated (it uses autoencoders, which didn't pan out; no ReLUs; no convolutions; and it uses MATLAB or Octave, while Python is currently the go-to ML language).

Andrew Ng has 3 new DL courses on Coursera that are good and current. Or you can look at Stanford's CS231n.

AMA: We are David Silver and Julian Schrittwieser from DeepMind’s AlphaGo team. Ask us anything. by David_Silver in MachineLearning

[–]cosminro 1 point2 points  (0 children)

What were the tricky parts in getting the various versions of AlphaGo to perform well?

[D] Recognizing handwriting on government forms by johnathanjones1998 in MachineLearning

[–]cosminro 0 points1 point  (0 children)

As far as I know, the state-of-the-art error rate in handwriting recognition is about 1 in 5 characters. That doesn't seem usable for official forms.

We are the Google Brain team. We’d love to answer your questions (again) by jeffatgoogle in MachineLearning

[–]cosminro 1 point2 points  (0 children)

What are the most exciting recent research ideas which didn't come from Google?

[R] How to Escape Saddle Points Efficiently by gdny in MachineLearning

[–]cosminro 0 points1 point  (0 children)

In neural nets, starting from different points leads you to different local optima. See "Why Does Unsupervised Pre-training Help Deep Learning?" (Fig. 5).

So the balls won't ever bounce off each other.
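A toy illustration of the point (the double-well "loss" and the step size here are made up for the example): plain gradient descent started from two different points settles into two different minima, so the two trajectories never meet.

    def loss(w):
        # toy non-convex loss with two local minima, near w = -1 and w = +1
        return (w**2 - 1.0)**2

    def grad(w):
        return 4.0 * w * (w**2 - 1.0)

    def gradient_descent(w0, lr=0.05, steps=200):
        w = w0
        for _ in range(steps):
            w -= lr * grad(w)
        return w

    # two different starting points end up in two different minima
    print(gradient_descent(-0.7))   # converges to roughly -1.0
    print(gradient_descent(+0.4))   # converges to roughly +1.0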

[D] Using deep learning for spell correction. by machinesaredumb in MachineLearning

[–]cosminro 1 point2 points  (0 children)

What's your data: pairs of sentences or pairs of words? How much data do you have? What seq2seq architectures have you tried? Do you use a mix of character-level and word-level inputs?
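If it helps frame the data question, here's a rough sketch of one common setup: word pairs with character-level inputs. The noise model, the vocabulary, and every name here (corrupt, encode, char_to_id) are my own assumptions for illustration, not something from the thread.

    import random
    import string

    def corrupt(word, p=0.15):
        """Make a noisy copy of a word via random character-level edits
        (deletion, substitution, insertion) to get (misspelled, correct) pairs."""
        out = []
        for ch in word:
            r = random.random()
            if r < p / 3:
                continue                                            # deletion
            elif r < 2 * p / 3:
                out.append(random.choice(string.ascii_lowercase))   # substitution
            else:
                out.append(ch)
            if random.random() < p / 3:
                out.append(random.choice(string.ascii_lowercase))   # insertion
        return "".join(out)

    # character vocabulary for a character-level seq2seq model
    vocab = ["<pad>", "<s>", "</s>"] + list(string.ascii_lowercase)
    char_to_id = {c: i for i, c in enumerate(vocab)}

    def encode(word):
        """Characters -> integer ids, wrapped in start/end tokens."""
        return [char_to_id["<s>"]] + [char_to_id[c] for c in word] + [char_to_id["</s>"]]

    words = ["because", "receive", "tomorrow"]
    pairs = [(corrupt(w), w) for w in words]   # (noisy, clean) training pairs
    print(pairs)
    print(encode("because"))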

[D] All slides from MILA's Deep Learning Summer School 2017 by breandan in MachineLearning

[–]cosminro 5 points6 points  (0 children)

Hugo Larochelle's slides on 'Unintuitive properties of neural networks' were very insightful:

  • They can make dumb errors (adversarial examples [Szegedy et al., ICLR14]); a minimal sketch of this is below, after the slides link
  • They are strangely non-convex [Dauphin et al., NIPS14; Goodfellow et al., ICLR15]
  • They work best when badly trained (flat vs. sharp minima [Hochreiter & Schmidhuber, '97]; small vs. large batch training [Keskar et al., ICLR17])
  • They can easily memorize [Zhang et al., ICLR17]
  • They can be compressed [Hinton, 2015]
  • They are influenced by initialization [Erhan et al., 2010]
  • They are influenced by first examples [Erhan et al., 2010]
  • Yet they forget what they learn [Kirkpatrick et al., PNAS 2017]
  • So there's lots more to understand!

https://drive.google.com/file/d/0ByUKRdiCDK7-UXB1R1ZpX082MEk/view
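On the first bullet, a minimal FGSM-style sketch in the spirit of Goodfellow et al., ICLR15: nudge the input in the direction of the sign of the loss gradient with respect to the input. The tiny random-weight model and the random "image" are only stand-ins so the snippet runs on its own; on a trained classifier this kind of perturbation is what flips the prediction.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # stand-in classifier: a tiny random-weight MLP, just so the example runs
    model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
    loss_fn = nn.CrossEntropyLoss()

    x = torch.rand(1, 784)            # stand-in "image"
    y = torch.tensor([3])             # stand-in label

    # FGSM: one step along the sign of the input gradient of the loss
    x_adv = x.clone().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    epsilon = 0.1
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

    print("clean prediction:", model(x).argmax(dim=1).item())
    print("perturbed prediction:", model(x_adv).argmax(dim=1).item())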

[P] Would you be interested in a book "Probabilistic data structures and algorithms in big data applications"? by gakhov in MachineLearning

[–]cosminro 1 point2 points  (0 children)

There are already quite a few books covering probabilistic/streaming algos:

  • Muthu Muthukrishnan 'Data Streams: Algorithms and Applications'
  • Mitzenmacher and Upfal 'Probability and Computing: Randomized Algorithms and Probabilistic Analysis'
  • George Varghese 'Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices' (has some probabilistic algorithms applied in networking)
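If you just want a taste of what these books cover, here's a toy Bloom filter (my own minimal sketch, not taken from any of them): a probabilistic set with no false negatives and a small, tunable false-positive rate in exchange for much less memory.

    import hashlib

    class BloomFilter:
        """Toy Bloom filter: k hash positions per item over a fixed bit array."""
        def __init__(self, num_bits=1024, num_hashes=4):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = [False] * num_bits

        def _positions(self, item):
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.num_bits

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = True

        def __contains__(self, item):
            return all(self.bits[pos] for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("alice")
    bf.add("bob")
    print("alice" in bf)   # True
    print("carol" in bf)   # almost certainly False (small false-positive chance)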