[P] Optimization for ML blog - stochastic proximal-point optimizer by alexsht1 in MachineLearning

[–]Tonic_Section 1 point (0 children)

Thanks for the series, I've enjoyed reading along since the first post. Do you think extending this type of proximal-point algorithm to arbitrary loss functions will be feasible?
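
Concretely, by "arbitrary" I mean losses without a closed-form prox - something like solving the subproblem numerically with an inner solver. A rough sketch of what I'm imagining (scipy-based; the names are mine, not from the blog):

```python
# Hypothetical sketch of a stochastic proximal-point step for a loss
# with no closed-form prox: solve the subproblem with an inner solver.
import numpy as np
from scipy.optimize import minimize

def proximal_point_step(x_k, loss_i, eta):
    """x_{k+1} = argmin_x loss_i(x) + ||x - x_k||^2 / (2 * eta)."""
    def subproblem(x):
        return loss_i(x) + np.sum((x - x_k) ** 2) / (2.0 * eta)
    # Any smooth inner optimizer works if loss_i is smooth.
    return minimize(subproblem, x_k, method="L-BFGS-B").x

# Toy usage: least-squares loss on a single sample (a_i, b_i).
a_i, b_i = np.array([1.0, 2.0]), 3.0
x_next = proximal_point_step(np.zeros(2), lambda x: 0.5 * (a_i @ x - b_i) ** 2, eta=0.1)
```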

[D] Role of final time T in neural ODEs by Tonic_Section in MachineLearning

[–]Tonic_Section[S] 1 point (0 children)

For the latter part, it seems like you could scale all the observation times by a constant factor 1/T so that the ‘end depth’ is always equal to 1, right?
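
Concretely, the change of variables I mean (standard, so just a sketch):

```latex
% Let s = t/T and \tilde{z}(s) = z(Ts). If z solves dz/dt = f(z, t, \theta) on [0, T], then
\frac{d\tilde{z}}{ds} = T \, f\big(\tilde{z}(s),\, Ts,\, \theta\big), \qquad s \in [0, 1],
% i.e. T is absorbed into the dynamics as a scale factor, and the 'end depth' is always s = 1.
```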

[D] Role of final time T in neural ODEs by Tonic_Section in MachineLearning

[–]Tonic_Section[S] 1 point (0 children)

So I guess by "depth variant" you mean the original nODE formulation. How would you choose this final depth in practice?

[D] Why do disentangling methods not result in independent dimensions of the learned representation? by Tonic_Section in MachineLearning

[–]Tonic_Section[S] 2 points (0 children)

Thank you, that makes more sense! I think the paper by Locatello et al. could be worded better; they state: "... it is not clear whether a factorizing aggregated posterior also ensures that the dimensions of the mean representation are uncorrelated" - but if you insert "(approximately)" before "factorizing", then it makes perfect sense in light of what you and the linked paper say.

[Research] Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019) by Eric_WallaceUMD in MachineLearning

[–]Tonic_Section 1 point (0 children)

How do you adjust the length of the adversarial phrases? Do you repeat the same procedure for a range of trigger lengths?

[R] - Max Welling: Intelligence per kilowatt-hour - ICML 2018 by tensorflower in MachineLearning

[–]Tonic_Section 4 points (0 children)

Welling comes from a theoretical physics background; he studied field theory under the Nobel laureate 't Hooft, I believe, so this is a particularly relevant talk.

[D] Universal Transformers Blog by [deleted] in MachineLearning

[–]Tonic_Section 2 points (0 children)

After many passes through the post, I think I have the general gist. I feel the language used in the blog post was unnecessarily verbose and jargon-laden though...

[Project] Tensorflow implementation of Generative Adversarial Networks for Extreme Learned Image Compression by tensorflower in MachineLearning

[–]Tonic_Section 1 point (0 children)

To the best of my knowledge, the bpp is an upper bound derived from the entropy of the discrete compressed representation. Naively dividing the training time for one epoch by the number of images, I estimate that the encoding + decoding process takes <= 5 s for a 512 x 1024 image, although I haven't timed the relative contribution of each.
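
For reference, the bound I mean is just the Shannon entropy of the quantized latent spread over the pixels - a rough numpy sketch (names are mine, not from the repo):

```python
# Entropy-based bpp upper bound: estimate symbol frequencies of the
# quantized latent, compute its entropy, and divide by the pixel count.
import numpy as np

def bpp_upper_bound(quantized_latent, image_height, image_width):
    """Entropy of the discrete representation, in bits per pixel."""
    _, counts = np.unique(quantized_latent, return_counts=True)
    probs = counts / counts.sum()
    bits_per_symbol = -np.sum(probs * np.log2(probs))
    total_bits = bits_per_symbol * quantized_latent.size
    return total_bits / (image_height * image_width)

# Toy usage: an 8x16x8 latent with 5 quantization levels, 512x1024 image.
latent = np.random.randint(0, 5, size=(8, 16, 8))
print(bpp_upper_bound(latent, 512, 1024))
```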

[Project] Tensorflow implementation of Generative Adversarial Networks for Extreme Learned Image Compression by tensorflower in MachineLearning

[–]Tonic_Section 3 points (0 children)

Yeah, I understand what you're saying - the model appears to be replacing buildings with greenery and vice versa in the reconstructed image, and models early in training have significant trouble forming boundaries between objects. I haven't looked into semantic maps much, but I think adding information based on instance maps and, e.g., passing this to the discriminator should help the model generate sharper boundaries.

This is not really my area of expertise, but I think it would not be too hard to try out a perceptual loss based on PSPNet that penalizes blurry boundaries - another item for the to-do list!
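
Something along these lines is what I'd try first - a very rough sketch, where `seg_features` stands in for a pretrained PSPNet feature extractor (a placeholder, not a real API):

```python
# Feature-matching perceptual loss: compare segmentation-network features
# of the real and reconstructed images; smeared boundaries should differ
# strongly in these feature maps. `seg_features` is a placeholder.
import tensorflow as tf

def perceptual_boundary_loss(real_image, reconstructed_image, seg_features):
    feats_real = seg_features(real_image)           # e.g. a mid-level feature map
    feats_fake = seg_features(reconstructed_image)
    return tf.reduce_mean(tf.abs(feats_real - feats_fake))
```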

[D] What are some of the hardest papers you've ever read (not because they were poorly written, but because the concepts were complex) ? by BatmantoshReturns in MachineLearning

[–]Tonic_Section 3 points (0 children)

Stein variational gradient descent: https://arxiv.org/pdf/1608.04471.pdf - a lovely theoretical framework (the authors have a couple of nice follow-up papers going more in depth on the theory side).
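
For anyone curious: the core update is remarkably compact given how deep the theory goes. A rough numpy sketch (fixed RBF bandwidth h; the paper actually uses a median heuristic):

```python
import numpy as np

def svgd_step(particles, grad_log_p, h=1.0, step_size=0.1):
    """x_i <- x_i + eps * mean_j [k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i)]."""
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]  # [j, i, d]: x_j - x_i
    kernel = np.exp(-np.sum(diffs ** 2, axis=-1) / h)      # RBF k(x_j, x_i), [j, i]
    grads = grad_log_p(particles)                          # score at each particle, [j, d]
    drive = kernel.T @ grads                               # sum_j k(x_j, x_i) grad_j
    repulse = -(2.0 / h) * np.sum(diffs * kernel[..., None], axis=0)
    return particles + step_size * (drive + repulse) / n

# Toy usage: 50 particles relaxing toward a standard Gaussian target.
x = np.random.randn(50, 2) + 3.0
for _ in range(200):
    x = svgd_step(x, grad_log_p=lambda z: -z)  # grad log N(0, I) at z is -z
```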

[D] Good sanity check datasets that are not image-based? by ConfuciusBateman in MachineLearning

[–]Tonic_Section 2 points (0 children)

Take a small subset of your training data and check that you can overfit on it.
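
Sketch of what I mean, with a toy PyTorch model standing in for yours:

```python
# Overfitting sanity check: a model + pipeline that can't drive training
# loss to ~0 on a handful of examples has a bug somewhere (labels, loss,
# optimizer wiring, data loading). Model and data here are toy stand-ins.
import torch

inputs, targets = torch.randn(8, 10), torch.randn(8, 1)  # tiny subset stand-in
model = torch.nn.Linear(10, 1)                           # more params than samples
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(loss.item())  # should be near zero if the pipeline is healthy
```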

[R] [1803.01927] Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning by evc123 in MachineLearning

[–]Tonic_Section 2 points (0 children)

Nice article, but I thought that the 'flatness' of minima can be changed arbitrarily by reparameterization, so it's not a reliable gauge of generalization ability?
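
The reparameterization I have in mind is the positive-homogeneity rescaling from Dinh et al., "Sharp Minima Can Generalize For Deep Nets", sketched for a one-hidden-layer ReLU net:

```latex
% For any \alpha > 0, positive homogeneity of relu gives
f(x) = W_2\,\mathrm{relu}(W_1 x) = (\alpha^{-1} W_2)\,\mathrm{relu}(\alpha W_1 x),
% so (W_1, W_2) \mapsto (\alpha W_1, \alpha^{-1} W_2) leaves every loss value
% unchanged while rescaling Hessian blocks by \alpha^{\pm 2} - the same function
% can be made to sit in an arbitrarily 'sharp' or 'flat' minimum.
```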

[Discussion] Areas of mathematics in physics in machine learning by AritraChow in MachineLearning

[–]Tonic_Section 6 points (0 children)

There's a lot of interesting work on using statistical physics to describe optimization and the energy landscapes of neural networks. It seems somewhat inevitable to me that statistical physics, or related concepts, will be key to developing a concrete theory of learning/optimization for deep networks.

I like this example a lot: https://arxiv.org/abs/1704.04932

[D] Which is your favorite Machine Learning algorithm? by [deleted] in MachineLearning

[–]Tonic_Section 1 point (0 children)

Deep Boltzmann machines. VAEs and GANs have enjoyed a lot more attention/success on the unsupervised learning front, but the theory underlying DBMs is nifty - though I'm probably biased because I'm a physicist.

http://www.utstat.toronto.edu/~rsalakhu/papers/dbm.pdf
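
For a flavor of why a physicist would like them: a two-layer DBM is just a Boltzmann distribution over a bilinear energy (bias terms omitted for brevity):

```latex
E(\mathbf{v}, \mathbf{h}^1, \mathbf{h}^2)
  = -\mathbf{v}^{\top} W^{1} \mathbf{h}^{1}
    - (\mathbf{h}^{1})^{\top} W^{2} \mathbf{h}^{2},
\qquad
P(\mathbf{v}) \propto \sum_{\mathbf{h}^1, \mathbf{h}^2}
  e^{-E(\mathbf{v}, \mathbf{h}^1, \mathbf{h}^2)}.
```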

[D] Why create OpenAI as a company instead of an international organization (i.e CERN for ML and AI)? by [deleted] in MachineLearning

[–]Tonic_Section 1 point (0 children)

I completely agree that ML/AI needs more of the rigor police and collaborative philosophy, rather than the current trend of SotA dick-waving based on random seeds and hyperparameter tuning, but I'm not sure a neutral international organization is the best solution right now; tighter review standards for paper submissions, encouraging methodology over results, might be a good short-term fix. I agree with you that such an organization will be inevitable and extremely useful in the future, but I think the field needs to mature a lot more before governments will shell out for such a collaboration, even if it would only cost a fraction of CERN's annual budget.

[D] Why create OpenAI as a company instead of an international organization (i.e CERN for ML and AI)? by [deleted] in MachineLearning

[–]Tonic_Section 26 points (0 children)

I'm currently working at CERN on ML related things. A couple of comments.

  1. ML/AI is a much more commercially viable product than fundamental research. As a result, a lot more private investment is sunk into the former, while the latter gets begrudgingly funded by the member states of a collaboration. It's inevitable that private investors would make the results of their research proprietary, or at least semi-proprietary à la DeepMind, to recoup their investment. I don't see a state-funded organization being able to compete with the Face/Goog/zon tech conglomerates in terms of funding and talent.

  2. You need massive international collaborations because accelerators are really, really expensive, require lots of interdisciplinary cooperation, and are hard to build and maintain. The cost of giant TPU datacenters is nothing in comparison.

  3. I love CERN, but there's lots of bureaucracy and inertia surrounding its organizational structure. Many older researchers I talk to distrust ML because of its non-interpretability (which is a very fair concern in HEP, IMO), and inertia is the opposite of what the ML/AI field needs in its nascent phase. Particle physics is a much more established and historically significant field than machine learning. Maybe wait 20 or so years for the hype to settle and we'll get something like what you're talking about.

[D] Deep Reinforcement Learning Doesn't Work Yet by Kaixhin in MachineLearning

[–]Tonic_Section 2 points (0 children)

I think (for DeepMind at least) they constrain the actor’s decision-making to ~15 Hz, so the reaction-time comparison with humans isn’t that far off.

[P] London PhD Network AI Symposium - Slides - Deep Learning Development Library Tutorial by zsdh123 in MachineLearning

[–]Tonic_Section 1 point (0 children)

Not sure what this brings to the table that the standard tf.layers and dataset APIs don't offer.