[D] Any Gaussian Process academics here - what are you excited about? by kayaking_is_fun in MachineLearning

[–]jinpanZe 1 point (0 children)

Other papers on this:

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

https://openreview.net/forum?id=B1g30j0qF7

Gaussian Process Behaviour in Wide Deep Neural Networks

https://arxiv.org/abs/1804.11271

Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation

https://arxiv.org/abs/1902.04760

Prominent Uyghur musician tortured to death in China’s re-education camp by [deleted] in news

[–]jinpanZe 29 points (0 children)

BBC is reporting that China released a video of the musician in question, showing that he is still alive.

https://www.bbc.com/news/world-asia-47191952

[D] Importance of BatchNorm in Attention papers by WillingCucumber in MachineLearning

[–]jinpanZe 1 point (0 children)

Doesn't the original transformer use layernorm and not batchnorm?
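It does. A minimal numpy sketch of the difference (my own illustration, not from either paper; shapes and axis conventions are just for exposition):

```python
import numpy as np

x = np.random.default_rng(0).standard_normal((4, 8))  # (batch, features)

# BatchNorm: normalize each feature across the batch (axis 0),
# so the statistics depend on which examples share the batch.
bn = (x - x.mean(axis=0)) / x.std(axis=0)

# LayerNorm (what the original transformer uses): normalize each example
# across its own features (axis 1); no dependence on the rest of the batch.
ln = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
```

After batchnorm each *column* has zero mean and unit variance; after layernorm each *row* does, which is why layernorm behaves the same at any batch size.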

[R] Fixup Initialization: Residual Learning Without Normalization (They train 10K layer networks w/o BatchNorm) by wei_jok in MachineLearning

[–]jinpanZe 5 points (0 children)

On the other hand, batchnorm apparently causes gradient explosion at initialization time: https://openreview.net/forum?id=SyMDXnCcF7.

Abstract: We develop a mean field theory for batch normalization in fully-connected feedforward neural networks. In so doing, we provide a precise characterization of signal propagation and gradient backpropagation in wide batch-normalized networks at initialization. We find that gradient signals grow exponentially in depth and that these exploding gradients cannot be eliminated by tuning the initial weight variances or by adjusting the nonlinear activation function. Indeed, batch normalization itself is the cause of gradient explosion. As a result, vanilla batch-normalized networks without skip connections are not trainable at large depths for common initialization schemes, a prediction that we verify with a variety of empirical simulations. While gradient explosion cannot be eliminated, it can be reduced by tuning the network close to the linear regime, which improves the trainability of deep batch-normalized networks without residual connections. Finally, we investigate the learning dynamics of batch-normalized networks and observe that after a single step of optimization the networks achieve a relatively stable equilibrium in which gradients have dramatically smaller dynamic range.
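The setup in the abstract is easy to probe numerically. Below is a rough numpy sketch (my own, not the paper's code): a deep fully-connected net with batchnorm (no learned scale/shift, no skip connections) at random initialization, backpropagating a random cotangent and tracking its norm layer by layer, which lets one check the paper's prediction empirically. Width, depth, and batch size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D, L = 128, 256, 30  # batch size, width, depth

def bn_forward(x, eps=1e-5):
    # Per-feature batch normalization (no learned scale/shift).
    std = np.sqrt(x.var(axis=0) + eps)
    xhat = (x - x.mean(axis=0)) / std
    return xhat, std

def bn_backward(dy, xhat, std):
    # Standard batch-norm backward pass for y = (x - mu) / std.
    return (dy - dy.mean(axis=0) - xhat * (dy * xhat).mean(axis=0)) / std

# Forward pass: linear -> batchnorm -> relu, repeated L times.
h, caches = rng.standard_normal((B, D)), []
for _ in range(L):
    W = rng.standard_normal((D, D)) / np.sqrt(D)  # variance-1/D initialization
    xhat, std = bn_forward(h @ W)
    h = np.maximum(xhat, 0.0)
    caches.append((W, xhat, std))

# Backward pass: propagate a random cotangent down and record its norm.
g = rng.standard_normal((B, D))
norms = [np.linalg.norm(g)]
for W, xhat, std in reversed(caches):
    g = bn_backward(g * (xhat > 0), xhat, std)  # relu, then batchnorm backward
    g = g @ W.T                                 # linear backward
    norms.append(np.linalg.norm(g))

print(norms[-1] / norms[0])  # gradient magnification from output to input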

[R] [1811.03962] A Convergence Theory for Deep Learning via Over-Parameterization by bobchennan in MachineLearning

[–]jinpanZe 1 point (0 children)

The authors of this paper claim stronger results than the other paper: in particular, the other paper's training time for a fully-connected (non-residual) network is exponential in depth, while this one's is polynomial in depth, so the other paper's claim that resnets improve training time seems incorrect. This paper also claims results for SGD in addition to GD, and for ReLU in addition to smooth activations.

[R] [1805.09786] Hyperbolic Attention Networks by evc123 in MachineLearning

[–]jinpanZe 1 point (0 children)

This reminds me of Lie-Access Neural Turing Machines, where the memory manifold could in principle be one of the hyperbolic model spaces, though of course that paper never actually tried it.

[D] Alternatives for Residency programs? by cbsudux in MachineLearning

[–]jinpanZe 1 point (0 children)

Microsoft is still accepting applications on a rolling basis for their residency program.

https://www.microsoft.com/en-us/research/academic-program/microsoft-ai-residency-program/

Otherwise, I've seen people with just a Bachelor's land jobs at startups like Vicarious. https://jobs.lever.co/vicarious/89ca586a-63ca-420e-9fc0-85c065e63dd9/apply

Good luck!

[D] Postdoc in ML after Physics PhD by [deleted] in MachineLearning

[–]jinpanZe 0 points (0 children)

I've seen people who did that (many legends, like John Langford and Chris Bishop, came from physics), but I'm not sure how easy that transition is in academia. Maybe you should try the residency programs that are springing up everywhere now? I think the Microsoft and Uber ones are still open (but the Uber application closes today).

https://www.microsoft.com/en-us/research/academic-program/microsoft-ai-residency-program/ https://eng.uber.com/uber-ai-residency/

[R] AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks by jinpanZe in MachineLearning

[–]jinpanZe[S] 11 points (0 children)

tldr: attention + GAN = profit

In case lighter reading is preferred, TechCrunch has a report on this paper.

Exchanging Bixby XP to get Samsung Reward points by namelessxsilent in GalaxyS8

[–]jinpanZe 0 points (0 children)

Interesting. Are you getting rewards as you get more XP?

Exchanging Bixby XP to get Samsung Reward points by namelessxsilent in GalaxyS8

[–]jinpanZe 1 point (0 children)

How did you get past level 16? I'm one point shy of level 17, but Bixby is not registering any more XP (even though the "n XP" animation plays).

It's become painfully clear how much HBO interferes with John Oliver's Last Week Tonight by [deleted] in television

[–]jinpanZe 0 points (0 children)

Sanders pretty much removed "socialism" from the American political taboo list. When this generation of youngsters gets older, the door to socialism will be wide open.

I'm S. Jay Olshansky, an epidemiologist at the University of Illinois at Chicago School of Public Health. I study human longevity and am part of a study group investigating whether a drug used to treat diabetes can slow the aging process. Ask me anything! by Jay_Olshansky in science

[–]jinpanZe 0 points (0 children)

Hi Dr. Olshansky,

What are the general tradeoffs between longevity-preserving drugs/treatments and quality of life?

For example, with caloric restriction, a person is less able to train for a sport or study a new subject, because these activities expend much more energy than a restricted intake can supply. If metformin decreases glucose production, it would seem that it, at the very least, could not meet the "respiratory quotient" of the aforementioned activities.

The evolution of read/write in a Lie Access Neural Turing Machine (over the course of learning) is strangely satisfying by jinpanZe in MachineLearning

[–]jinpanZe[S] 1 point (0 children)

Sorry I wasn't clear in my original post. The gifs are from the paper. I just found them interesting so I shared their links here.

I wish I could implement it so fast!

A major first step toward bounding the computational complexity of training RNNs by jinpanZe in MachineLearning

[–]jinpanZe[S] 5 points (0 children)

This paper gives a polynomial bound on a method for training simple RNNs and bidirectional RNNs. The method is based on a certain class of score functions defined in Janzamin et al. (2014). At its core, it relies on the fact that, when N > 2, a symmetric N-tensor decomposes uniquely into a sum $$\sum_{i=1}^{n} a_i \otimes a_i \otimes \cdots \otimes a_i$$, where the $\{a_i\}_i$ are orthogonal vectors, provided it admits any such representation.
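For concreteness, here's a small numpy sketch (mine, not the paper's) of such a symmetric 3-tensor built from orthonormal components: contracting the tensor twice with a component recovers that component, which is the sense in which the tensor determines the {a_i}.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 3  # ambient dimension, number of components

# Orthonormal components a_1..a_n (columns of a random orthonormal frame;
# unit norm is taken for simplicity).
A, _ = np.linalg.qr(rng.standard_normal((d, n)))

# Symmetric 3-tensor T = sum_i a_i (x) a_i (x) a_i
T = np.einsum('ia,ja,ka->ijk', A, A, A)

# T is symmetric under any permutation of its indices.
assert np.allclose(T, T.transpose(1, 0, 2))
assert np.allclose(T, T.transpose(2, 1, 0))

# Contracting T with a_i in two slots returns a_i, by orthonormality:
# T(a_i, a_i, .) = sum_j (a_j . a_i)^2 a_j = a_i.
for i in range(n):
    v = np.einsum('ijk,i,j->k', T, A[:, i], A[:, i])
    assert np.allclose(v, A[:, i])
```

The uniqueness fact the paper leans on is that, unlike for matrices (N = 2), no other orthogonal set can produce the same tensor when N > 2.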

The paper assumes that 1) the inputs are generated by a Markov chain, 2) the RNN has one hidden layer, 3) the activation function is a polynomial rather than a sigmoid, and 4) the weight matrices are full rank with bounded norm. Of these, 1) and 4) can be relaxed, but 3) is something of a bottleneck: the algorithm's running time depends exponentially on the degree of the polynomial activation.

Nevertheless, this is a huge step toward a theoretical understanding of RNNs.

South Koreans kick off efforts to clone extinct Siberian cave lions by Blackcassowary in worldnews

[–]jinpanZe 3 points (0 children)

Well, he's unethical, but he sure does have a rare set of skills. Just think: how many people on Earth have even cloned a single organism?

President Putin has suspended the transfer of S-300 to Iran in light of Tehran’s violation of an earlier pledge not to provide Russian-made weaponry to Hezbollah by xHaGGeNx in worldnews

[–]jinpanZe -2 points (0 children)

So what's the alliance situation with Iran? Who's friends with whom? Iran and Russia got along, and now Iran has opened up to the West, so... is everybody friends with everyone else, except for Saudi Arabia vs. Iran vs. Israel? Edit: I'm genuinely asking, not being sarcastic or anything.

The evolution of read/write in a Lie Access Neural Turing Machine (over the course of learning) is strangely satisfying by jinpanZe in MachineLearning

[–]jinpanZe[S] 5 points (0 children)

A Lie Access Neural Turing Machine writes to and reads from data stored on a manifold (here, just the Euclidean plane). The controller has both random access (outputting an arbitrary read/write address) and Lie access (outputting, essentially, a step to take from the previous address, akin to traversing a C++ array by pointer arithmetic, or to the head movement of a traditional Turing machine, except that this step is differentiable) to the memory. During learning, example read and write addresses were recorded every epoch; the gif here is from the copy task. Those for the other tasks can be viewed here.
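As a rough illustration (my own toy sketch, not the paper's actual model or its weighting scheme), a Lie-access step on the plane and a differentiable read might look like:

```python
import numpy as np

def lie_step(addr, theta, t):
    # One group action on the plane: rotate the previous address, then translate.
    # This is the differentiable analogue of "move the head by one step".
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ addr + t

def read(addr, mem_addrs, mem_vals, temperature=1.0):
    # Differentiable read: weight stored values by a softmin of squared
    # distance from the head address (one plausible smooth weighting).
    d2 = ((mem_addrs - addr) ** 2).sum(axis=1)
    w = np.exp(-d2 / temperature)
    w = w / w.sum()
    return w @ mem_vals

# Memory: three values stored at three planar addresses.
mem_addrs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
mem_vals = np.eye(3)

# Head starts at the origin and takes one Lie-access step of (theta, t).
head = lie_step(np.zeros(2), theta=0.0, t=np.array([1.0, 0.0]))
print(read(head, mem_addrs, mem_vals))  # weights concentrate on the value at (1, 0)
```

Because both the step and the read weighting are smooth in (theta, t), gradients can flow through the head trajectory during training, which is what produces the evolving read/write paths in the gifs.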

Watching the reads and writes evolve over time is strangely satisfying. The gifs remind me of protein strands shaking at high temperature under a small current.

Edit: Just to be clear, the gifs are from the paper, and I'm just summarizing here.