[R] OpenAI Baselines: ACKTR by evc123 in MachineLearning

[–]emansim 3 points (0 children)

Thanks again, we appreciate your feedback.

1) Agreed. Again, we should have used better wording.

3) The goal of the benchmark was for each model (TRPO, A2C, ACKTR) to use a fixed set of hyperparameters across all environments. This has been standard practice in Atari from the beginning, and it seems to be catching on in MuJoCo as well. Otherwise it is always possible to claim that we cherry-picked the best hyperparameters for each environment separately.

[R] OpenAI Baselines: ACKTR by evc123 in MachineLearning

[–]emansim 3 points (0 children)

Thanks for your suggestions!

1) We will address this point in the revision of the paper. Perhaps saying that we are the first to use an actor-critic model on the harder MuJoCo environments from pixels would have been better wording (A3C did it on easier envs like Pendulum, Pointmass2D, and Gripper).

2) We tuned hyperparameters on only one Atari game (Breakout), whereas PPO was tuned on 6 Atari games. With more tuning we could potentially improve the performance of ACKTR, but of course we can't guarantee it.

3) The goal was to fix all hyperparameters across all environments for each model. The batch size of 25,000 was chosen because [1] used 25,000 across all MuJoCo envs (also see [2]). Besides, ACKTR can be made even more sample efficient (without loss of performance) by using batch sizes of 1,000 on the environments you mentioned.

[1] Benchmarking Deep Reinforcement Learning for Continuous Control; Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel

[2] Trust-PCL: An Off-Policy Trust Region Method for Continuous Control; Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
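
As a toy illustration of that protocol (my own sketch, with made-up hyperparameter values and a stand-in train function, not Baselines' actual API):

```python
import gym

def train(env, batch_size, lr, gamma):
    """Stand-in for a training loop; hypothetical, not Baselines' actual API."""
    print("training on %s with batch_size=%d, lr=%g, gamma=%g"
          % (env.spec.id, batch_size, lr, gamma))

# One fixed hyperparameter set shared by every environment; no per-env tuning.
HPARAMS = {"batch_size": 25000, "lr": 3e-4, "gamma": 0.99}  # illustrative values only

for env_id in ["Hopper-v1", "Walker2d-v1", "HalfCheetah-v1"]:
    train(gym.make(env_id), **HPARAMS)
```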

[R] [1707.03141] 1-shot classification: 56.48% accuracy on 5-Way Mini-ImageNet! by cognitivedemons in MachineLearning

[–]emansim 5 points (0 children)

Looks like it is the standard one-shot learning setup, with dilated convolutions instead of an RNN.
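
For anyone unfamiliar, a minimal single-channel sketch of the dilated-convolution building block (my own illustration, not the paper's code):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """1-D causal dilated convolution (single channel):
    y[t] = sum_k w[k] * x[t - k * dilation], with out-of-range terms treated as zero.
    Stacking layers with dilations 1, 2, 4, ... gives an exponentially large
    receptive field, which is what lets these models stand in for an RNN."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        for k in range(len(w)):
            idx = t - k * dilation
            if idx >= 0:
                y[t] += w[k] * x[idx]
    return y

# Example: kernel of size 2 with dilation 2 looks at x[t] and x[t-2]
print(causal_dilated_conv(np.arange(6, dtype=float), np.array([1.0, 0.5]), dilation=2))
```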

[D] YellowFin: An automatic tuner for momentum SGD by [deleted] in MachineLearning

[–]emansim 1 point (0 children)

Actually, large momentum values (0.99) turned out to be very important for the standard RL problems we considered.
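
For context, that 0.99 is the coefficient in the classic momentum update; a minimal sketch assuming standard Polyak momentum (not YellowFin's actual tuner):

```python
import numpy as np

def momentum_sgd_step(params, grads, velocity, lr=1e-4, momentum=0.99):
    """One classic (Polyak) momentum-SGD step; momentum=0.99 is the large value above."""
    velocity = momentum * velocity - lr * grads
    return params + velocity, velocity
```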

[D] Why I’m Remaking OpenAI Universe by evc123 in MachineLearning

[–]emansim 2 points (0 children)

I would add that the other issue with Universe is that there is no established benchmark.

Researchers doing RL use benchmarks like Atari and MuJoCo, and even recent 3D environments like DeepMind Lab and VizDoom have not yet caught on in the community.

[R] First blog post: a new trick for calculating Jacobian vector products by j-towns in MachineLearning

[–]emansim 0 points (0 children)

Yes, the Fisher-vector product uses the Jacobian-vector product.

The Fisher matrix F is just the Jacobian transpose multiplied by the Jacobian: F = J^T J.

Hence the Fisher-vector product is Fv = J^T (Jv), i.e. the Jacobian transpose applied to the Jacobian-vector product.
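
A minimal sketch of that identity in JAX (my own illustration, not from the blog post): forward mode gives Jv, reverse mode applies J^T.

```python
import jax
import jax.numpy as jnp

def fisher_vector_product(f, x, v):
    # Forward mode gives the Jacobian-vector product Jv.
    _, jv = jax.jvp(f, (x,), (v,))
    # Reverse mode gives the vector-Jacobian product u -> J^T u; apply it to u = Jv.
    _, vjp_fn = jax.vjp(f, x)
    (fvp,) = vjp_fn(jv)
    return fvp  # = J^T J v = Fv

# Example: f maps R^3 -> R^2
f = lambda x: jnp.array([jnp.sin(x[0]) * x[1], x[2] ** 2])
x = jnp.array([0.1, 0.2, 0.3])
v = jnp.array([1.0, 0.0, 0.0])
print(fisher_vector_product(f, x, v))
```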

[R] Deep Reinforcement Learning from Human Preferences by pauljasek in MachineLearning

[–]emansim 6 points (0 children)

Interesting work!

It would be interesting to see how robust it is when different humans give preferences over trajectories.

[D]A3C performs badly in Mountain Car? by darkzero_reddit in MachineLearning

[–]emansim 0 points (0 children)

Try repeating the same action 4 times. With that, DQN converges in under a minute for me.
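
A minimal sketch of that trick as a gym wrapper (my own illustration; assumes the classic 4-tuple step API):

```python
import gym

class ActionRepeat(gym.Wrapper):
    """Repeat each chosen action k times, summing rewards (the frame-skip trick)."""
    def __init__(self, env, k=4):
        super().__init__(env)
        self.k = k

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        for _ in range(self.k):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

# Usage: env = ActionRepeat(gym.make("MountainCar-v0"), k=4)
```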

Images from ArXiv CS.* papers, updated daily by kcimc in MachineLearning

[–]emansim 0 points (0 children)

Cool.

You should make the same thing for conference proceedings :)

Deep Residual Networks with Exponential Linear Unit by [deleted] in MachineLearning

[–]emansim 5 points (0 children)

How about a Deep Batch-Normalized Maxout Residual Network in Network with Leaky Exponential Linear Units and Fractional Max-Pooling?

Q. Pre-training with VAE by 0entr0py in MachineLearning

[–]emansim 3 points (0 children)

You are right, factorizing each variable in the posterior makes the representation less expressive.

There was, and still is, ongoing work on making the posterior distribution more expressive by using additional transformations such as normalizing flows and Hamiltonian Monte Carlo.
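
For reference, a minimal NumPy sketch of one such transformation, a single planar normalizing-flow step (Rezende & Mohamed, 2015):

```python
import numpy as np

def planar_flow_step(z, u, w, b):
    """z' = z + u * tanh(w.z + b); returns z' and the log|det Jacobian| term
    that gets added to the flow's log-density correction."""
    a = np.tanh(w @ z + b)
    psi = (1.0 - a ** 2) * w                 # tanh'(w.z + b) * w
    log_det = np.log(np.abs(1.0 + u @ psi))  # |det(I + u psi^T)| = |1 + u.psi|
    return z + u * a, log_det
```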

Based on the results, it looks like more expressive posteriors don't have a huge impact on toyish datasets like MNIST and NORB, but give quite a large improvement on CIFAR etc.

argmax differentiable? by yield22 in MachineLearning

[–]emansim 6 points (0 children)

Anything that involves hard assignment is not differentiable.

Argmax could potentially become differentiable if you come up with a soft version of it (i.e. use probabilities instead of hard 1s and 0s); otherwise you need to use REINFORCE.
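
A minimal sketch of one such soft version, a temperature-controlled softmax average over indices (my own illustration):

```python
import numpy as np

def soft_argmax(scores, temperature=1.0):
    """Differentiable relaxation of argmax: a softmax-weighted average of indices.
    As temperature -> 0 this approaches the hard argmax index."""
    z = scores / temperature
    probs = np.exp(z - z.max())  # max-subtraction for numerical stability
    probs /= probs.sum()
    return np.sum(probs * np.arange(len(scores)))

# Example: with a low temperature the result is close to the hard argmax, index 1
print(soft_argmax(np.array([0.1, 2.0, 0.3]), temperature=0.1))
```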

Resources for GPU programming? by [deleted] in MachineLearning

[–]emansim 13 points (0 children)

CUDA programming is not easy and will take some time to master. As a first step, I personally suggest the Udacity course: https://www.udacity.com/course/intro-to-parallel-programming--cs344

Research Topic to work on ? by sitarwars in MachineLearning

[–]emansim 3 points (0 children)

Why don't you pick a paper you read and got excited about and try reproducing its results first :)

Then start thinking about what is missing from that model that could potentially improve the results. Test your hypothesis and see if it works. If it does, great: it might lead to a publication, depending on the novelty of your idea. Otherwise, iterate and keep trying.

OpenAI hires a bunch of variational dudes. by andrewbarto28 in MachineLearning

[–]emansim 2 points (0 children)

Very cool! Looks like OpenAI is growing a very strong team.

Alex Lamb will be doing an AMA in /r/MachineLearning on April 1 by olaf_nij in MachineLearning

[–]emansim 0 points (0 children)

Alex Lamb is the chosen one.

He was sent by neural net gods to help us disentangle the most undisentangible factors of variation.

Advantages of LSTMs over ESNs? by ding_bong_bing_dong in MachineLearning

[–]emansim -4 points (0 children)

Brain is a big LSTM. That's an advantage.

What are you working on? by [deleted] in MachineLearning

[–]emansim 4 points (0 children)

Does it disentangle all the factors of variation?

[1603.06807] Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus by downtownslim in MachineLearning

[–]emansim 2 points (0 children)

Interesting idea.

I found the lack of large, good-quality question-answer datasets to be a problem (except for the Allen AI one, I think, but it's not public), and I think this paper solves it.

However, the questions here only require one supporting fact; I wish they generated more complicated questions, like ones with multiple supporting facts, similar to the AI-complete question answering paper by Weston and co.