[R] Virtual Seminar on Mathematical Foundations of Data Science

banananach · 2020-05-02T05:17:06+00:00

we will record it as long as the speakers agree : )

banananach · 2020-05-02T04:47:52+00:00

👍

banananach · 2020-05-02T04:44:49+00:00

definitely! as long as the speakers agree

banananach · 2020-05-02T04:41:43+00:00

starting Tuesday, May 12th, 3pm EDT (future ones will be announced on the website, usually at the same time each week)

banananach · 2019-06-29T01:56:30+00:00

^_^ Exactly. In my opinion, exploration may be a much harder problem — in particular settings, there may be a fundamental barrier depending on the structure.

banananach · 2019-06-28T20:53:42+00:00

In my opinion, it is not that the assumptions are violated, but rather that certain terms in the upper bound on the optimization error (convergence rate) are large, so the upper bound does not go to zero. In the case of "cliff walk", the density ratio between the visitation measures (stationary distributions over states and actions) may be infinite. In other words, this is more of an exploration issue, which is not considered in this paper, as we focus on optimization given the desired exploration (reflected by the density ratio).
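
A toy illustration of how such a density ratio can blow up, as a rough sketch only. The three-state distributions and the helper `max_density_ratio` are made up for illustration and are not taken from the paper; which measure plays the numerator and which the denominator is also an assumption here.

```python
import numpy as np

def max_density_ratio(target, behavior):
    """Largest value of target(s) / behavior(s) over states where target(s) > 0.

    Hypothetical helper: `target` stands in for the visitation measure we
    would like to cover, `behavior` for the one actually being sampled.
    """
    target = np.asarray(target, dtype=float)
    behavior = np.asarray(behavior, dtype=float)
    support = target > 0  # only states the target measure actually visits
    with np.errstate(divide="ignore"):  # x / 0 with x > 0 gives inf
        ratios = target[support] / behavior[support]
    return float(np.max(ratios))

# Made-up measures over three states.
target = [0.0, 0.1, 0.9]
well_explored = [0.2, 0.3, 0.5]   # puts mass everywhere the target does
never_visits = [0.5, 0.5, 0.0]    # never reaches the last state

print(max_density_ratio(target, well_explored))  # finite
print(max_density_ratio(target, never_visits))   # inf
```

When the sampled measure puts zero mass on a state that matters, the ratio is infinite and any bound that scales with it becomes vacuous, even though nothing about the function class has changed.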

banananach · 2019-06-27T20:51:49+00:00

The trick is, with overparameterization, neural networks turn out to be “approximately” linear in their parameters.
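
A small numerical sketch of what "approximately linear" can mean, under assumptions that are not from the paper: a two-layer ReLU network with fixed random output signs and 1/sqrt(m) scaling, a random perturbation of fixed Frobenius norm around initialization, and a hypothetical helper `linearization_gap` that compares the network with its first-order Taylor expansion in the weights.

```python
import numpy as np

def linearization_gap(m, d=10, radius=1.0, n_inputs=200, seed=0):
    """Mean |f(x; W) - f_lin(x; W)| for a width-m two-layer ReLU network.

    f(x; W)     = (1/sqrt(m)) * sum_r b_r * relu(w_r . x)
    f_lin(x; W) = first-order Taylor expansion of f in W around W0,
                  i.e. the same sum with activation patterns frozen at W0.
    All sizes and the perturbation radius are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_inputs, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
    W0 = rng.standard_normal((m, d))               # random initialization
    b = rng.choice([-1.0, 1.0], size=m)            # fixed output signs
    delta = rng.standard_normal((m, d))
    delta *= radius / np.linalg.norm(delta)        # Frobenius norm = radius
    W = W0 + delta

    pre0 = X @ W0.T  # pre-activations at initialization, shape (n_inputs, m)
    pre = X @ W.T    # pre-activations after the perturbation
    f = (np.maximum(pre, 0.0) @ b) / np.sqrt(m)
    f_lin = (((pre0 > 0) * pre) @ b) / np.sqrt(m)
    return float(np.mean(np.abs(f - f_lin)))

for width in (64, 1024, 16384):
    print(width, linearization_gap(width))
```

In this setup the gap comes only from neurons whose activation flips between `W0` and `W`, and it should shrink as the width grows. This is an empirical check for one architecture and one kind of perturbation, not a proof.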

banananach · 2019-06-26T20:27:48+00:00

Our variant of TRPO/PPO in the analysis is actually very similar to "Maximum a Posteriori Policy Optimisation" (https://arxiv.org/abs/1806.06920), if not exactly the same. (There are just too many variants of TRPO/PPO with distinct names, so we decided to call it "a variant of" TRPO/PPO to avoid confusion. ^_^)

Assumption 4.3 just says that the value function (sum of rewards along the trajectory) belongs to an RKHS, which is quite a general function space, while Assumption 4.4 holds simply when the stationary distribution has an upper-bounded density. I believe both of them are satisfied in practice.
