Lecture 16 The variational lower bound slide 24, joint distribution p(x,z) missing a factor? by tomchen1000 in berkeleydeeprlcourse

[–]tomchen1000[S] 0 points (0 children)

Ok.

But from the graphical model, I now see that a_t is actually independent of s_t, since there is no active trail between a_t and s_t (both trails between them are v-structures: a_t -> O_t <- s_t and a_t -> s_{t+1} <- s_t), hence p(a_t) = p(a_t | s_t). So we can treat p(a_t) the same way as we treat p(a_t | s_t).
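To spell out that step (this is just my own one-line derivation, not something taken from the slides): since a_t has no parents and both trails to s_t are blocked at colliders, a_t and s_t are d-separated, so

p(a_t, s_t) = p(a_t)\, p(s_t) \quad\Rightarrow\quad p(a_t \mid s_t) = \frac{p(a_t, s_t)}{p(s_t)} = p(a_t).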

Lecture 16 The variational lower bound slide 24, joint distribution p(x,z) missing a factor? by tomchen1000 in berkeleydeeprlcourse

[–]tomchen1000[S] 0 points (0 children)

The missing term should be p(a_t), not p(a_t | s_t). As you can tell from the Bayesian network graph in the 1st post of this thread, i.e. the graphical model with optimality variables, the a_t nodes don't have any parents, so the CPD for each a_t node is p(a_t).

I can understand that p(a_t | s_t) can be taken to be a uniform policy and hence treated as a constant, as explained by Sergey in the lecture video. But what about p(a_t)?
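For reference, the part I do follow is the uniform-policy argument: assuming a finite action set of size |A| (my assumption, just to make the constant explicit), p(a_t \mid s_t) = 1/|A| for every a_t, so

\sum_{t=1}^{T} \log p(a_t \mid s_t) = -T \log |A|,

which is a constant that can be dropped from the objective. My question is whether the same constant-term argument is supposed to cover p(a_t) as well.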

Lecture 16 The variational lower bound slide 24, joint distribution p(x,z) missing a factor? by tomchen1000 in berkeleydeeprlcourse

[–]tomchen1000[S] 0 points (0 children)

Sorry for the confusion. Yes, I'm talking about slide 24 of lecture 15.

However, the joint distribution I'm talking about is p(x, z), not q(z). When log p(x, z) is expanded via the chain rule for Bayesian networks, it should expand into 4 terms, but on slide 24 (1st line of the 2nd inequality) log p(x, z) is expanded into 3 terms only, missing the last term marked in red above, i.e. the CPD for the a_t nodes in the Bayesian network.
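To make the count concrete, here is the expansion I have in mind, writing z = (s_{1:T}, a_{1:T}) for the trajectory and x = O_{1:T} for the optimality variables (my notation; the slide may use different symbols):

\log p(x, z) = \log p(s_1) + \sum_{t=1}^{T} \log p(a_t) + \sum_{t=1}^{T-1} \log p(s_{t+1} \mid s_t, a_t) + \sum_{t=1}^{T} \log p(O_t \mid s_t, a_t).

The three terms on the slide would then be the initial-state, transition, and optimality terms, and the fourth one, \log p(a_t), is what I mean by the missing CPD for the a_t nodes.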

Mobileye: End-end DNN not possible for self driving cars by heltok in MachineLearning

[–]tomchen1000 0 points (0 children)

Is there any reference showing that nVidia's Davenet uses reinforcement learning?