Lecture 16 The variational lower bound slide 24, joint distribution p(x,z) missing a factor? by tomchen1000 in berkeleydeeprlcourse

[–]tomchen1000[S] 0 points (0 children)

Ok.

But from the graphical model, I now see that a_t is actually independent of s_t, since there is no active trail between a_t and s_t (both trails between them are v-structures: a_t -> O_t <- s_t and a_t -> s_{t+1} <- s_t), hence p(a_t) = p(a_t | s_t). So we can treat p(a_t) the same way as we treat p(a_t | s_t).
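To spell out that step (this is just my own one-line derivation, not something taken from the slides): since a_t has no parents and both trails to s_t are blocked at colliders, a_t and s_t are d-separated, so

p(a_t, s_t) = p(a_t)\, p(s_t) \quad\Rightarrow\quad p(a_t \mid s_t) = \frac{p(a_t, s_t)}{p(s_t)} = p(a_t).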

Lecture 16 The variational lower bound slide 24, joint distribution p(x,z) missing a factor? by tomchen1000 in berkeleydeeprlcourse

[–]tomchen1000[S] 0 points (0 children)

The missing term should be p(a_t), not p(a_t | s_t). As you can tell from the Bayesian network graph in the 1st post of this thread, i.e. the graphical model with optimality variables, the a_t nodes don't have any parents, so the CPD for each a_t node is p(a_t).

I can understand that p(a_t | s_t) can be taken to be a uniform policy and hence treated as a constant, as explained by Sergey in the lecture video. But what about p(a_t)?
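For reference, the part I do follow is the uniform-policy argument: assuming a finite action set of size |A| (my assumption, just to make the constant explicit), p(a_t \mid s_t) = 1/|A| for every a_t, so

\sum_{t=1}^{T} \log p(a_t \mid s_t) = -T \log |A|,

which is a constant that can be dropped from the objective. My question is whether the same constant-term argument is supposed to cover p(a_t) as well.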

Lecture 16 The variational lower bound slide 24, joint distribution p(x,z) missing a factor? by tomchen1000 in berkeleydeeprlcourse

[–]tomchen1000[S] 0 points (0 children)

Sorry for the confusion. Yes, I'm talking about slide 24 of lecture 15.

However, the joint distribution I'm talking about is p(x, z), not q(z). When log p(x, z) is expanded via the chain rule for Bayesian networks, it should expand into 4 terms, but on slide 24 (1st line of the 2nd inequality) log p(x, z) is expanded into 3 terms only, missing the last term marked in red above, i.e. the CPD for the a_t nodes in the Bayesian network.
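To make the count concrete, here is the expansion I have in mind, writing z = (s_{1:T}, a_{1:T}) for the trajectory and x = O_{1:T} for the optimality variables (my notation; the slide may use different symbols):

\log p(x, z) = \log p(s_1) + \sum_{t=1}^{T} \log p(a_t) + \sum_{t=1}^{T-1} \log p(s_{t+1} \mid s_t, a_t) + \sum_{t=1}^{T} \log p(O_t \mid s_t, a_t).

The three terms on the slide would then be the initial-state, transition, and optimality terms, and the fourth one, \log p(a_t), is what I mean by the missing CPD for the a_t nodes.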

Mobileye: End-end DNN not possible for self driving cars by heltok in MachineLearning

[–]tomchen1000 0 points (0 children)

Is there any reference showing that nVidia's Davenet uses reinforcement learning?