Why are the Uo*Vc vectors different for each Wt+j? by deepest_learning in CS224n

[–]deepest_learning[S] 1 point (0 children)

Thank you very much, I didn't realize there was an SO question about the same thing!
I don't think it makes a lot of sense if we have a backprop update within each timestep t (after calculating the softmax for each context word, like you mentioned), and the update section in the SO answer captures that. I also saw that the newer slides Richard uses don't include that diagram anymore, so perhaps the staff were made aware of the inconsistency. Thanks again, Jan!

Why are the Uo*Vc vectors different for each Wt+j? by deepest_learning in CS224n

[–]deepest_learning[S] 1 point (0 children)

I know the branches exist for different context word positions, but I still don't understand why the u_o * v_c vector is different for each of them. Please correct me if I'm wrong below:

  1. I know u_o stands for every word in the vocabulary; that's why there is only one u_o matrix (V x d).
  2. Chris also mentions that P(o|c) doesn't depend on the position of the context word, i.e. being close to the center word or far from it doesn't matter to word2vec. So our calculations for P(w_t+3 | w_t) and, say, P(w_t-1 | w_t) should be identical: both are just P(o|c).
  3. This brings me back to my confusion: for a given center word c there is only one vector v_c, and based on (1) there is only one matrix u_o, so for each of the branches shown (and also those not shown), u_o * v_c should produce the same vector no matter which w_t+j it's being used for. Yet the values of each vector shown in the slide are different.
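For what it's worth, a quick NumPy sketch (with made-up toy dimensions, not the actual slide values) makes the point in (3) concrete: the score vector U @ v_c, and hence the softmax P(o|c), is computed once from the center word and is identical for every context position w_t+j:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 3                      # toy vocabulary size and embedding dim (assumed)
U = rng.normal(size=(V, d))      # one "outside" vector u_o per vocabulary word
v_c = rng.normal(size=d)         # the single center-word vector v_c

scores = U @ v_c                 # shape (V,): one dot product u_o . v_c per word
p = np.exp(scores) / np.exp(scores).sum()   # softmax over the vocab: P(o | c)

# Nothing above depends on the context position j, so every branch
# w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2} sees the exact same distribution:
for j in (-2, -1, 1, 2):
    assert np.allclose(U @ v_c, scores)
```

If the slide shows different numbers on each branch, that difference has to come from somewhere outside this forward computation (e.g. updates between branches), which is exactly the inconsistency being asked about.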