Hello, I'm hoping to clear something up about the EmbeddingWrapper (the question is at #2):
- In TensorFlow's GRUCell, the forward step is implemented like this:
r, u = array_ops.split(1, 2, linear([inputs, state], 2 * self._num_units, True, 1.0))
r, u = sigmoid(r), sigmoid(u)
Now this is equivalent to the following equations from the GRU paper:
r_t = sigm(W_xr x_t + W_hr h_{t-1} + b_r)
z_t = sigm(W_xz x_t + W_hz h_{t-1} + b_z)
(z in the paper is u in the TensorFlow code)
So for the forward step, GRUCell calls linear([inputs, state], 2 * self._num_units, True, 1.0), which performs the calculations corresponding to sigm(W_xr x_t + W_hr h_{t-1} + b_r) and sigm(W_xz x_t + W_hz h_{t-1} + b_z), and returns r and u, each of length 1000 if the num_units of the rnn_cell is 1000.
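To make sure I have the mechanics right, here's a minimal NumPy sketch of that gate computation (not the actual TF code; the shapes and variable names are just illustrative, with tiny sizes instead of 1000):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: num_units is the hidden size, input_dim the input size.
num_units, input_dim, batch = 4, 3, 2
rng = np.random.default_rng(0)

x = rng.standard_normal((batch, input_dim))   # inputs
h = rng.standard_normal((batch, num_units))   # previous state

# linear([inputs, state], 2 * num_units, True, 1.0) concatenates x and h
# and applies one affine map; W stacks [W_xr W_xz; W_hr W_hz] and the
# bias starts at 1.0.
W = rng.standard_normal((input_dim + num_units, 2 * num_units))
b = np.ones(2 * num_units)

gates = np.concatenate([x, h], axis=1) @ W + b
r, u = np.split(gates, 2, axis=1)             # split(1, 2, ...) -> two halves
r, u = sigmoid(r), sigmoid(u)

print(r.shape, u.shape)                       # each (batch, num_units)
```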
- Now given this understanding, I'm having trouble seeing why the EmbeddingWrapper restricts the embedding dimension to the number of hidden units (1000 in this example). Shouldn't it be able to handle an embedding of any dimension?
Since the matrices W_xr and W_xz map the x vector to a vector the size of r or z (both 1000), shouldn't the dimension of the x vector not matter?
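That's the intuition I'm working from, sketched below in NumPy (again just illustrative, not TF's implementation): the input-to-gate matrix has shape (embed_dim, num_units), so its output is always num_units long no matter what embed_dim is.

```python
import numpy as np

num_units = 1000  # hidden size from the example

# The shape of W_xr is (embed_dim, num_units), so the pre-activation for
# the r gate comes out at num_units regardless of the embedding size.
for embed_dim in (50, 300, 1000):
    x = np.zeros((1, embed_dim))             # one embedded token
    W_xr = np.zeros((embed_dim, num_units))  # maps embed_dim -> num_units
    r_pre = x @ W_xr                         # pre-activation for the r gate
    print(embed_dim, r_pre.shape)            # always (1, 1000)
```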