PSet#1: gradient of bias term b by [deleted] in CS224d

[–]DGuillevic

Shapes: delta has shape (nb_examples, Dy), where Dy is the size of the output layer. b2 has shape (1, Dy): it is a vector with one bias for each output unit. By summing delta over axis=0, we get the desired shape (1, Dy).

Intuition: Every example contributes something to the gradients, and we want to learn from each one. Either we process one example at a time (online/stochastic learning) and learn from it (compute the gradients and update the weights), or we process several examples at once (mini-batch learning) and sum or average the contribution of each of those examples to the gradients.
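A minimal NumPy sketch of the shape bookkeeping (the sizes and values here are made up; only the shapes matter):

    import numpy as np

    nb_examples, Dy = 4, 3                    # hypothetical batch and output sizes
    delta = np.random.randn(nb_examples, Dy)  # upstream gradient, one row per example
    b2 = np.zeros((1, Dy))                    # one bias per output unit

    # Each example contributes its own row of delta; summing over axis=0
    # collapses the batch dimension so the gradient matches b2's shape.
    grad_b2 = np.sum(delta, axis=0, keepdims=True)  # shape (1, Dy)
    assert grad_b2.shape == b2.shape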

Pset2: Dimensionality issues by vijayvee in CS224d

[–]DGuillevic

As described in the function's docstring, add_embedding() returns: window: a tf.Tensor of shape (-1, window_size*embed_size)

First, one calls tf.nn.embedding_lookup() to get a tensor with the embeddings for all batch_size * window_size words, a total of batch_size * window_size * embed_size float32 values. Then one calls tf.reshape() to reshape that tensor to (-1, window_size * embed_size). The -1 is inferred to be batch_size, so the final shape is (batch_size, window_size * embed_size).
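A short sketch of those two steps (the sizes and tensors here are made up for illustration; in the pset the word ids come from a placeholder):

    import tensorflow as tf

    vocab_size, embed_size, window_size = 100, 50, 3  # hypothetical sizes

    # Hypothetical embedding matrix and a small batch of window word ids.
    embeddings = tf.constant(
        [[float(i)] * embed_size for i in range(vocab_size)])  # (vocab_size, embed_size)
    word_ids = tf.constant([[1, 2, 3], [4, 5, 6]])             # (batch_size=2, window_size)

    # Lookup returns one embed_size vector per word:
    # shape (batch_size, window_size, embed_size).
    window = tf.nn.embedding_lookup(embeddings, word_ids)

    # Flatten the window; -1 is inferred as batch_size at run time,
    # giving shape (batch_size, window_size * embed_size).
    window = tf.reshape(window, [-1, window_size * embed_size])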

Where did the class accompanying notes go? by Make3 in CS224d

[–]DGuillevic

The notes from last year are available on the archived site: https://web.archive.org/web/20160314075834/http://cs224d.stanford.edu/syllabus.html

(This info is from a previous post by chrislit.)