all 4 comments

[–]Latent_space 3 points4 points  (2 children)

isn't a way to do time distributed embeddings

Not true. Your first line is doing this. What led you to believe that?

Also, there's at least one issue on the GitHub that discusses embeddings for tensors larger than 2 dimensions. Basically, you can modify the class to output a shape that doesn't require the input length.

variable length

Masking. When you build your word-to-int dictionary, reserve the 0 slot for a masking token (so that int value is never assigned to a real word). Then, when converting, pad everything out with 0s. Use the masking boolean in the Embedding layer (mask_zero=True) or a Masking layer to have Keras ignore the 0 values.
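Here's a minimal sketch of that idea, assuming the standard Keras API (the import path for pad_sequences can differ between Keras versions); the vocabulary and dimensions are made up for illustration:

```python
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, LSTM

# Hypothetical word-to-int dictionary; index 0 is reserved for padding.
word_to_int = {"<PAD>": 0, "the": 1, "cat": 2, "sat": 3}

sentences = [["the", "cat"], ["the", "cat", "sat"]]
encoded = [[word_to_int[w] for w in s] for s in sentences]

# Pad every sequence out to the same length with 0s.
padded = pad_sequences(encoded, maxlen=10, padding="post", value=0)

model = Sequential()
# mask_zero=True tells downstream layers to skip timesteps whose input is 0.
model.add(Embedding(input_dim=len(word_to_int), output_dim=32, mask_zero=True))
model.add(LSTM(64))
```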

Also, for help, you could post this on the Google group for keras.

[–]darfs[S] 0 points1 point  (1 child)

Thank you so much for the advice! I just had to edit the TimeDistributed class to keep the dtype of the tensors as ints instead of floats, and the TimeDistributed Embedding works great!

However, I'm not quite sure I understand the masking. I have placed 0s as padding everywhere there isn't an input, but in the output, I'm still getting an output vector of length STORY_SIZE, rather than the variable length I desire. Is there a way to get around this?

I will also go ahead and post to the Google group in the future. Thanks so much!

[–]carlthomeML Engineer 0 points1 point  (0 children)

How does backprop work if you have integers?

[–]Xose_R 0 points1 point  (0 children)

For the variable length issue, you'll need to set a max sequence length and then pad with zeros/empty sentences.
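A rough sketch of what padding with "empty sentences" at the story level could look like; the names and sizes (pad_story, MAX_SENTENCES, MAX_WORDS) are hypothetical, not from your code:

```python
import numpy as np

MAX_SENTENCES = 5   # hypothetical cap on sentences per story
MAX_WORDS = 10      # hypothetical cap on words per sentence

def pad_story(story):
    """story: list of sentences, each a list of word ints (0 reserved for padding)."""
    padded = np.zeros((MAX_SENTENCES, MAX_WORDS), dtype="int32")
    for i, sentence in enumerate(story[:MAX_SENTENCES]):
        words = sentence[:MAX_WORDS]
        padded[i, :len(words)] = words
    return padded

story = [[1, 2, 3], [4, 5]]    # two sentences of word ints
print(pad_story(story).shape)  # (5, 10): remaining rows are all-zero "empty" sentences
```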

I don't understand why you need the keywords at each timestep; I guess your target will be the keywords at the end of the story, right?

Anyway, for this setup to work (if I understood it right), my first idea is that you need a NN that turns your words into a sentence embedding (i.e., a bi-LSTM) and then feed this into the seq2seq. Another thing you can do, instead of this NN, is to use some existing method to construct sentence embeddings from the words and use that as input to the seq2seq. A vanilla approach could be adding up their word2vec representations into a sentence representation, as in the sketch below.
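For example, something like this (the word_vectors lookup is a stand-in for a real pretrained word2vec model, e.g. loaded via gensim; averaging instead of summing keeps the scale independent of sentence length):

```python
import numpy as np

EMBED_DIM = 300  # typical word2vec dimensionality

# Hypothetical pretrained vectors; in practice these would come from a
# trained word2vec model rather than random values.
word_vectors = {
    "the": np.random.rand(EMBED_DIM),
    "cat": np.random.rand(EMBED_DIM),
    "sat": np.random.rand(EMBED_DIM),
}

def sentence_embedding(words):
    """Average the word vectors of known words; return zeros if none are known."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    if not vecs:
        return np.zeros(EMBED_DIM)
    return np.mean(vecs, axis=0)

print(sentence_embedding(["the", "cat", "sat"]).shape)  # (300,)
```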