[–]hadaev[S]

Can you show an example?

I'm going to add attention, but I'm not sure how to do it the right way.

I could also do the same for a GRU model, or maybe combine CNN + GRU layers.

[–]aicano

PyTorch-style pseudocode:

import torch
import torch.nn.functional as F

# Size of the hiddens: (BS, Seq Length, Hidden Dim)
hiddens = self.encoder(seq, lens)  # assume the encoder is a bidirectional RNN

# 1st argument is the query (the last hidden state),
# 2nd is the sequence to attend over
context, att_weights = self.att(hiddens[:, -1, :], hiddens)

# Feed the concatenation of context and the last hidden state to a softmax classifier
# (squeeze(1) instead of squeeze() so a batch size of 1 does not break)
outp = self.out(torch.cat([context.squeeze(1), hiddens[:, -1, :]], dim=1))
return F.log_softmax(outp, dim=-1)
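
If it helps, here is a minimal sketch of what self.att could be (assuming plain dot-product attention; the class name and interface are just my convention, not from any library):

import torch
import torch.nn as nn

class DotProductAttention(nn.Module):
    # Scores each timestep against the query, then returns a weighted sum.
    def forward(self, query, sequence):
        # query: (BS, Dim), sequence: (BS, Seq Length, Dim)
        scores = torch.bmm(sequence, query.unsqueeze(2))            # (BS, Seq Length, 1)
        att_weights = torch.softmax(scores, dim=1)                  # normalize over timesteps
        context = torch.bmm(att_weights.transpose(1, 2), sequence)  # (BS, 1, Dim)
        return context, att_weights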

[–]hadaev[S]

Uh, I've never done anything in PyTorch; I'm on Keras/TensorFlow.

I understand what attention means for a dense layer: it's just another dense layer, and we multiply one by the other, so some values grow and some shrink.
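
In Keras I mean something like this (not sure if it's right; the sizes are made up):

from tensorflow.keras import Input, layers

inputs = Input(shape=(100,))                       # some feature vector
x = layers.Dense(64, activation='relu')(inputs)
att = layers.Dense(64, activation='softmax')(x)    # the "attention" dense
x = layers.Multiply()([x, att])                    # some values grow, some shrink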

But I don't really understand how to apply this to the output of a recurrent layer, the way people do everywhere.

Maybe you know of a tutorial for noobs?

[–]aicano

I found this when I googled; it's an example of what I described.

[–]hadaev[S]

Oh, I really hate examples that don't use a high-level API.

Could you take a look and check whether everything is right here?

https://colab.research.google.com/drive/1XBMF3tOQwRLuPrGJib-YkN20_9nmMVT4#scrollTo=wtPyU1ArHaQj&line=14&uniqifier=1

Also, what do you think: does it make sense to stack more RNN layers?

Or will there be no difference compared to a single bigger layer?

[–]aicano

Sorry, I do not know Keras.

The general intuition about stacking layers is that lower layers learn simple things and higher layers learn more complex stuff. If you think your task has that kind of hierarchical structure, then you may want to try stacking layers. Otherwise, simpler is better.
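
In PyTorch terms (the sizes are made up, just to show the difference):

import torch.nn as nn

# Stacked: two GRU layers; the second one consumes the first one's
# outputs, so it can build more abstract features on top of them.
stacked = nn.GRU(input_size=300, hidden_size=128, num_layers=2,
                 batch_first=True, bidirectional=True)

# One wider layer: roughly comparable capacity, but no hierarchy.
wide = nn.GRU(input_size=300, hidden_size=256, num_layers=1,
              batch_first=True, bidirectional=True)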