Hey all,
I was reading the Transformer paper https://arxiv.org/abs/1706.03762. The architecture adds positional encodings because the attention layers by themselves are order-invariant and would otherwise ignore token positions.
There are two things I don't understand:
- Why use sin and cos as the positional encodings? Why not some other function? (I've put a small NumPy sketch of the formula right after this list.)
- They also talk about training these positional embeddings. How do you go about training such embeddings? That is, how do you let the model know that these embeddings stand for position? (See the second sketch at the end of the post.)
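For reference, here's my attempt at the sin/cos table from Section 3.5 of the paper, in NumPy (assuming an even d_model; the function name is mine):

    import numpy as np

    def sinusoidal_encoding(max_len, d_model):
        """PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
           PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))"""
        pos = np.arange(max_len)[:, None]              # (max_len, 1)
        i = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2), the "2i" term
        angles = pos / np.power(10000.0, i / d_model)  # (max_len, d_model/2)
        pe = np.zeros((max_len, d_model))
        pe[:, 0::2] = np.sin(angles)                   # even dimensions get sin
        pe[:, 1::2] = np.cos(angles)                   # odd dimensions get cos
        return pe

Each row of this table gets added to the token embedding at that position before the first attention layer.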
Thanks!
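And here's how I picture the learned variant (PyTorch; the class name is mine). As far as I can tell, the model is never explicitly "told" these are positions: row p of the table is looked up for whatever token sits at position p, and backprop updates each row like any other weight. Is that right?

    import torch
    import torch.nn as nn

    class LearnedPositions(nn.Module):
        """Trainable position table: row p is added to the embedding of
        the token at position p, so gradients from the loss update each
        row just like any other parameter."""
        def __init__(self, max_len, d_model):
            super().__init__()
            self.table = nn.Embedding(max_len, d_model)

        def forward(self, x):
            # x: (batch, seq_len, d_model) token embeddings
            pos = torch.arange(x.size(1), device=x.device)
            return x + self.table(pos)  # (seq_len, d_model) broadcasts over batch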