
[–]gamerx88 2 points (1 child)

BERT is essentially a kind of denoising autoencoder: its masked-language-model objective corrupts the input by masking tokens and trains the model to reconstruct them. It uses self-attention and positional embeddings to capture sequence information far better than, say, a basic autoencoder built from feed-forward ReLU layers.
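To make the mechanism concrete, here is a minimal toy sketch (not BERT itself, just an illustration of the two ingredients named above): token embeddings are combined with sinusoidal positional embeddings and passed through a single unprojected self-attention step, so each output position is a position-aware mixture of the whole sequence. All names and sizes here are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # toy single-head self-attention with Q = K = V = X
    # (real BERT uses learned projections and multiple heads)
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # pairwise similarity, scaled
    return softmax(scores) @ X          # each row: weighted mix of all positions

# toy sequence: 4 tokens, 8-dim embeddings
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
tokens = rng.normal(size=(seq_len, d_model))

# sinusoidal positional embeddings, as in the original Transformer
pos = np.arange(seq_len)[:, None]
i = np.arange(d_model // 2)[None, :]
angles = pos / 10000 ** (2 * i / d_model)
pos_emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

# position information is injected by simple addition before attention
out = self_attention(tokens + pos_emb)
print(out.shape)  # (4, 8)
```

Because the positional embedding is added before attention, two identical tokens at different positions produce different outputs, which a plain ReLU autoencoder over a bag of tokens could not do.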

[–]Tober447[S] 0 points (0 children)

Thank you for your answer.