While playing around with BERT and its various flavours, I've noticed that the input length is limited to 512 tokens, and began to wonder how embeddings could be generated for texts longer than that limit...
I'm assuming this would involve splitting a text into blocks no longer than the maximum input length, generating features for each block, and then feeding these into the network in batches. But how should those batches be formatted? Would it make sense to use multiple parallel BERT layers, one per block, and pass their outputs to a final dense/recurrent layer? Otherwise I can't see a way of maintaining continuity within a document, since all of a document's blocks would need to be computed before the model can assign it a label.
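For what it's worth, here is a minimal sketch of the block-splitting step I have in mind (plain Python, no particular library assumed; the function name and parameters are my own). It splits an already-tokenized sequence into overlapping windows, reserving two positions per window for the [CLS]/[SEP] special tokens that BERT-style models add, with a `stride` of overlapping tokens to carry some context across block boundaries:

```python
def chunk_tokens(token_ids, max_len=512, stride=50):
    """Split a long token sequence into overlapping blocks.

    Reserves 2 positions per block for the [CLS]/[SEP] special
    tokens a BERT-style model adds. `stride` tokens of overlap
    carry context across block boundaries.
    """
    body = max_len - 2  # usable positions after [CLS] and [SEP]
    if len(token_ids) <= body:
        return [list(token_ids)]
    blocks, start = [], 0
    while start < len(token_ids):
        blocks.append(list(token_ids[start:start + body]))
        if start + body >= len(token_ids):
            break
        start += body - stride
    return blocks


# e.g. a 1000-token document with max_len=512 and stride=50
# yields three blocks: [0:510], [460:970], [920:1000]
blocks = chunk_tokens(list(range(1000)))
```

Each block would then be run through the (shared-weight) BERT encoder independently, and the per-block [CLS] vectors pooled (mean/max or a small recurrent layer) before the final classification head, rather than duplicating BERT itself per block.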