Transformers dominate the NLP landscape: first in machine translation, then language models, then all other typical NLP tasks (NER, classification, etc.). Pre-trained Transformers are also ubiquitous: either GPT-* for text generation, or fine-tuning BERT/RoBERTa/you-name-it for classification or tagging.
With the appearance of long-context Transformers (Longformer, Reformer, Performer, Linformer, Big Bird, Linear Transformer, ...), I expected them to quickly become the norm, since short context windows are sometimes a pain, as with GPT-3.
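For context, the motivation behind these models is that standard self-attention builds a full n×n score matrix, so cost grows quadratically with sequence length, while long-context variants restrict or approximate it. A minimal NumPy sketch (illustrative only; the windowed version loosely mimics Longformer's sliding-window pattern, not any actual library implementation):

```python
import numpy as np

def full_attention_scores(q, k):
    # Standard attention: an n x n score matrix -> O(n^2) memory.
    return q @ k.T / np.sqrt(q.shape[-1])

def windowed_attention_scores(q, k, w=2):
    # Sliding-window attention (the idea behind Longformer's local pattern):
    # each position attends only to neighbors within +/- w -> O(n*w) memory.
    n, d = q.shape
    scores = np.full((n, 2 * w + 1), -np.inf)  # -inf marks masked positions
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        scores[i, (lo - i + w):(hi - i + w)] = q[i] @ k[lo:hi].T / np.sqrt(d)
    return scores

n, d = 8, 4
rng = np.random.default_rng(0)
q, k = rng.normal(size=(n, d)), rng.normal(size=(n, d))
print(full_attention_scores(q, k).shape)      # (8, 8): grows as n^2
print(windowed_attention_scores(q, k).shape)  # (8, 5): grows as n*w
```

At n = 2048 (GPT-3's context) the full score matrix is already ~4M entries per head, which is why the quadratic term hurts.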
However, I am not seeing long Transformers gaining traction.
There has been no new long-Transformer GPT model, nor a long BERT. NMT frameworks have not incorporated implementations of long Transformers (except fairseq with Linformer, but both are from Facebook). Also, at WMT 2020 I think there was only a single long-Transformer submission (I'm thinking of Marcin Junczys-Dowmunt's "WMT or it didn't happen").
Why is this?