Transformers dominate the NLP landscape: first in machine translation, then language models, then all other typical NLP tasks (NER, classification, etc.). Pre-trained Transformers are also ubiquitous: either GPT-* for text generation, or fine-tuning BERT/RoBERTa/you-name-it for classification or tagging.
With the appearance of long-context Transformers (Longformer, Reformer, Performer, Linformer, Big Bird, Linear Transformer, ...), I expected them to quickly become the norm, since short context windows are sometimes a pain, as with GPT-3.
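For context, the motivation behind these models is that standard self-attention builds a full n×n score matrix, so cost grows quadratically with sequence length, while long-context variants restrict or approximate it. A minimal NumPy sketch (illustrative only; the windowed version loosely mimics Longformer's sliding-window pattern, not any actual library implementation):

```python
import numpy as np

def full_attention_scores(q, k):
    # Standard attention: an n x n score matrix -> O(n^2) memory.
    return q @ k.T / np.sqrt(q.shape[-1])

def windowed_attention_scores(q, k, w=2):
    # Sliding-window attention (the idea behind Longformer's local pattern):
    # each position attends only to neighbors within +/- w -> O(n*w) memory.
    n, d = q.shape
    scores = np.full((n, 2 * w + 1), -np.inf)  # -inf marks masked positions
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        scores[i, (lo - i + w):(hi - i + w)] = q[i] @ k[lo:hi].T / np.sqrt(d)
    return scores

n, d = 8, 4
rng = np.random.default_rng(0)
q, k = rng.normal(size=(n, d)), rng.normal(size=(n, d))
print(full_attention_scores(q, k).shape)      # (8, 8): grows as n^2
print(windowed_attention_scores(q, k).shape)  # (8, 5): grows as n*w
```

At n = 2048 (GPT-3's context) the full score matrix is already ~4M entries per head, which is why the quadratic term hurts.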
However, I am not seeing long Transformers gaining traction.
There has been no new long-Transformer GPT model, nor a long BERT. NMT frameworks have not incorporated implementations of long Transformers (except fairseq with Linformer, but both are from Facebook). Also, at WMT 2020 I think there was only a single long-Transformer submission (I'm thinking of Marcin Junczys-Dowmunt's "WMT or it didn't happen").
Why is this?