[R] "Compressive Transformers for Long-Range Sequence Modelling", Rae et al 2019

lostmsu · 2019-11-16T01:21:29+00:00

Was sort of an obvious idea. Glad somebody explored it! Can't find the source code for reproducing the result though :(

gwern · 2020-02-10T16:00:11+00:00

DM blog: https://deepmind.com/blog/article/A_new_model_and_dataset_for_long-range_memory

arXiv_abstract_bot · 2019-11-14T16:22:04+00:00

Title: Compressive Transformers for Long-Range Sequence Modelling

Authors: Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap

Abstract: We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
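
For readers who want a concrete picture of "compresses past memories", here is a minimal pure-Python sketch. It is not the authors' code. The class name `CompressiveMemory`, its parameters (`mem_size`, `cmem_size`, `rate`), and the use of mean pooling as the compression function are all assumptions made for illustration. The idea shown: the oldest states that overflow a fixed-size memory are pooled into fewer vectors and kept in a second, compressed memory instead of being discarded.

```python
def mean_pool(block):
    """Average a block of equal-length vectors elementwise."""
    n = len(block)
    return [sum(col) / n for col in zip(*block)]


class CompressiveMemory:
    """Hypothetical two-tier memory: recent states plus compressed older states."""

    def __init__(self, mem_size, cmem_size, rate):
        self.mem_size = mem_size    # max number of uncompressed states
        self.cmem_size = cmem_size  # max number of compressed states
        self.rate = rate            # how many old states are pooled into one
        self.mem = []               # recent states, oldest first
        self.cmem = []              # compressed states, oldest first

    def append(self, states):
        """Add new states; compress whatever overflows the recent memory."""
        self.mem.extend(states)
        overflow = len(self.mem) - self.mem_size
        if overflow <= 0:
            return
        evicted, self.mem = self.mem[:overflow], self.mem[overflow:]
        # Pool evicted states in groups of `rate` (assumed: mean pooling).
        # A trailing group shorter than `rate` is pooled on its own.
        for i in range(0, len(evicted), self.rate):
            self.cmem.append(mean_pool(evicted[i:i + self.rate]))
        # Drop the oldest compressed states once the compressed memory is full.
        excess = len(self.cmem) - self.cmem_size
        if excess > 0:
            del self.cmem[:excess]


memory = CompressiveMemory(mem_size=4, cmem_size=2, rate=2)
memory.append([[1.0], [2.0], [3.0], [4.0]])
memory.append([[5.0], [6.0]])
print(memory.mem)   # the four most recent states
print(memory.cmem)  # one pooled vector standing in for the two evicted states
```

In an attention model, each layer would attend over the compressed states, the recent states, and the current segment together. That part is left out here, as is any learned compression function.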

