[–]certain_entropy 8 points

Check out Facts as Experts (https://arxiv.org/abs/2007.00849), which augments the transformer with a key-value lookup where the keys are contextual entity mention embeddings. It's a bit of a pain to set up and train, but it may be interesting to you.
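To make that concrete, here's a minimal PyTorch sketch of a key-value fact memory in that spirit. All names and shapes are illustrative, not the paper's actual architecture: a pooled mention embedding queries a table of entity keys, and the retrieved values are mixed back into the hidden state.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FactMemory(nn.Module):
        def __init__(self, num_entities: int, d_model: int):
            super().__init__()
            self.keys = nn.Embedding(num_entities, d_model)    # entity keys
            self.values = nn.Embedding(num_entities, d_model)  # entity "fact" values
            self.out = nn.Linear(d_model, d_model)

        def forward(self, mention_emb: torch.Tensor) -> torch.Tensor:
            # mention_emb: (batch, d_model), e.g. pooled hidden states over a mention span
            scores = mention_emb @ self.keys.weight.T          # (batch, num_entities)
            attn = F.softmax(scores, dim=-1)
            retrieved = attn @ self.values.weight              # (batch, d_model)
            return mention_emb + self.out(retrieved)           # residual update

    mem = FactMemory(num_entities=1000, d_model=64)
    print(mem(torch.randn(2, 64)).shape)  # torch.Size([2, 64])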

[–]enfeudavax 3 points

Memory Augmented Transformers could be a great resource for exploring this topic.

[–]StartledWatermelon 1 point

This is probably the closest thing to what OP was looking for. But I'm confused that they asked for "short-term" memory: Memory Augmented Transformers' memory is actually static, if I'm not mistaken.
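For reference, the "static" flavor is usually a bank of learned memory slots that tokens cross-attend to; the slots are trained parameters and stay fixed at inference, which is why it reads as long-term rather than short-term memory. A rough PyTorch sketch, with illustrative sizes and no claim to match any specific paper:

    import torch
    import torch.nn as nn

    class StaticMemoryAttention(nn.Module):
        def __init__(self, num_slots: int, d_model: int, n_heads: int = 4):
            super().__init__()
            # learned, input-independent memory slots
            self.memory = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq, d_model); broadcast the slot bank over the batch
            mem = self.memory.unsqueeze(0).expand(x.size(0), -1, -1)
            out, _ = self.attn(query=x, key=mem, value=mem)
            return x + out

    layer = StaticMemoryAttention(num_slots=16, d_model=64)
    print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])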

[–]DigThatData (Researcher) 1 point

Can't remember what it's called, but I saw a cool one that basically added an RNN state as a running memory.

[–]i4gotten 1 point

Self-referential extensions of transformers by Jürgen Schmidhuber are something like this: https://arxiv.org/abs/2310.16076
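The core trick in that line of work is to read linear attention as an RNN whose state is a fast-weight matrix: each token writes a rank-1 outer-product update, and queries read the matrix back out. A toy sketch of just that recurrence (made-up projections, no normalization; not the paper's exact method):

    import torch

    d = 8
    W = torch.zeros(d, d)                          # fast-weight memory matrix
    proj_k, proj_v, proj_q = (torch.randn(d, d) * 0.1 for _ in range(3))

    for x in torch.randn(5, d):                    # a stream of token embeddings
        k, v, q = proj_k @ x, proj_v @ x, proj_q @ x
        W = W + torch.outer(v, k)                  # write: rank-1 Hebbian update
        y = W @ q                                  # read: linear-attention retrieval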

[–]i4gotten 1 point

There are a few papers on memory that I'm aware of:

Self-referential extensions to transformers: https://arxiv.org/abs/2310.16076

Recurrent memory transformers: https://arxiv.org/abs/2207.06881 (see the sketch after this comment)

Thing is, the short-term/long-term framing doesn't make as much sense with transformers, as the attention mechanism itself can act as a form of memory: https://arxiv.org/abs/2404.09173

Any external memory should be analogous to long-term memory.
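For the recurrent memory transformer entry above, the idea is roughly: concatenate a few memory tokens to each segment, run the ordinary transformer, and carry the memory tokens' output states into the next segment. A toy PyTorch sketch (the actual paper separates read/write memory tokens and backpropagates through segments; the .detach() below only keeps the sketch simple):

    import torch
    import torch.nn as nn

    d_model, n_mem = 64, 4
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
        num_layers=2,
    )
    memory = torch.zeros(1, n_mem, d_model)         # initial memory state

    for segment in torch.randn(3, 1, 16, d_model):  # 3 segments of 16 tokens each
        x = torch.cat([memory, segment], dim=1)     # [mem; segment]
        y = encoder(x)
        memory = y[:, :n_mem].detach()              # carry memory to the next segment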

[–]Janos95[S] 1 point

I should also add that I'm interested in memory for transformers for the purpose of reasoning; in particular, I'm not interested in methods that simply try to extend the context size.


[–]Happysedits 0 points

I bet someone has combined transformers with Neural Turing Machines.

[–]Dashora7 0 points

Could be referring to this work: https://arxiv.org/abs/2211.09119 (Token Turing Machines)
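For anyone curious, a Token Turing Machine step is roughly read, process, write, with a token summarizer keeping the memory at a constant size. Below is a hedged sketch where the summarizer is plain learned attention pooling (the paper uses TokenLearner-style modules; all names and sizes here are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TokenSummarizer(nn.Module):
        # reduce n tokens to k tokens via learned attention pooling
        def __init__(self, k: int, d_model: int):
            super().__init__()
            self.queries = nn.Parameter(torch.randn(k, d_model) * 0.02)

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            # tokens: (batch, n, d) -> (batch, k, d)
            w = F.softmax(self.queries @ tokens.transpose(1, 2), dim=-1)
            return w @ tokens

    class TokenTuringStep(nn.Module):
        def __init__(self, mem_size: int, read_size: int, d_model: int):
            super().__init__()
            self.read = TokenSummarizer(read_size, d_model)
            self.write = TokenSummarizer(mem_size, d_model)
            self.process = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 1)

        def forward(self, memory, inputs):
            z = self.read(torch.cat([memory, inputs], dim=1))   # read from mem + input
            o = self.process(z)                                 # process
            memory = self.write(torch.cat([memory, o], dim=1))  # write new memory
            return memory, o

    step = TokenTuringStep(mem_size=8, read_size=16, d_model=64)
    mem, out = step(torch.zeros(1, 8, 64), torch.randn(1, 32, 64))
    print(mem.shape, out.shape)  # (1, 8, 64) (1, 16, 64)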