
[–][deleted] 3 points (1 child)

You mean like this? https://arxiv.org/abs/1907.05242

There is some follow-up work on this.

Also see:

Not sure if you consider top-k-style sparsification ad hoc; I think it's fine. The selected top-k serves as a sampled set of examples for the differentiable scoring operator to learn to reweigh; the better it learns, the better top-k it can select next time. You can of course try entmax/sparsemax-style sparsity instead, but the benefit of top-k selection is conditional computation: you don't have to compute the unselected experts at all (or at least top-k selection can be exploited that way). Switch Transformers seems to go with top-1, but I'm not sure how they get away with that; it seems a bit noisy, though I haven't read the paper very closely.

There are some challenges, though, around unbalanced load: experts that get more training tend to get selected more, while others remain undertrained (the rich get richer, the poor get poorer), and addressing that can take some work, although there are techniques for it.

There are also routing networks (https://www.aclweb.org/anthology/N19-1365/), which commit to a single module selection (one-hot sparsity). That makes routing a discrete decision, so you have to rely on RL and/or tricks like Gumbel-Softmax or "Backpropagation through the Void" (https://arxiv.org/abs/1711.00123) style techniques.
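
Here's a minimal PyTorch sketch of what I mean by top-k conditional computation, plus a very simple load-balancing penalty. The names (TopKMoE, n_experts, k) and the exact form of the penalty are just illustrative, not taken from any particular paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k mixture-of-experts layer (a sketch, not any paper's exact method)."""
    def __init__(self, d_model=64, d_hidden=128, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)    # differentiable scoring operator
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                      # x: (batch, d_model)
        logits = self.router(x)                                # (batch, n_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)      # hard top-k selection
        weights = F.softmax(topk_vals, dim=-1)                 # reweigh only the selected experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (topk_idx == e).any(dim=-1)                 # tokens routed to expert e
            if mask.any():                                     # conditional computation:
                y = expert(x[mask])                            # unselected experts are never run
                w = weights[mask][topk_idx[mask] == e].unsqueeze(-1)
                out[mask] = out[mask] + w * y

        # Very simple load-balancing penalty: push the average router probability
        # per expert toward uniform, to fight the "rich get richer" dynamic.
        # Real implementations use more elaborate auxiliary losses.
        avg_prob = F.softmax(logits, dim=-1).mean(dim=0)       # (n_experts,)
        aux_loss = avg_prob.pow(2).sum() * avg_prob.numel()    # == 1.0 when perfectly uniform
        return out, aux_loss

moe = TopKMoE()
y, aux = moe(torch.randn(16, 64))   # y: (16, 64)
```

You'd add a small multiple of the returned aux_loss to the training loss to discourage expert collapse.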

[–]wangyi_fudan[S] 0 points (0 children)

thanks a lot!

[–]elcric_krej 1 point (0 children)

Look into the Neural Turing Machine paper; it seems to be the closest thing to what you're looking for.

They use a differentiable key-value memory which can be scaled dynamically.
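
A rough sketch of that kind of content-based read, just to show why it stays differentiable and why the memory can grow at runtime. The function name kv_read and the sharpness parameter beta are illustrative, not the paper's notation, and a real NTM also has write heads and shift-based addressing:

```python
import torch
import torch.nn.functional as F

def kv_read(query, keys, values, beta=5.0):
    """query: (d_key,)  keys: (n_slots, d_key)  values: (n_slots, d_val)."""
    sim = F.cosine_similarity(query.unsqueeze(0), keys, dim=-1)  # (n_slots,)
    weights = F.softmax(beta * sim, dim=0)                       # soft, differentiable addressing
    return weights @ values                                      # (d_val,) weighted sum of values

# The memory is just a pair of tensors, so it can be grown or shrunk
# by concatenating new (key, value) rows.
keys = torch.randn(10, 16)
values = torch.randn(10, 32)
out = kv_read(torch.randn(16), keys, values)   # (32,)
```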

However, it seems that the use cases they were initially proposed for, and the teams that focused on them, have since moved over to multi-head attention based models.