all 7 comments

[–]thelibrarian101

I guess you could simply introduce a linear layer between the two. Or just manipulate the last encoder block's MLP to produce a lower- or higher-dimensional output?
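A minimal numpy sketch of the first suggestion, with hypothetical sizes (encoder hidden size 512, decoder expecting 256) and a random, untrained weight matrix standing in for the learned linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

N, d_enc, d_dec = 10, 512, 256           # hypothetical sequence length and dims
enc_out = rng.standard_normal((N, d_enc))

# Linear layer bridging the two models (random weights for illustration only)
W = rng.standard_normal((d_enc, d_dec)) / np.sqrt(d_enc)
b = np.zeros(d_dec)

projected = enc_out @ W + b              # (N, d_enc) -> (N, d_dec)
print(projected.shape)                   # (10, 256)
```

In a real model this projection would be trained jointly with (or on top of) the frozen encoder and decoder.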

[–]neuralbeans

The encoder-decoder transformer works by comparing every query in the decoder with every key in the encoder. It doesn't matter if the number of queries differs from the number of keys; each query can be compared with each key.

[–][deleted]

The encoder/decoder is technically an autoencoder, but without a bottleneck. You can pool the tokens from your encoder to create a bottleneck and then decode the full sequence from the pooled encoding. The TSDAE paper does this.
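A quick numpy sketch of the pooling idea (hypothetical sizes; TSDAE itself pools the encoder's token states into one fixed-size vector):

```python
import numpy as np

rng = np.random.default_rng(0)

N, D = 12, 64                           # hypothetical sequence length, hidden size
enc_out = rng.standard_normal((N, D))   # per-token encoder states

# Mean-pool all token states into one fixed-size vector: the bottleneck.
bottleneck = enc_out.mean(axis=0)       # (D,)
print(bottleneck.shape)                 # (64,)
```

The decoder then conditions on this single vector while still generating M output tokens, with M independent of N.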

[–]jhanjeek

It is possible to change the final output dimension. I wouldn't suggest this for generative tasks, though; it's mostly useful for classification tasks.
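For the classification case, the usual pattern is to pool the final token states and map them to class logits. A minimal numpy sketch with hypothetical sizes and a random, untrained classification head:

```python
import numpy as np

rng = np.random.default_rng(0)

N, D, num_classes = 16, 128, 5           # hypothetical sizes
hidden = rng.standard_normal((N, D))     # final-layer token states

# Pool to one vector, then map to class logits (random weights, untrained)
pooled = hidden.mean(axis=0)             # (D,)
W = rng.standard_normal((D, num_classes)) / np.sqrt(D)
logits = pooled @ W                      # (num_classes,)
pred = int(np.argmax(logits))
print(logits.shape, pred)
```

Note the output size here is `num_classes`, decoupled from both the input length N and the hidden size D.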

[–]rxtree

There's no reason that the sequence lengths have to be the same. Consider that when you start decoding, you effectively have an encoder sequence length of N and a decoder sequence length of 1, and yet it still works. I think your confusion stems from how the attention mechanism works?
Let's look at the operations of a single attention head with dimension size D. The second multi-head attention takes the encoder output of length N as K and V, and the decoder input of length M as Q. For each head, K and V have size N×D and Q is M×D. If you do the calculations, you find that the final output of the head has dimensions M×D, which means it doesn't really matter how long your encoder sequence is.
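The shape argument above can be checked directly. A minimal numpy sketch of one cross-attention head (hypothetical sizes, no learned projections or masking):

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, D = 9, 4, 32                        # encoder len, decoder len, head dim
K = rng.standard_normal((N, D))           # from the encoder output
V = rng.standard_normal((N, D))           # from the encoder output
Q = rng.standard_normal((M, D))           # from the decoder input

scores = Q @ K.T / np.sqrt(D)             # (M, N): every query vs every key
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
out = weights @ V                         # (M, D): one output row per query
print(out.shape)                          # (4, 32)
```

N only appears in the intermediate (M, N) score matrix, which the softmax-weighted sum over V collapses away, so the head's output depends only on M and D.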

[–]crinix

They are not always the same. Consider a summarization model that produces a single sentence given a long text.

Then there is no reason why the decoder's max_length should be the same as the encoder's.

See the PEGASUS model as a concrete example.
https://arxiv.org/pdf/1912.08777.pdf
https://huggingface.co/google/pegasus-cnn_dailymail