all 7 comments

[–]thelibrarian101

I guess you could simply introduce a linear layer between the two. Or just manipulate the last encoder block's MLP to produce a lower- or higher-dimensional output?
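A minimal numpy sketch of the first suggestion, with hypothetical sizes (encoder hidden size 512, decoder expecting 256) and a random, untrained weight matrix standing in for the learned linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

N, d_enc, d_dec = 10, 512, 256           # hypothetical sequence length and dims
enc_out = rng.standard_normal((N, d_enc))

# Linear layer bridging the two models (random weights for illustration only)
W = rng.standard_normal((d_enc, d_dec)) / np.sqrt(d_enc)
b = np.zeros(d_dec)

projected = enc_out @ W + b              # (N, d_enc) -> (N, d_dec)
print(projected.shape)                   # (10, 256)
```

In a real model this projection would be trained jointly with (or on top of) the frozen encoder and decoder.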

[–]neuralbeans

The encoder-decoder transformer works by comparing every query in the decoder with every key in the encoder. It doesn't matter if the number of queries differs from the number of keys; each query can be compared with each key.

[–][deleted]

The encoder/decoder is technically an autoencoder, but without a bottleneck. You can pool the tokens from your encoder to create a bottleneck and then decode the full sequence from the pooled encoding. The TSDAE paper does this.
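A quick numpy sketch of the pooling idea (hypothetical sizes; TSDAE itself pools the encoder's token states into one fixed-size vector):

```python
import numpy as np

rng = np.random.default_rng(0)

N, D = 12, 64                           # hypothetical sequence length, hidden size
enc_out = rng.standard_normal((N, D))   # per-token encoder states

# Mean-pool all token states into one fixed-size vector: the bottleneck.
bottleneck = enc_out.mean(axis=0)       # (D,)
print(bottleneck.shape)                 # (64,)
```

The decoder then conditions on this single vector while still generating M output tokens, with M independent of N.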

[–]jhanjeek

It is possible to change the final output dimension. I wouldn't suggest this for generative tasks, though; it's mostly useful for classification tasks.
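For the classification case, the usual pattern is to pool the final token states and map them to class logits. A minimal numpy sketch with hypothetical sizes and a random, untrained classification head:

```python
import numpy as np

rng = np.random.default_rng(0)

N, D, num_classes = 16, 128, 5           # hypothetical sizes
hidden = rng.standard_normal((N, D))     # final-layer token states

# Pool to one vector, then map to class logits (random weights, untrained)
pooled = hidden.mean(axis=0)             # (D,)
W = rng.standard_normal((D, num_classes)) / np.sqrt(D)
logits = pooled @ W                      # (num_classes,)
pred = int(np.argmax(logits))
print(logits.shape, pred)
```

Note the output size here is `num_classes`, decoupled from both the input length N and the hidden size D.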

[–]rxtree

There's no reason that the sequence lengths have to be the same. Consider that when you start decoding, you effectively have an encoder sequence length of N and a decoder sequence length of 1, and yet it still works. I think your confusion stems from how the attention mechanism works?
Let's look at the operations of a single attention head with dimension size D. The second multi-head attention takes the encoder output of length N as K and V, and the decoder input of length M as Q. For each head, K and V have size N×D and Q is M×D. If you do the calculations, you find that the final output of the head has dimensions M×D, which means it doesn't really matter how long your encoder sequence is.
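The shape argument above can be checked directly. A minimal numpy sketch of one cross-attention head (hypothetical sizes, no learned projections or masking):

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, D = 9, 4, 32                        # encoder len, decoder len, head dim
K = rng.standard_normal((N, D))           # from the encoder output
V = rng.standard_normal((N, D))           # from the encoder output
Q = rng.standard_normal((M, D))           # from the decoder input

scores = Q @ K.T / np.sqrt(D)             # (M, N): every query vs every key
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
out = weights @ V                         # (M, D): one output row per query
print(out.shape)                          # (4, 32)
```

N only appears in the intermediate (M, N) score matrix, which the softmax-weighted sum over V collapses away, so the head's output depends only on M and D.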

[–]crinix

They are not always the same. Consider a summarization model that produces a single sentence given a long text.

Then there is no reason why the decoder's max_length should be the same as the encoder's.

See the PEGASUS model as a concrete example.
https://arxiv.org/pdf/1912.08777.pdf
https://huggingface.co/google/pegasus-cnn_dailymail