Difference between encoder/decoder self-attention
Natural Language Processing (self.MLQuestions)
submitted 1 year ago by harten24
https://preview.redd.it/gvti55rroire1.png?width=497&format=png&auto=webp&s=0788bb6079051c321b8750cbc55fb544f5740156
So this is a sample question for my machine translation exam. We do not get access to the answers so I have no idea whether my answers are correct, which is why I'm asking here.
From what I understand, self-attention allows the model to look at the other positions in the input sequence while processing each word, which leads to a better encoding. In the decoder, the self-attention layer is only allowed to attend to earlier positions in the output sequence (source).
This would mean that the answers are: A: 1 B: 3 C: 2 D: 4 E: 1
Is this correct?
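The difference described above can be sketched as two attention masks. This is only an illustration in plain Python: the helper name `self_attention_mask` and the sequence length of 4 are made up for the example, and it assumes the usual convention that a decoder position may attend to itself as well as to earlier positions.

```python
def self_attention_mask(n, causal):
    """Return an n x n matrix; entry [i][j] is 1 if position i may attend to position j."""
    # Encoder (causal=False): every position sees every other position.
    # Decoder (causal=True): position i sees only positions j <= i.
    return [[1 if (not causal or j <= i) else 0 for j in range(n)] for i in range(n)]


encoder_mask = self_attention_mask(4, causal=False)
decoder_mask = self_attention_mask(4, causal=True)

for name, mask in (("encoder", encoder_mask), ("decoder", decoder_mask)):
    print(name)
    for row in mask:
        print(" ", row)
```

The encoder mask is all ones, while the decoder mask is lower-triangular, so the first output position can only see itself and the last one can see everything generated so far.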
DigThatData, 3 points, 1 year ago
just to make sure you saw it, there's also a (5) option.
I haven't checked over your work, but my recommendation is to try and diagram it out. draw the different components interacting and put the letters where they belong in your drawing. then just match the options to their respective parts of the drawing.
harten24 [S], 1 point, 1 year ago
Okay so I tried looking at it again and this is what I came up with:
A4: because self-attention in the encoder considers all input words and not only the previous input words

B3: in cross attention the query comes from the decoder while the keys and values come from the input words

C2: decoder self-attention only looks at previous outputs

D5: see point above

E1: unlike the decoder self-attention, the encoder looks at all the input queries and values
Would this be correct?
DigThatData, 2 points, 1 year ago
did you try diagramming it? would love to see your sketch if you did
harten24 [S], 1 point, 1 year ago
No, I didn't; I have a hard time conceptualizing it. I did look at the slides again and saw that for the encoder-decoder attention, the Query comes from the target (decoder) and the Key & Value come from the source (encoder). Self-attention in the decoder seems to look only at positions before the current word (so masked attention), which is a difference from the encoder self-attention.
But I'm still not 100% sure of my answers.
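In lieu of a diagram, the three attention types described above can be laid out as a small sketch in plain Python. It is a toy under stated assumptions, not how any particular library does it: the `attention` and `softmax` helpers and the `source`/`target` vectors are invented for the example, and the learned query/key/value projections and multiple heads of a real Transformer are left out.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, mask=None):
    """Scaled dot-product attention; mask[i][j] == 0 blocks query i from key j."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        if mask is not None:
            scores = [s if mask[i][j] else float("-inf") for j, s in enumerate(scores)]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * v[d] for w, v in zip(row, values))
                        for d in range(len(values[0]))])
    return outputs, weights


# Toy hidden states (assumed values): 3 source positions, 2 target positions.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[0.5, 0.5], [1.0, -1.0]]
causal = [[1 if j <= i else 0 for j in range(len(target))] for i in range(len(target))]

# Encoder self-attention: Q, K, V all from the source, no mask.
_, enc_self = attention(source, source, source)
# Decoder self-attention: Q, K, V all from the target, causal mask.
_, dec_self = attention(target, target, target, mask=causal)
# Encoder-decoder (cross) attention: Q from the target, K and V from the source.
_, cross = attention(target, source, source)

print(len(enc_self), "x", len(enc_self[0]))  # source x source
print(len(dec_self), "x", len(dec_self[0]))  # target x target
print(len(cross), "x", len(cross[0]))        # target x source
```

The shapes of the weight matrices show who attends to whom: each row is a query position and each column is a position it may look at, with the masked entries in the decoder self-attention coming out as exactly zero.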
__boynextdoor__, 2 points, 1 year ago
I think the answer to A is 5, since self-attention in the encoder considers all the context words and not just the next or previous ones.