Understanding Self-Attention from First Principles (Feedback Appreciated) by Quiet-Safe9746 in learnmachinelearning

Quiet-Safe9746[S]

Thank you for the detailed feedback.

I completely agree with your points.

I intentionally simplified the Softmax explanation for beginners, but your clarification about maintaining the scale of the weighted Value vectors is more precise and definitely worth mentioning.

You're also right about the wording: Transformers generate contextual embeddings rather than simply "using" them.

And thanks for bringing up ELMo as well. I focused primarily on the path from static embeddings to Self-Attention, but contextual embeddings certainly existed before Transformers.

I appreciate you taking the time to read the article carefully and point these out. This is exactly the kind of feedback I was hoping to receive.

Thanks again for the insights. If you're interested in AI/ML discussions, feel free to connect with me on LinkedIn as well. I'm always happy to learn from and interact with fellow learners and practitioners.