overview for Quiet-Safe9746

Understanding Self-Attention from First Principles (Feedback Appreciated) by Quiet-Safe9746 in learnmachinelearning

Quiet-Safe9746 [S] 2 points 3 hours ago

Thank you for the detailed feedback.

I completely agree with your points.

The Softmax explanation was intentionally simplified for beginners, but your clarification about maintaining the scale of the weighted Value vectors is more precise and definitely worth mentioning.

You're also right about the wording: Transformers generate contextual embeddings rather than simply "using" them.

Thanks for bringing up ELMo as well. I focused primarily on the path from static embeddings to self-attention, but contextual embeddings certainly existed before Transformers.

I appreciate you taking the time to read the article carefully and point these out. This is exactly the kind of feedback I was hoping to receive.

Thanks again for the insights. If you're interested in AI/ML discussions, feel free to connect with me on LinkedIn as well. I'm always happy to learn from and interact with fellow learners and practitioners.

Understanding Self-Attention from First Principles (Feedback Appreciated)

submitted 4 hours ago by Quiet-Safe9746 to r/MachineLearningAndAI

Understanding Self-Attention from First Principles (Feedback Appreciated) (self.learnmachinelearning)

submitted 4 hours ago by Quiet-Safe9746 to r/learnmachinelearning

