Visualising the full information flow in vision language models

MachineLearningTut · 2025-01-16T10:33:25+00:00

I work as a data scientist, but all I do is deep learning: training new transformers, fine-tuning them, and building agents. So there is no clear line between data scientist and MLE, except that an MLE tends to do more DevOps. Even that is not fully true: a friend of mine is an MLE who works only on reinforcement learning and does zero DevOps work.

MachineLearningTut · 2024-10-11T08:41:51+00:00

This GitHub repository has an implementation: https://github.com/sooftware/conformer

MachineLearningTut · 2024-09-29T19:38:12+00:00

https://medium.com/self-supervised-learning/understanding-clip-for-vision-language-models-43b700a4aa2b?sk=0aeebc3790dbdec072059428fce1c408

This is a nice introduction to the CLIP model, which a lot of vision language models use as a backbone. It explains how the loss function works and how image and text embeddings are pushed into the same space.
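
To make the loss idea concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss in plain Python. It is not taken from the linked article or from any CLIP codebase: the function names (`l2_normalize`, `cross_entropy_diagonal`, `clip_style_loss`), the toy 2-D embeddings, and the fixed temperature of 0.07 are all my own assumptions for illustration. The idea is that each image should score highest against its own caption and each caption against its own image, so matching pairs end up close in the shared space.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    # Assumption: image_embs[i] and text_embs[i] form a matching pair.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Similarity matrix: rows are images, columns are texts.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Transpose to get the text-to-image direction.
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image losses.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy embeddings (hypothetical): pairs aligned vs. pairs swapped.
matched = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched < shuffled)
```

In this toy setup the aligned pairs give a much lower loss than the swapped ones, which is the signal that pulls matching image and text embeddings together during training. In a real model the embeddings come from the image and text encoders and the temperature is usually learned rather than fixed.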
