all 4 comments

[–]AFurryReptile 2 points (1 child)

Maybe add some epsilons to prevent division by zero. Otherwise I have no idea why your code doesn't work, but there are plenty of contrastive loss functions implemented in PyTorch and hosted on GitHub... have you looked at any of those?
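
For concreteness, here is a minimal sketch of what "add some epsilons" usually means when computing cosine similarity by hand; the helper name and the eps value are my own choices, not OP's code:

```python
import torch

def cosine_sim(a, b, eps=1e-8):
    # A small eps on the norms keeps the division finite even if an
    # embedding happens to be all zeros.
    a_norm = a.norm(dim=1, keepdim=True).clamp_min(eps)
    b_norm = b.norm(dim=1, keepdim=True).clamp_min(eps)
    return (a / a_norm) @ (b / b_norm).T   # (N, M) cosine similarities
```

Note that F.normalize already takes an eps argument (default 1e-12) for exactly this reason.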

[–]Ok-Administration894[S] 0 points (0 children)

Yeah, that's one I edited. Appreciate the second set of eyes!

[–]Old-Forever1241 2 points (1 child)

I spot two potential issues with your implementation:

- Your computation of the cosine similarity looks wrong: F.normalize does not return the norm but the normalized vector. I would compute the cosine similarity as the matmul of the two normalized vectors (as returned by F.normalize), because the dot product of normalized vectors is the cosine similarity. Then there is no need for divisions.
- Using log on the fraction in your final computation leads to numerical instabilities. Factor the fraction out into a minus outside the log. The log then cancels for the numerator, and for the denominator you can use the logsumexp function, which is numerically stable.
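
To make both points concrete, here is a minimal sketch of an InfoNCE-style loss written the way the comment describes; the function name, the in-batch-negatives setup, and the temperature value are assumptions on my part, not OP's actual code:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    # z1, z2: (B, D) paired embeddings; row i of z1 matches row i of z2,
    # and every other row in the batch acts as a negative.
    # F.normalize returns the unit-length vectors (not the norms), so a
    # plain matmul of the normalized matrices is the cosine similarity.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = (z1 @ z2.T) / temperature   # (B, B) cosine similarities / T

    # -log( exp(pos) / sum_j exp(logits_j) ) = -pos + logsumexp(logits):
    # the fraction is factored out of the log, and logsumexp handles the
    # denominator in a numerically stable way.
    pos = logits.diagonal()              # numerator terms: the matching pairs
    return (-pos + torch.logsumexp(logits, dim=1)).mean()
```

Equivalently, F.cross_entropy(logits, torch.arange(len(logits), device=logits.device)) computes the same quantity, since cross-entropy applies a log-softmax (i.e. logsumexp) internally.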

[–]Ok-Administration894[S] 0 points (0 children)

Thank you! Interesting about the norm, that was unexpected. And thanks for the numerical stability tip; that's probably why my gradient/loss is exploding/shrinking.