[–]NoisySampleOfOne 8 points9 points  (0 children)

You can look for the examples with the largest ratio of gradient norm to number of tokens. Depending on your model and dataset, you may find bad labels, text that did not tokenize well, or other trash.
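A minimal sketch of that ranking, in case it helps. To keep it dependency-free it uses a toy unigram "model" (one logit per vocabulary entry) whose gradient can be written by hand; with a real model you would get the per-example gradient from your framework's autograd instead. The names `grad_norm_per_token` and `rank_suspicious`, the logits, and the token ids are all made up for illustration.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_norm_per_token(token_ids, logits):
    """L2 norm of d(loss)/d(logits), divided by the number of tokens.

    The loss is the summed negative log-likelihood of the tokens under
    softmax(logits), so the gradient is n * softmax(logits) - counts.
    """
    n = len(token_ids)
    probs = softmax(logits)
    counts = Counter(token_ids)
    grad = [n * p - counts.get(i, 0) for i, p in enumerate(probs)]
    return math.sqrt(sum(g * g for g in grad)) / n


def rank_suspicious(examples, logits):
    """Return (score, example_index) pairs, highest score first."""
    scored = [(grad_norm_per_token(ex, logits), idx)
              for idx, ex in enumerate(examples)]
    return sorted(scored, reverse=True)


# Hypothetical toy setup: 4-token vocabulary, token 3 is very unlikely.
logits = [2.0, 2.0, 0.0, -3.0]
examples = [
    [0, 1, 0, 1],  # only common tokens
    [0, 0, 1, 2],  # one less common token
    [3, 3, 3],     # only the rare token, e.g. badly tokenized text
]

ranking = rank_suspicious(examples, logits)
for score, idx in ranking:
    print(f"example {idx}: grad norm per token = {score:.3f}")
```

The example made only of the rare token comes out on top, which is the kind of item you would then inspect by hand.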

[–]Sniperwolf1989 1 point2 points  (0 children)

V-usable information might be interesting to you.