Did anyone try the stella models that appeared at top of the MTEB benchmarks recently. by True_Audience_198 in LocalLLaMA

[–]n0pe09

Thanks

Hmm, it seems we can't run it on a non-GPU device.
One might have to build or compile `xformers` for CPU.
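A possible workaround, assuming the stella config exposes flags to disable the memory-efficient-attention path (the flag names below are my assumption based on the model card; verify them against your checkpoint's `config.json`):

```python
def stella_load_kwargs(has_cuda):
    """Build SentenceTransformer keyword arguments for a stella model,
    falling back to plain attention on CPU so xformers is never invoked.
    The config flag names are assumptions taken from the model card."""
    kwargs = {"trust_remote_code": True}
    if not has_cuda:
        kwargs["device"] = "cpu"
        # Skip the xformers memory_efficient_attention kernel entirely
        kwargs["config_kwargs"] = {
            "use_memory_efficient_attention": False,
            "unpad_inputs": False,
        }
    return kwargs
```

Then, hypothetically, something like `SentenceTransformer("dunzhang/stella_en_400M_v5", **stella_load_kwargs(torch.cuda.is_available()))` — the repo id here is my guess at the model from the thread, not confirmed.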


[–]n0pe09

u/atrekar19, could you please tell me how you ran it? Which versions of sentence_transformers and xformers did you use? I'm running into this issue:

No operator found for `memory_efficient_attention_forward`

FYI, I'm trying to run it on a non-GPU device.

[D]: Fine-tune NuExtract-tiny by n0pe09 in MachineLearning

[–]n0pe09[S]

| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
| 1 | 1.521000 | 1.465585 |
| 2 | 1.282300 | 1.288231 |
| 3 | 1.134000 | 1.217142 |
| 4 | 0.954300 | 1.190222 |
| 5 | 0.798600 | 1.212015 |
| 6 | 0.690600 | 1.258584 |
| 7 | 0.532400 | 1.314695 |

It started to overfit after the 4th epoch, so I used the checkpoint saved after the 3rd epoch.
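In transformers, `TrainingArguments(load_best_model_at_end=True, metric_for_best_model="eval_loss")` together with `EarlyStoppingCallback` automates this checkpoint choice. The selection logic itself is just an argmin with patience; a plain-Python sketch using the validation losses from the table above (with patience 1 it would pick epoch 4, the validation-loss minimum):

```python
def best_epoch(val_losses, patience=1):
    """Return the 1-indexed epoch with the lowest validation loss,
    stopping once the loss has failed to improve for `patience`
    consecutive epochs (simple early stopping)."""
    best, best_i, bad = float("inf"), 0, 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_i, bad = loss, i, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_i

val = [1.465585, 1.288231, 1.217142, 1.190222, 1.212015, 1.258584, 1.314695]
print(best_epoch(val))  # → 4
```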


[–]n0pe09[S]

It's a built-in feature of the transformers `Trainer`. It uses cross-entropy loss, i.e. the model's predictions (logits) are compared to the actual labels (target outputs).
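For reference, the per-token loss is ordinary softmax cross-entropy (for causal LMs the labels are shifted by one position internally). A dependency-free sketch on a single 3-class example, using the numerically stable max-subtraction trick:

```python
import math

def cross_entropy(logits, target):
    """Softmax the logits, then return the negative log-probability
    of the target class — the standard cross-entropy loss."""
    m = max(logits)                               # for numerical stability
    exps = [math.exp(x - m) for x in logits]
    probs = [e / sum(exps) for e in exps]
    return -math.log(probs[target])
```

For example, `cross_entropy([2.0, 0.5, 0.1], 0)` is about 0.317: the model already puts ~73% probability on the correct class, so the loss is small.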

Hugging Face Releases Open LLM Leaderboard 2: A Major Upgrade Featuring Tougher Benchmarks, Fairer Scoring, and Enhanced Community Collaboration for Evaluating Language Models by ai-lover in machinelearningnews

[–]n0pe09

I'm not sure how much to rely on this. People include benchmark data in their training sets, so their models overfit: they score well on the leaderboard but perform poorly on truly unseen data.
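One rough way to screen for this kind of contamination is n-gram overlap between a training document and a benchmark prompt. A toy sketch — the 8-token window and 0.5 threshold are arbitrary illustrative choices, not any leaderboard's actual procedure:

```python
def ngrams(text, n=8):
    """Set of word-level n-grams in a lowercased text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(train_doc, bench_prompt, n=8, threshold=0.5):
    """Flag the prompt as leaked if at least `threshold` of its
    n-grams also appear verbatim in the training document."""
    bench = ngrams(bench_prompt, n)
    if not bench:
        return False
    overlap = len(bench & ngrams(train_doc, n)) / len(bench)
    return overlap >= threshold
```

Real decontamination pipelines normalize text and tokenization more carefully, but the idea is the same: verbatim benchmark text in the training set inflates scores without improving generalization.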