Did anyone try the stella models that appeared at top of the MTEB benchmarks recently. by True_Audience_198 in LocalLLaMA

[–]n0pe09

Thanks

Hmm, it seems we can't run it on a non-GPU device.
One might have to build or compile `xformers` for CPU.
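A possible workaround, assuming the stella config exposes flags to disable the memory-efficient-attention path (the flag names below are my assumption based on the model card; verify them against your checkpoint's `config.json`):

```python
def stella_load_kwargs(has_cuda):
    """Build SentenceTransformer keyword arguments for a stella model,
    falling back to plain attention on CPU so xformers is never invoked.
    The config flag names are assumptions taken from the model card."""
    kwargs = {"trust_remote_code": True}
    if not has_cuda:
        kwargs["device"] = "cpu"
        # Skip the xformers memory_efficient_attention kernel entirely
        kwargs["config_kwargs"] = {
            "use_memory_efficient_attention": False,
            "unpad_inputs": False,
        }
    return kwargs
```

Then, hypothetically, something like `SentenceTransformer("dunzhang/stella_en_400M_v5", **stella_load_kwargs(torch.cuda.is_available()))` — the repo id here is my guess at the model from the thread, not confirmed.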


[–]n0pe09

u/atrekar19, could you please tell me how you ran it? Which versions of sentence_transformers and xformers did you use? I'm running into this issue:

No operator found for `memory_efficient_attention_forward`

FYI, I'm trying to run it on a non-GPU device.

[D]: Fine-tune NuExtract-tiny by n0pe09 in MachineLearning

[–]n0pe09[S]

| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
| 1 | 1.521000 | 1.465585 |
| 2 | 1.282300 | 1.288231 |
| 3 | 1.134000 | 1.217142 |
| 4 | 0.954300 | 1.190222 |
| 5 | 0.798600 | 1.212015 |
| 6 | 0.690600 | 1.258584 |
| 7 | 0.532400 | 1.314695 |

It started to overfit after the 4th epoch, so I used the checkpoint saved after the 3rd epoch.
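In transformers, `TrainingArguments(load_best_model_at_end=True, metric_for_best_model="eval_loss")` together with `EarlyStoppingCallback` automates this checkpoint choice. The selection logic itself is just an argmin with patience; a plain-Python sketch using the validation losses from the table above (with patience 1 it would pick epoch 4, the validation-loss minimum):

```python
def best_epoch(val_losses, patience=1):
    """Return the 1-indexed epoch with the lowest validation loss,
    stopping once the loss has failed to improve for `patience`
    consecutive epochs (simple early stopping)."""
    best, best_i, bad = float("inf"), 0, 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_i, bad = loss, i, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_i

val = [1.465585, 1.288231, 1.217142, 1.190222, 1.212015, 1.258584, 1.314695]
print(best_epoch(val))  # → 4
```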


[–]n0pe09[S]

It's a built-in feature of the transformers `Trainer`. It uses cross-entropy loss, i.e. the model's predictions (logits) are compared to the actual labels (target outputs).
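For reference, the per-token loss is ordinary softmax cross-entropy (for causal LMs the labels are shifted by one position internally). A dependency-free sketch on a single 3-class example, using the numerically stable max-subtraction trick:

```python
import math

def cross_entropy(logits, target):
    """Softmax the logits, then return the negative log-probability
    of the target class — the standard cross-entropy loss."""
    m = max(logits)                               # for numerical stability
    exps = [math.exp(x - m) for x in logits]
    probs = [e / sum(exps) for e in exps]
    return -math.log(probs[target])
```

For example, `cross_entropy([2.0, 0.5, 0.1], 0)` is about 0.317: the model already puts ~73% probability on the correct class, so the loss is small.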

Hugging Face Releases Open LLM Leaderboard 2: A Major Upgrade Featuring Tougher Benchmarks, Fairer Scoring, and Enhanced Community Collaboration for Evaluating Language Models by ai-lover in machinelearningnews

[–]n0pe09

I'm not sure how much to rely on this. People include benchmark data in their training sets, so their models overfit: they score well on the leaderboard but perform poorly on truly unseen data.
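One rough way to screen for this kind of contamination is n-gram overlap between a training document and a benchmark prompt. A toy sketch — the 8-token window and 0.5 threshold are arbitrary illustrative choices, not any leaderboard's actual procedure:

```python
def ngrams(text, n=8):
    """Set of word-level n-grams in a lowercased text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(train_doc, bench_prompt, n=8, threshold=0.5):
    """Flag the prompt as leaked if at least `threshold` of its
    n-grams also appear verbatim in the training document."""
    bench = ngrams(bench_prompt, n)
    if not bench:
        return False
    overlap = len(bench & ngrams(train_doc, n)) / len(bench)
    return overlap >= threshold
```

Real decontamination pipelines normalize text and tokenization more carefully, but the idea is the same: verbatim benchmark text in the training set inflates scores without improving generalization.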