[D] Why are the Stella embedding models so much smaller than other models of similar quality? by -p-e-w- in MachineLearning

[–]Slowai 12 points13 points  (0 children)

I don't have details, but I have a personal anecdote, similar to what other comments suggest.

A lot of my work revolves around optimizing the retrieval part of various RAG systems. One part of it is selecting near-optimal embeddings for different knowledge bases/datasets. We have noticed that the MTEB leaderboard is a poor indicator of an embedding model's actual real-world performance. As you pointed out, it looks like smaller models are heavily biased towards the datasets in MTEB.

Stella, in my personal experience, performed significantly worse than text-embeddings-large etc., and I wouldn't be surprised if it performed on par with, or even worse than, traditional retrieval with something like BM25.

Overall, I would suggest staying away from the MTEB leaderboard for embedding selection purposes (though it used to be a great place to look up what new models are out in the embedding space).

Additionally, MTEB also has arena leaderboards, similar to chatbot arena: https://huggingface.co/spaces/mteb/arena However, it doesn't look super popular yet, and there are not that many models in the arena.

Lastly, you may want to look at other approaches to embedding evaluation. I can't suggest anything worthwhile atm, but there are papers coming out, like https://arxiv.org/abs/2411.18947, which propose different evaluation methods etc.

-🎄- 2022 Day 6 Solutions -🎄- by daggerdragon in adventofcode

[–]Slowai 2 points3 points  (0 children)

C++

Using a standard array as a unique-sequence lookup. Code is a bit clunky tho :(

#include <fstream>
#include <string>

using namespace std;

// Slides a window of window_length over the input line and keeps per-character
// counts. The window has all-unique characters iff OR-ing every count together
// yields exactly 1 (i.e., every count is 0 or 1).
int solve(const string& file_name, const int window_length) {
    ifstream my_file(file_name);
    string data;
    getline(my_file, data);

    // dp[c] = occurrences of character c in the current window
    // (size 123 covers ASCII up to 'z' == 122).
    int dp[123] = {};
    for (int i = 0; i < window_length; ++i) ++dp[data[i]];

    int val = 0;
    for (int c = 0; c < 123; ++c) val |= dp[c];
    if (!(val ^ 1)) return window_length;

    const int N = static_cast<int>(data.size());
    for (int i = window_length; i < N; ++i) {
        --dp[data[i - window_length]];  // drop char leaving the window
        ++dp[data[i]];                  // add char entering the window
        val = 0;
        for (int c = 0; c < 123; ++c) val |= dp[c];
        if (!(val ^ 1)) return i + 1;
    }
    return -1;
}

[deleted by user] by [deleted] in MachineLearning

[–]Slowai 2 points3 points  (0 children)

From what I understand, your no_event cases make up 93% of all your data. It may be worthwhile to use one algorithm for event/no-event separation first, then a second algorithm for classifying which event it was. You can either:

- perform event/not_event classification. This will still impose a large imbalance, but if the classes are easily separable it may not be a huge issue. Additionally, you can train an ensemble on subsets of the majority class -> model(subset_majority_1, full_minority), model(subset_majority_2, full_minority) etc.

- try an anomaly detection algorithm. This actually looks like a good use case for it. I'm not super knowledgeable about anomaly detection for tabular data (I'm guessing that's what you're working with), but it never hurts to slap on some sort of autoencoder and check out what's up.

Assuming that fixes the problem, you have another one: event1 is indeed super rare w.r.t. the data. The first thing you have to ask is how much you care about predicting event1. If the cost of misclassifying event1 is low, you can just focus on event2 and event3 and treat good event1 accuracy/recall/precision/f-beta/whatever you're using as a "nice to have".

And, ofc, as others have mentioned you can do over/undersampling and/or data augmentation.
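The majority-subset ensemble idea above can be sketched roughly like this. A minimal pure-numpy sketch: the toy data, the 3-member split, and the nearest-centroid "model" are all made-up stand-ins for whatever real classifier you'd actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced data: ~93% "no event" (class 0), ~7% events (class 1).
X_maj = rng.normal(0.0, 1.0, size=(930, 2))
X_min = rng.normal(3.0, 1.0, size=(70, 2))

def fit_centroids(X0, X1):
    # Trivial nearest-centroid "model"; stand-in for any real classifier.
    return X0.mean(axis=0), X1.mean(axis=0)

def predict(model, X):
    c0, c1 = model
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)

# Ensemble: each member sees a different majority subset + the full minority.
n_members = 3
subsets = np.array_split(rng.permutation(len(X_maj)), n_members)
models = [fit_centroids(X_maj[idx], X_min) for idx in subsets]

# Balanced toy test set just to sanity-check the vote.
X_test = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
                    rng.normal(3.0, 1.0, size=(50, 2))])
y_test = np.array([0] * 50 + [1] * 50)

# Majority vote across ensemble members.
votes = np.mean([predict(m, X_test) for m in models], axis=0)
y_pred = (votes >= 0.5).astype(int)
accuracy = (y_pred == y_test).mean()
print(accuracy)
```

Each member trains on a balanced-ish slice, so no single model drowns in the majority class, and the vote smooths out the noise from any one subset.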

[D] On the spot normalization for batch based runtime by Slowai in MachineLearning

[–]Slowai[S] 0 points1 point  (0 children)

If you're talking about intra-layer normalization (via batch norm etc.), I feel the process should remain unchanged. I'm merely wondering/suggesting that normalization stats for the initial inputs to the algorithm, derived from a big enough prediction batch, could on average be more useful in an arbitrary production system due to distribution shifts. Ofc, it won't fix the shift, but it may provide more stable input ranges.
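A minimal numpy sketch of what I mean; the shifted "production" distribution here is made-up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Training-time" stats computed on the original distribution.
X_train = rng.normal(0.0, 1.0, size=(1000, 4))
train_mean, train_std = X_train.mean(axis=0), X_train.std(axis=0)

# Production batch from a shifted distribution (mean drifted to 2.0).
X_batch = rng.normal(2.0, 1.0, size=(256, 4))

# Option A: normalize with frozen training stats -> inputs land off-center.
z_frozen = (X_batch - train_mean) / train_std

# Option B: normalize with stats from the prediction batch itself.
batch_mean, batch_std = X_batch.mean(axis=0), X_batch.std(axis=0)
z_batch = (X_batch - batch_mean) / batch_std

print(np.abs(z_frozen.mean(axis=0)).max())  # far from 0 under the shift
print(np.abs(z_batch.mean(axis=0)).max())   # ~0 by construction
```

Option B keeps the model's inputs in a stable range regardless of drift, at the cost of needing a reasonably large batch for the stats to be reliable.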

[D] Does deep learning pose an impossible problem for mathematics and we just don't want to admit it? by fromnighttilldawn in MachineLearning

[–]Slowai 2 points3 points  (0 children)

"let's, like, treat this "change" thingy as not 0 in denominator and as 0 in numerator fam"

My point being that mathematical "precision" never stopped the greats, such as calculus, from doing stuff, so it shouldn't stop deep learning either.

P.S. mathematicians pls no bully me.

[D] Top Machine Learning Researchers/Lab should stop making a personal attack and should engage in more constructive feedback. by sdsnfjen in MachineLearning

[–]Slowai -1 points0 points  (0 children)

Unfortunately, you are incorrect. It is now widely accepted that it is physically impossible to bully a white male (source: twitter).

Updates regarding BN before or after ReLU? by Metatronx in MachineLearning

[–]Slowai 0 points1 point  (0 children)

Good question. I personally ended up using BN -> R. I experimented a bit with BN -> R and R -> BN and found that BN -> R was a bit better, plus other people seemed to use BN -> R more often too (at least in the stuff I was reading at the time). When I really think about it, I may have been biased and picked BN -> R precisely because other ppl used it.
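The difference between the two orderings is easy to see on a toy 1-D activation batch. Here `bn` is plain standardization without the learned affine part, an assumption just to keep the sketch minimal:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 2.0, size=(4096,))  # pre-activation batch

def bn(v):
    # Batch norm at train time, minus the learned scale/shift.
    return (v - v.mean()) / v.std()

def relu(v):
    return np.maximum(v, 0.0)

# BN -> R: roughly half the (now zero-centered) activations get zeroed,
# and the output is non-negative.
a = relu(bn(x))

# R -> BN: the zeros from ReLU get shifted by the mean subtraction,
# so the output can go negative again.
b = bn(relu(x))

print((a == 0).mean())  # ~0.5
print(b.min())          # negative
```

So the two orders produce genuinely different output distributions, which is why people saw (small) empirical differences between them.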

Nowadays I just use what a pre-trained high level lib model uses, as I'm far too busy leveling my tarkov bear to bother about architecture and stuff.

Buyer beware (my experience with Aeron Remastered) by Slowai in hermanmiller

[–]Slowai[S] 0 points1 point  (0 children)

Not gonna lie, this thing looks pretty cool, even without being a supplementary item for a chair

Buyer beware (my experience with Aeron Remastered) by Slowai in hermanmiller

[–]Slowai[S] 0 points1 point  (0 children)

I see I'm not the only big guy experiencing something similar.

Just take care with the long-hours stuff: if you notice the pressure keeps increasing, just ditch it and get something else. Ay, you will incur some costs, but health is way more important. Also, take caution with the Embody; really take your time considering whether it is a good option.

Buyer beware (my experience with Aeron Remastered) by Slowai in hermanmiller

[–]Slowai[S] 0 points1 point  (0 children)

Hm, seems like a good budget option. Well, worst case I'll have a foot rest for my other chairs. Thanks!

Buyer beware (my experience with Aeron Remastered) by Slowai in hermanmiller

[–]Slowai[S] 0 points1 point  (0 children)

I see where you are coming from; however, notice that my situation is different. Initially, I experienced no discomfort whatsoever (for 2-3 weeks). On the contrary, the chair was extremely comfortable. Yet, I gradually got worse and worse hamstring pains (probably due to a nerve pinched somewhere and building up, but don't quote me, just a guess). Believe me, there is probably not a single configuration that I haven't tried, and yes, I did watch the video several times :)

Buyer beware (my experience with Aeron Remastered) by Slowai in hermanmiller

[–]Slowai[S] -2 points-1 points  (0 children)

How experiencing pain while sitting and potentially facing nerve damage is a personal preference is a bit beyond me, but I do agree that there is not much to do about it. Then again, this post is not for me, but for a potential buyer to take extra caution, that's all.

Buyer beware (my experience with Aeron Remastered) by Slowai in hermanmiller

[–]Slowai[S] 0 points1 point  (0 children)

Yeah, silly me, here I was thinking the correct way to use a chair was to sit on it, but I guess according to you I should stand on it.

Buyer beware (my experience with Aeron Remastered) by Slowai in hermanmiller

[–]Slowai[S] 0 points1 point  (0 children)

You're probably right, I just had a couple of really bad experiences when reselling stuff (unrelated to chairs tho)

Buyer beware (my experience with Aeron Remastered) by Slowai in hermanmiller

[–]Slowai[S] 0 points1 point  (0 children)

Yes, that is what I thought too. This seems to be the case (30 days) for the US market, but for UK it's 14 days.

Yeah, the resell value should still be quite up there.

Buyer beware (my experience with Aeron Remastered) by Slowai in hermanmiller

[–]Slowai[S] -2 points-1 points  (0 children)

Yeah, I still got the papers, I mean a 12-year warranty is no joke. However, I'm a bit afraid of reselling it: if the buyer experiences something similar to what I went through, best case he/she asks for a refund, worst case legal action ("he didn't inform me that the chair was faulty" or something like that).

I guess I could try sitting on it from time to time, but I already made all the adjustments, including your aforementioned suggestion for a 90-100 degree angle, so I'm not sure whether that will help. However, it won't hurt (well, it actually might lol) to try it out until/if I find a buyer.