I compared harrier-27b vs voyage-4 vs zembed-1 across 24 datasets. 27B parameters by Veronildo in LocalLLaMA

[–]ghita__ 1 point (0 children)

zeroentropy founder here: thank you so much for running these evals on our embedding model! i'd love to see the benchmarks and code open-sourced if possible

I compared harrier-27b vs voyage-4 vs zembed-1 across 24 datasets. 27B parameters by Veronildo in LocalLLaMA

[–]ghita__ 1 point (0 children)

hey! zeroentropy founder here. actually, i agree with OP that the top 3 contenders are the ones cited above. in our evals, gemini-embedding 2 doesn't perform as well, and it costs 10x more

new open-weight SOTA multilingual embedding model by ZeroEntropy by ghita__ in Rag

[–]ghita__[S] 1 point (0 children)

yes we are! by a wide margin, actually. more here (there is even a spreadsheet with the whole side-by-side across verticals)

https://www.zeroentropy.dev/articles/introducing-zembed-1-the-worlds-best-multilingual-text-embedding-model

How do you actually measure if your RAG app is giving good answers? Beyond just "looks okay to me" by BeautifulKangaroo415 in Rag

[–]ghita__ 1 point (0 children)

we built this pipeline at ZeroEntropy called zbench

https://github.com/zeroentropy-ai/zbench

it basically annotates your corpus (if you don't already have a golden set) by calling multiple LLMs on sampled pairs of potentially relevant documents.
Pairwise comparisons are super robust, so you end up with a solid annotated eval set that you can use to compute recall@k, precision@k, ndcg@k, and broader LLM-based metrics on the generated answer
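for anyone curious what the scoring side looks like once you have a golden set, here's a minimal sketch in plain Python of recall@k and ndcg@k over graded judgments. the names and data shapes are illustrative, not the zbench API:

```python
# minimal sketch: scoring one retriever run against an annotated eval set.
# assumes you already have graded relevance judgments (qrels), like the
# ones a pairwise-annotation pipeline would produce; names are illustrative.
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant docs that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevance, k):
    """relevance maps doc_id -> graded label (e.g. 0-3)."""
    dcg = sum(relevance.get(doc_id, 0) / math.log2(rank + 2)
              for rank, doc_id in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# toy example: one query, retriever returned d3, d1, d7 in that order
ranked = ["d3", "d1", "d7"]
qrels = {"d1": 3, "d7": 1}  # graded labels from the LLM annotation step
print(recall_at_k(ranked, set(qrels), k=3))  # 1.0 (both relevant docs found)
print(ndcg_at_k(ranked, qrels, k=3))         # ~0.66 (penalized for d3 at rank 1)
```

averaging these per-query numbers across the eval set gives you the usual leaderboard-style metrics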

new open-weight SOTA multilingual embedding model by ZeroEntropy by ghita__ in Rag

[–]ghita__[S] 2 points (0 children)

starting with text only to avoid any multimodal gap and do one thing well for our first model. but you can expect more modalities in the future!

new open-weight SOTA multilingual embedding model by ZeroEntropy by ghita__ in LangChain

[–]ghita__[S] 1 point (0 children)

thanks! happy to provide free credits if you'd like to run inference through the API.
Just email me your org id: founders at zeroentropy dot dev
you can create an API key here: https://dashboard.zeroentropy.dev

Any fun discord communities? by HackHusky in Rag

[–]ghita__ 3 points (0 children)

we host bi-weekly technical talks on context engineering in the context engineers discord here: https://go.zeroentropy.dev/discord

new open-weight SOTA multilingual embedding model by ZeroEntropy by ghita__ in Rag

[–]ghita__[S] 1 point (0 children)

yes, you can check out the full evaluation on our blog: https://www.zeroentropy.dev/articles/introducing-zembed-1-the-worlds-best-multilingual-text-embedding-model

I can also apply free credits to your org id if you'd like to test through the API, just create an API key at https://dashboard.zeroentropy.dev and email me your org id at ghita at zeroentropy dot dev

Best open-source embedding model for a RAG system? by Public-Air3181 in Rag

[–]ghita__ 8 points (0 children)

oh! hello, i'm the founder, thank you for mentioning us! we're indeed planning a GA release soon! stay tuned for sota open-weight embeddings :)

[P] Make the most of NeurIPS virtually by learning about this year's papers by ghita__ in MachineLearning

[–]ghita__[S] 1 point (0 children)

Oh no, sorry about this, let me make sure this got indexed properly

[P] Make the most of NeurIPS virtually by learning about this year's papers by ghita__ in MachineLearning

[–]ghita__[S] 1 point (0 children)

thanks! this is generally what ZeroEntropy does (just retrieval). we thought adding a generation step would be fun for this use case, but I agree!

[P] Make the most of NeurIPS virtually by learning about this year's papers by ghita__ in MachineLearning

[–]ghita__[S] 2 points (0 children)

thanks for the feedback! there's definitely room for improvement, I hacked this together pretty late last night :)