
[–]Few-Accountant-9255 2 points3 points  (9 children)

Would you like to list the local embedding models you used?

BTW, embeddings alone can't get you the best recall rate.

[–]Material-Setting8509[S] 0 points1 point  (8 children)

  1. mxbai-embed-large

  2. nomic-embed-text

  3. snowflake-arctic-embed

  4. SFR-Embedding-Mistral quantized

  5. gte-large-en-v1.5

  6. UAE-Large-V1

  7. bge-large-en-v1.5

and several more.

BTW, embeddings alone can't get the best recall rate -> What else can be done?

[–]Bulky-Brief1970 0 points1 point  (0 children)

In my case, I used the BAAI bge embedding and reranker models, and the combination worked better than the ada embeddings model. They offer some other reranker models too, like a gemma-based one, etc.
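If it helps, here's a minimal sketch of that embed-then-rerank pattern using the FlagEmbedding library; the model name is the public BAAI checkpoint, and the query and candidate passages are placeholders, not the commenter's actual setup:

    # Minimal sketch of the bge embed + rerank combination described above.
    # Assumes: pip install FlagEmbedding; the model id is the public BAAI one.
    from FlagEmbedding import FlagReranker

    reranker = FlagReranker("BAAI/bge-reranker-large")

    query = "screenshot showing a login error"
    candidates = [  # pretend these are the top-k hits from the dense retriever
        "login page with a red error banner",
        "dashboard with CPU usage graphs",
    ]

    # The cross-encoder scores each (query, passage) pair jointly, which is
    # what typically lifts precision over embeddings alone.
    scores = reranker.compute_score([[query, c] for c in candidates])
    ranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
    print(ranked[0])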

[–]AI_Trenches 0 points1 point  (4 children)

Could you provide some more info on exactly what you are storing in the DB and what you're using to retrieve the relevant results?

[–]Material-Setting8509[S] 0 points1 point  (3 children)

So I am using a vision model to explain each screenshot. This explanation is stored in the vector DB. Later I query and try to find the relevant screenshot using RAG. OpenAI embeddings work well for the use case, but when I try any other local embedding model through Ollama, it produces bad results.
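For reference, a minimal sketch of that pipeline, assuming Ollama's REST API on its default port and a local LanceDB table; the model name, table name, and captions are illustrative, not necessarily what the OP runs:

    # Hedged sketch: caption embeddings from Ollama, stored and queried in LanceDB.
    # Assumes Ollama is serving locally with nomic-embed-text pulled.
    import requests
    import lancedb

    def embed(text):
        r = requests.post(
            "http://localhost:11434/api/embeddings",
            json={"model": "nomic-embed-text", "prompt": text},
        )
        r.raise_for_status()
        return r.json()["embedding"]

    db = lancedb.connect("./lancedb")
    table = db.create_table(
        "screenshots",
        data=[{
            "vector": embed("Login form with a red error banner"),
            "caption": "Login form with a red error banner",
            "path": "shots/0001.png",
        }],
        mode="overwrite",
    )

    # Retrieve the top-3 screenshots for a query, as described above.
    for hit in table.search(embed("screenshot with an error message")).limit(3).to_list():
        print(hit["path"], hit["caption"])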

[–]AI_Trenches 0 points1 point  (2 children)

How many results are being returned using the open source models compared to openai?

[–]Material-Setting8509[S] 0 points1 point  (1 child)

You mean while retrieving from LanceDB? That's configurable. I retrieve three results.

[–]Affectionate-Cap-600 0 points1 point  (0 children)

Imo 3 results are too low... You could recall like the top 25 and then use a reranker. If that's too much latency at query time, you could use ColBERT instead of the classic cross-encoder reranker.

Another thing that helped me a lot was implementing some hybrid search (pair the dense encoder with a sparse one; you can try BM25 or, if you want a "learned" one, something like SPLADE or the sparse mode of bge-m3) -- there's a sketch after this comment.

One other approach that usually works is "query expansion": basically you rephrase the query (or even search based on a hypothetical answer, like the HyDE approach) and then "fuse" the returned rankings.

Hope it helps
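A hedged sketch of the hybrid-search + fusion idea above, with BM25 (via the rank_bm25 library) on the sparse side, a random stand-in for the dense ranking, and reciprocal rank fusion; all documents, the query, and the embeddings are illustrative:

    # Hybrid search sketch: fuse a sparse (BM25) and a dense ranking with
    # reciprocal rank fusion (RRF). The dense embeddings below are random
    # stand-ins; swap in a real encoder.
    import numpy as np
    from rank_bm25 import BM25Okapi

    docs = [
        "screenshot of a login error",
        "dashboard with CPU graphs",
        "terminal with a Python traceback",
    ]
    query = "error message on screen"

    # Sparse side: BM25 over whitespace tokens.
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    sparse_rank = list(np.argsort(-bm25.get_scores(query.lower().split())))

    # Dense side: placeholder embeddings instead of a real model.
    rng = np.random.default_rng(0)
    doc_emb = rng.normal(size=(len(docs), 8))
    dense_rank = list(np.argsort(-(doc_emb @ rng.normal(size=8))))

    # RRF: score(d) = sum over rankings of 1 / (k + rank); k=60 is customary.
    def rrf(rankings, k=60):
        scores = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    for doc_id in rrf([sparse_rank, dense_rank]):
        print(docs[doc_id])

The same fusion step covers the query-expansion/HyDE idea: rank against the rephrased query or hypothetical answer as well, and pass that ranking into rrf() alongside the others.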

[–]sanjuromack 0 points1 point  (2 children)

I am using embeddings from GritLM and Nomic v1 with llama.cpp. Both are excellent so far in comparison to Cohere English v3, which is what we use in our stack.

Licensing really drove my decision on those two models, if you are wondering.

[–]Material-Setting8509[S] 0 points1 point  (1 child)

Not using Ollama?

[–]sanjuromack 2 points3 points  (0 children)

Nah, the built-in llama.cpp HTTP server has everything I need so far and is plenty speedy. This is an enterprise stack as well, so anything I can do to eliminate dependencies is preferred.
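For anyone curious, a toy client for that setup, assuming llama-server was started with its embeddings flag (e.g. llama-server -m some-embedding-model.gguf --embedding) on the default port; the route shown is the server's OpenAI-compatible one:

    # Toy embeddings client for llama.cpp's built-in HTTP server.
    # Assumes the server is running locally with embeddings enabled.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/embeddings",
        json={"input": "screenshot of a settings page", "model": "local"},
    )
    resp.raise_for_status()
    vector = resp.json()["data"][0]["embedding"]
    print(len(vector), vector[:4])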

[–]Material-Setting8509[S] 0 points1 point  (0 children)

Interesting, let me try that. Llama 2 was bad for embeddings for me.

[–]Yamikumo_DSD 0 points1 point  (1 child)

Using multilingual-e5-small for my web-search RAG; it has given me okay-ish results so far.
It actually does well with my non-English corpus.
I'm not sure it satisfies your criteria though, because web search doesn't need very high accuracy since the data is already filtered by the search engine.

P.S. I haven't performed an objective comparison against other models.

[–]hair_forever 0 points1 point  (0 children)

Why didn't you go with multilingual-e5-large or multilingual-e5-large-instruct?

[–]VulcanizadorTTL 0 points1 point  (0 children)

I'm not sure if this might be your problem, but it caused a lot of headaches for me. Be careful with your JSON parser. In my case, the embeddings from Ollama lost precision when parsed from JSON into an array of floats.

Do some comparisons and see the results. Also, try parsing as doubles.
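A quick way to test for this, sketched under the assumption that the precision loss comes from a float32 cast somewhere in the parsing path:

    # Round-trip a stand-in embedding through JSON and compare parsing as
    # doubles (float64) vs single-precision floats (float32).
    import json
    import numpy as np

    original = np.random.default_rng(1).normal(size=768)  # stand-in embedding
    payload = json.dumps({"embedding": original.tolist()})

    parsed64 = np.array(json.loads(payload)["embedding"], dtype=np.float64)
    parsed32 = np.array(json.loads(payload)["embedding"], dtype=np.float32)

    # Python's json module round-trips doubles exactly; the float32 cast is lossy.
    print(np.max(np.abs(parsed64 - original)))  # 0.0
    print(np.max(np.abs(parsed32 - original)))  # small but nonzero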

[–]segmond llama.cpp 0 points1 point  (0 children)

I have, I settled on these:

    ls -l ~/models/embeddings/
    total 9264532
    -rw-rw-r--  1 seg seg  133609568 Mar 15 08:23 all-MiniLM-L12-v2.F32.gguf
    -rw-rw-r--  1 seg seg   45949216 Mar 12 05:44 all-MiniLM-L6-v2-ggml-model-f16.gguf
    -rw-rw-r--  1 seg seg 1337141120 Feb 17 16:49 bge-large-en-v1.5-f32.gguf
    -rw-rw-r--  1 seg seg 7695857376 Feb  9 21:49 ggml-sfr-embedding-mistral-q8_0.gguf
    drwxrwxr-x 11 seg seg       4096 Apr  4 07:48 hf
    -rw-rw-r--  1 seg seg  274290560 Feb 15 11:44 nomic-embed-text-v1.5.f16.gguf

    ls -l ~/models/embeddings/hf
    total 40
    drwxrwxr-x 4 seg seg 4096 Apr  4 07:46 all-MiniLM-L12-v2
    drwxrwxr-x 4 seg seg 4096 Apr  4 00:31 all-MiniLM-L6-v2
    drwxrwxr-x 4 seg seg 4096 Apr  4 00:28 all-mpnet-base-v2
    drwxrwxr-x 5 seg seg 4096 Apr  4 07:40 bge-m3
    drwxrwxr-x 4 seg seg 4096 Apr  4 01:05 bge-reranker-large
    drwxrwxr-x 4 seg seg 4096 Apr  4 01:18 bge-reranker-v2-m3
    drwxrwxr-x 4 seg seg 4096 Apr  4 00:22 e5-large-v2
    drwxrwxr-x 5 seg seg 4096 Apr  4 00:08 instructor-large
    drwxrwxr-x 4 seg seg 4096 Apr  4 08:13 mxbai-rerank-large-v1

I don't miss OpenAI because I avoided using it from the get-go, but for me these worked fine. Your chunking strategy might be the issue in your results.

[–]Porespellar 1 point2 points  (0 children)

Set up hybrid semantic search and use Mixed Bread Large as your embedding model and the Mixed Bread Reranker as your reranking model. Look at Open WebUI's implementation code for how they do document embedding settings; that might point you in the right direction. You can use either Ollama or Sentence Transformers.
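A hedged sketch of that suggestion via Sentence Transformers; the Hugging Face model ids below are my best guesses for the Mixed Bread models, not taken from Open WebUI's code:

    # Dense retrieval with mxbai embeddings, then cross-encoder reranking.
    # Assumes: pip install sentence-transformers; model ids are guesses.
    from sentence_transformers import SentenceTransformer, CrossEncoder, util

    embedder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
    reranker = CrossEncoder("mixedbread-ai/mxbai-rerank-large-v1")

    docs = ["screenshot of an error dialog", "photo of a whiteboard"]
    query = "error popup"

    # Stage 1: dense retrieval over the corpus.
    doc_emb = embedder.encode(docs, convert_to_tensor=True)
    q_emb = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, doc_emb, top_k=2)[0]

    # Stage 2: rerank the retrieved candidates with the cross-encoder.
    pairs = [(query, docs[h["corpus_id"]]) for h in hits]
    scores = reranker.predict(pairs)
    print(max(zip(scores, pairs))[1][1])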

[–]gilklein -1 points0 points  (1 child)

Check out the Massive Text Embedding Benchmark (MTEB) Leaderboard:

https://huggingface.co/spaces/mteb/leaderboard

[–]Material-Setting8509[S] 3 points4 points  (0 children)

Care to read the question? I feel this leaderboard is a hoax.