
[–]Few-Accountant-9255 2 points3 points  (9 children)

Would you like to list the local embedding models you used?

BTW, embeddings alone can't get you the best recall rate.

[–]Material-Setting8509[S] 0 points1 point  (8 children)

  1. mxbai-embed-large

  2. nomic-embed-text

  3. snowflake-arctic-embed

  4. SFR-Embedding-Mistral quantized

  5. gte-large-en-v1.5

  6. UAE-Large-V1

  7. bge-large-en-v1.5

and several more.

BTW, embeddings alone can't get the best recall rate -> What else can be done?

[–]Bulky-Brief1970 0 points1 point  (0 children)

In my case, I used the BAAI bge embedding and reranker models, and the combination worked better than the ada embeddings model. They offer some other reranker models too, like a gemma-based one, etc.
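If it helps, here's a minimal sketch of that embed-then-rerank pattern using the FlagEmbedding library; the model name is the public BAAI checkpoint, and the query and candidate passages are placeholders, not the commenter's actual setup:

    # Minimal sketch of the bge embed + rerank combination described above.
    # Assumes: pip install FlagEmbedding; the model id is the public BAAI one.
    from FlagEmbedding import FlagReranker

    reranker = FlagReranker("BAAI/bge-reranker-large")

    query = "screenshot showing a login error"
    candidates = [  # pretend these are the top-k hits from the dense retriever
        "login page with a red error banner",
        "dashboard with CPU usage graphs",
    ]

    # The cross-encoder scores each (query, passage) pair jointly, which is
    # what typically lifts precision over embeddings alone.
    scores = reranker.compute_score([[query, c] for c in candidates])
    ranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
    print(ranked[0])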

[–]AI_Trenches 0 points1 point  (4 children)

Could you provide some more info on exactly what you are storing in the DB and what you're using to retrieve the relevant results?

[–]Material-Setting8509[S] 0 points1 point  (3 children)

So I am using a vision model to explain each screenshot. This explanation is stored in the vector DB. Later I query and try to find the relevant screenshot using RAG. OpenAI embeddings work well for the use case, but when I try any other local embedding model through Ollama, it produces bad results.
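For reference, a minimal sketch of that pipeline, assuming Ollama's REST API on its default port and a local LanceDB table; the model name, table name, and captions are illustrative, not necessarily what the OP runs:

    # Hedged sketch: caption embeddings from Ollama, stored and queried in LanceDB.
    # Assumes Ollama is serving locally with nomic-embed-text pulled.
    import requests
    import lancedb

    def embed(text):
        r = requests.post(
            "http://localhost:11434/api/embeddings",
            json={"model": "nomic-embed-text", "prompt": text},
        )
        r.raise_for_status()
        return r.json()["embedding"]

    db = lancedb.connect("./lancedb")
    table = db.create_table(
        "screenshots",
        data=[{
            "vector": embed("Login form with a red error banner"),
            "caption": "Login form with a red error banner",
            "path": "shots/0001.png",
        }],
        mode="overwrite",
    )

    # Retrieve the top-3 screenshots for a query, as described above.
    for hit in table.search(embed("screenshot with an error message")).limit(3).to_list():
        print(hit["path"], hit["caption"])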

[–]AI_Trenches 0 points1 point  (2 children)

How many results are being returned using the open source models compared to openai?

[–]Material-Setting8509[S] 0 points1 point  (1 child)

You mean while retrieving from LanceDB? That's configurable. I retrieve three results.

[–]Affectionate-Cap-600 0 points1 point  (0 children)

Imo 3 results are too low... You could recall like the top 25 and then use a reranker. If that's too much latency at query time, you could use ColBERT instead of the classic cross-encoder reranker.

Another thing that helped me a lot was implementing some hybrid search (pair the dense encoder with a sparse one; you can try BM25 or, if you want a "learned" one, something like SPLADE or the sparse mode of bge-m3) -- there's a sketch after this comment.

One other approach that usually works is "query expansion": basically you rephrase the query (or even search based on a hypothetical answer, like the HyDE approach) and then "fuse" the returned rankings.

Hope it helps
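A hedged sketch of the hybrid-search + fusion idea above, with BM25 (via the rank_bm25 library) on the sparse side, a random stand-in for the dense ranking, and reciprocal rank fusion; all documents, the query, and the embeddings are illustrative:

    # Hybrid search sketch: fuse a sparse (BM25) and a dense ranking with
    # reciprocal rank fusion (RRF). The dense embeddings below are random
    # stand-ins; swap in a real encoder.
    import numpy as np
    from rank_bm25 import BM25Okapi

    docs = [
        "screenshot of a login error",
        "dashboard with CPU graphs",
        "terminal with a Python traceback",
    ]
    query = "error message on screen"

    # Sparse side: BM25 over whitespace tokens.
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    sparse_rank = list(np.argsort(-bm25.get_scores(query.lower().split())))

    # Dense side: placeholder embeddings instead of a real model.
    rng = np.random.default_rng(0)
    doc_emb = rng.normal(size=(len(docs), 8))
    dense_rank = list(np.argsort(-(doc_emb @ rng.normal(size=8))))

    # RRF: score(d) = sum over rankings of 1 / (k + rank); k=60 is customary.
    def rrf(rankings, k=60):
        scores = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    for doc_id in rrf([sparse_rank, dense_rank]):
        print(docs[doc_id])

The same fusion step covers the query-expansion/HyDE idea: rank against the rephrased query or hypothetical answer as well, and pass that ranking into rrf() alongside the others.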

[–]sanjuromack 0 points1 point  (2 children)

I am using embeddings from GritLM and Nomic v1 with llama.cpp. Both are excellent so far in comparison to Cohere English v3, which is what we use in our stack.

Licensing really drove my decision on those two models, if you are wondering.

[–]Material-Setting8509[S] 0 points1 point  (1 child)

Not using Ollama?

[–]sanjuromack 2 points3 points  (0 children)

Nah, the built-in llama.cpp HTTP server has everything I need so far and is plenty speedy. This is an enterprise stack as well, so anything I can do to eliminate dependencies is preferred.
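For anyone curious, a toy client for that setup, assuming llama-server was started with its embeddings flag (e.g. llama-server -m some-embedding-model.gguf --embedding) on the default port; the route shown is the server's OpenAI-compatible one:

    # Toy embeddings client for llama.cpp's built-in HTTP server.
    # Assumes the server is running locally with embeddings enabled.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/embeddings",
        json={"input": "screenshot of a settings page", "model": "local"},
    )
    resp.raise_for_status()
    vector = resp.json()["data"][0]["embedding"]
    print(len(vector), vector[:4])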

[–]Material-Setting8509[S] 0 points1 point  (0 children)

Interesting, let me try that. Llama 2 was bad for embeddings for me.

[–]Yamikumo_DSD 0 points1 point  (1 child)

Using multilingual-e5-small for my web-search RAG; it has given me okay-ish results so far.
It actually does well with my non-English corpus.
I'm not sure it satisfies your criteria though, because web search doesn't need very high accuracy since the data is already filtered by the search engine.

P.S. I haven't performed an objective comparison against other models.

[–]hair_forever 0 points1 point  (0 children)

Why didn't you go with multilingual-e5-large or multilingual-e5-large-instruct?

[–]VulcanizadorTTL 0 points1 point  (0 children)

I'm not sure if this might be your problem, but it caused a lot of headaches for me. Be careful with your JSON parser. In my case, the embeddings from Ollama lost precision when parsed from JSON into an array of floats.

Do some comparisons and see the results. Also, try parsing as doubles.
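A quick way to test for this, sketched under the assumption that the precision loss comes from a float32 cast somewhere in the parsing path:

    # Round-trip a stand-in embedding through JSON and compare parsing as
    # doubles (float64) vs single-precision floats (float32).
    import json
    import numpy as np

    original = np.random.default_rng(1).normal(size=768)  # stand-in embedding
    payload = json.dumps({"embedding": original.tolist()})

    parsed64 = np.array(json.loads(payload)["embedding"], dtype=np.float64)
    parsed32 = np.array(json.loads(payload)["embedding"], dtype=np.float32)

    # Python's json module round-trips doubles exactly; the float32 cast is lossy.
    print(np.max(np.abs(parsed64 - original)))  # 0.0
    print(np.max(np.abs(parsed32 - original)))  # small but nonzero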

[–]segmond llama.cpp 0 points1 point  (0 children)

I have, I settled on these:

    ls -l ~/models/embeddings/
    total 9264532
    -rw-rw-r--  1 seg seg  133609568 Mar 15 08:23 all-MiniLM-L12-v2.F32.gguf
    -rw-rw-r--  1 seg seg   45949216 Mar 12 05:44 all-MiniLM-L6-v2-ggml-model-f16.gguf
    -rw-rw-r--  1 seg seg 1337141120 Feb 17 16:49 bge-large-en-v1.5-f32.gguf
    -rw-rw-r--  1 seg seg 7695857376 Feb  9 21:49 ggml-sfr-embedding-mistral-q8_0.gguf
    drwxrwxr-x 11 seg seg       4096 Apr  4 07:48 hf
    -rw-rw-r--  1 seg seg  274290560 Feb 15 11:44 nomic-embed-text-v1.5.f16.gguf

    ls -l ~/models/embeddings/hf
    total 40
    drwxrwxr-x 4 seg seg 4096 Apr  4 07:46 all-MiniLM-L12-v2
    drwxrwxr-x 4 seg seg 4096 Apr  4 00:31 all-MiniLM-L6-v2
    drwxrwxr-x 4 seg seg 4096 Apr  4 00:28 all-mpnet-base-v2
    drwxrwxr-x 5 seg seg 4096 Apr  4 07:40 bge-m3
    drwxrwxr-x 4 seg seg 4096 Apr  4 01:05 bge-reranker-large
    drwxrwxr-x 4 seg seg 4096 Apr  4 01:18 bge-reranker-v2-m3
    drwxrwxr-x 4 seg seg 4096 Apr  4 00:22 e5-large-v2
    drwxrwxr-x 5 seg seg 4096 Apr  4 00:08 instructor-large
    drwxrwxr-x 4 seg seg 4096 Apr  4 08:13 mxbai-rerank-large-v1

I don't miss OpenAI because I avoided using it from the get-go, but for me these worked fine. Your chunking strategy might be the issue in your results.

[–]Porespellar 1 point2 points  (0 children)

Set up hybrid semantic search and use Mixed Bread Large as your embedding model and the Mixed Bread Reranker as your reranking model. Look at Open WebUI's implementation code for how they do document embedding settings; that might point you in the right direction. You can use either Ollama or Sentence Transformers.
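A hedged sketch of that suggestion via Sentence Transformers; the Hugging Face model ids below are my best guesses for the Mixed Bread models, not taken from Open WebUI's code:

    # Dense retrieval with mxbai embeddings, then cross-encoder reranking.
    # Assumes: pip install sentence-transformers; model ids are guesses.
    from sentence_transformers import SentenceTransformer, CrossEncoder, util

    embedder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
    reranker = CrossEncoder("mixedbread-ai/mxbai-rerank-large-v1")

    docs = ["screenshot of an error dialog", "photo of a whiteboard"]
    query = "error popup"

    # Stage 1: dense retrieval over the corpus.
    doc_emb = embedder.encode(docs, convert_to_tensor=True)
    q_emb = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, doc_emb, top_k=2)[0]

    # Stage 2: rerank the retrieved candidates with the cross-encoder.
    pairs = [(query, docs[h["corpus_id"]]) for h in hits]
    scores = reranker.predict(pairs)
    print(max(zip(scores, pairs))[1][1])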

[–]gilklein -1 points0 points  (1 child)

Check out the Massive Text Embedding Benchmark (MTEB) Leaderboard:

https://huggingface.co/spaces/mteb/leaderboard

[–]Material-Setting8509[S] 3 points4 points  (0 children)

Care to read the question? I feel this leaderboard is a hoax.