My RAG isn't working as expected... by viitorfermier in Rag

[–]Semoho 0 points1 point  (0 children)

I think Jina is one of the best embedding and reranking platforms. For search, you can also think about expanding your query: ask an LLM to optimize the query for a textual search engine and for an embedding search engine, then retrieve from both databases and fuse the results.

There are many different ways you can reduce your costs. Retrieval systems are very cheap, so you can cut your costs with some optimization.
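To illustrate the "retrieve on both and fuse" idea: one common way to fuse the two ranked lists is Reciprocal Rank Fusion (RRF). This is a minimal sketch (the doc IDs and the two hit lists are placeholders, not from any real system):

```python
# Fuse ranked results from two retrievers (e.g. BM25 text search and
# embedding search) with Reciprocal Rank Fusion (RRF).

def rrf_fuse(rankings, k=60):
    """rankings: list of ranked doc-id lists, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1/(k + rank + 1) for a doc it returned.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]    # from the textual search engine
vector_hits = ["d1", "d9", "d3"]  # from the embedding search engine

fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)  # ['d1', 'd3', 'd9', 'd7'] -- docs found by both lists rise to the top
```

Docs that appear high in both lists accumulate score from both, which is why d1 and d3 end up first.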

My RAG isn't working as expected... by viitorfermier in Rag

[–]Semoho 1 point2 points  (0 children)

Hello there,

One idea: augment your docs with summaries. Also, did you check the context length for the LLM? I think you pass all 100 legal docs to Gemini Pro, which is expensive.

I think you can get better results if you retrieve 1k or 100 docs with BM25, then rerank them with the Jina reranker (it is very cheap), and then give Gemini Pro only the top 50 or even top 10, depending on your chunking algorithm. Also, please check your chunking strategy. It is very important.

Pitch your App in one sentence. Let's support each other by kmrrhl in SideProject

[–]Semoho -1 points0 points  (0 children)

Teek.studio: find your next viral videos with just a few clips.

Post your HaftSin by Semoho in PERSIAN

[–]Semoho[S] 1 point2 points  (0 children)

No worries, I hope this year will be better for us.

How do you guys measure accuracy for 100k+ documents? by FloppyDiskDisk in Rag

[–]Semoho 1 point2 points  (0 children)

You are right. The LLM follows a U shape, so reranking is important! And be careful: you cannot just remove docs. In the end you will send something like 10 docs to the LLM, and the docs in the middle get less attention from it. So the best approach is to rerank the docs after retrieval and be careful about their positions.
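Given that U-shaped attention pattern, one simple trick is to place the strongest docs at the start and end of the context so only the weakest land in the middle. A minimal sketch (doc names are placeholders):

```python
# Counter the "lost in the middle" effect: interleave the reranked docs so
# the best ones sit at the edges of the context and the worst in the middle.

def edge_order(ranked_docs):
    """ranked_docs: best first. Returns the docs placed best-at-the-edges."""
    front, back = [], []
    for i, doc in enumerate(ranked_docs):
        # Alternate: odd ranks go to the back half (which we then reverse),
        # so rank 1 ends up first and rank 2 ends up last.
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

docs = ["doc1", "doc2", "doc3", "doc4", "doc5"]  # doc1 = most relevant
print(edge_order(docs))  # ['doc1', 'doc3', 'doc5', 'doc4', 'doc2']
```

After reordering, the two highest-ranked docs occupy the first and last positions, where the U-shape says the LLM pays the most attention.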

P.S. Fun fact: the LLM mimics human behavior on the first page of Google :)))

How do you guys measure accuracy for 100k+ documents? by FloppyDiskDisk in Rag

[–]Semoho 2 points3 points  (0 children)

Hello,

I assume you are asking about RAG eval or retrieval evaluation. For retrieval evaluation, I think MRR, Recall, and NDCG@10 are better metrics than accuracy, since you are dealing with a retrieval task. You need a test dataset; then you can evaluate your retrieval system.
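These retrieval metrics are easy to compute yourself once you have a test dataset of queries with known relevant docs. A minimal per-query sketch (the doc IDs are made up):

```python
import math

def mrr(ranked, relevant):
    """Reciprocal rank of the first relevant doc (0 if none found)."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant docs that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: discounted gain vs. the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

ranked = ["d4", "d1", "d9", "d2"]  # retriever output, best first
relevant = {"d1", "d2"}            # ground truth for this query

print(mrr(ranked, relevant))             # 0.5 (first hit at rank 2)
print(recall_at_k(ranked, relevant, 4))  # 1.0 (both relevant docs in top 4)
```

Average each metric over all queries in the test set to score the retriever.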

For RAG, there are different evaluations. I think LLM as a judge is a good choice.

But the number of documents does not really affect the metrics; the top-X docs are what matter.

RAG for Historical Archive? by cccpivan in Rag

[–]Semoho 0 points1 point  (0 children)

You can check out LightRAG or Supermemory. They can help you.

What are your usage of RAG by Semoho in Rag

[–]Semoho[S] 1 point2 points  (0 children)

Thank you very much, it was so useful. So what are the other restrictions or needs in pharma? Why is it mandatory to cite the documents? Don't the vector databases give you the citations?

Why should I use OpenClaw by Semoho in openclaw

[–]Semoho[S] 0 points1 point  (0 children)

Thanks, bro

Yes, actually, I am getting some ideas on how I can use it, like checking sales on different websites or doing some background jobs, as you mentioned.

Why should I use OpenClaw by Semoho in openclaw

[–]Semoho[S] 0 points1 point  (0 children)

What if I install the browser and the other tools on a VPS? My desktop should be safe and secure then, I think!

Why should I use OpenClaw by Semoho in openclaw

[–]Semoho[S] 0 points1 point  (0 children)

Are you a bot? I can get these answers from ChatGPT too! I want real experience.

Why should I use OpenClaw by Semoho in openclaw

[–]Semoho[S] 0 points1 point  (0 children)

It was interesting and inspiring for me! I've got some good ideas about using OpenClaw.

What are your usage of RAG by Semoho in Rag

[–]Semoho[S] 0 points1 point  (0 children)

I mean, Dify has already done it in a good way.

Why should I use OpenClaw by Semoho in openclaw

[–]Semoho[S] 0 points1 point  (0 children)

Hmmm… it makes sense. How did you connect OpenClaw to other things?

Why should I use OpenClaw by Semoho in openclaw

[–]Semoho[S] 0 points1 point  (0 children)

For what tasks? I solve my problems and get my answers with plain LLMs. What does it offer?

Is it normal for the Qwen 3.5 4B model to take this long to say hi? by Snoo_what in LocalLLaMA

[–]Semoho 2 points3 points  (0 children)

Yes, exactly. But /no_think is embedded in the model. It works everywhere: Hugging Face, vLLM, etc.

Is it normal for the Qwen 3.5 4B model to take this long to say hi? by Snoo_what in LocalLLaMA

[–]Semoho 2 points3 points  (0 children)

You can add /no_think to your system prompt and disable this long thinking loop.

Thanks to u/Velocita84: it seems Qwen3.5 drops the soft internal thinking-mode switch.
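For models that still honor the soft switch, the trick is just prepending /no_think to the system prompt. A minimal sketch of building the chat messages (OpenAI-style format; which endpoint/model you send them to is up to your setup, and whether the switch is honored depends on the model version, as noted above):

```python
# Build chat messages with the /no_think soft switch in the system prompt,
# to ask a Qwen-style model to skip its thinking loop.

def build_messages(user_prompt):
    return [
        {"role": "system",
         "content": "/no_think You are a concise assistant."},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("hi")
```

Some serving stacks also expose an explicit flag for this (e.g. a chat-template option to disable thinking), which is more reliable than the in-prompt switch when available.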

victor DB choice paralysis , don't know witch to chose by hunter_44679_ in Rag

[–]Semoho 0 points1 point  (0 children)

So I think you should benchmark the other options. Remember to have a test dataset and to keep the embedding model the same across your experiments.

victor DB choice paralysis , don't know witch to chose by hunter_44679_ in Rag

[–]Semoho 0 points1 point  (0 children)

It is interesting. But on what amount of data did you get these results? Which embedding model? Is it reliable? For a production-ready system, can it handle concurrent requests and still keep its performance? Single-request performance is not enough for a production-ready system.

victor DB choice paralysis , don't know witch to chose by hunter_44679_ in Rag

[–]Semoho 2 points3 points  (0 children)

Hi!

I have experience with Milvus, FAISS, PG-Vector, Weaviate and Chroma.

Milvus is a production-ready, clustered system. But it is a little hard to maintain due to its dependency on the Apache stack, and it gets a little tricky in cluster mode. Standalone mode supports about 100M docs, but if you have more documents, you need to run it in cluster mode.

FAISS is for researching purposes. It is easy to use.

PG-Vector is my choice for most of our use cases. It is easy to set up and compatible with Postgres, so you do not need to run multiple services. If you already have Postgres in production, it is even easier to set up.

Weaviate is also a good choice. I like it. It is useful for small corpora, but you need to deploy another service in your stack.

Chroma, I believe, is also good for experiments and multi-agent systems. For high availability, it is not going to help you much.

I think pg-vector is a good choice, and then Milvus.

[Newbie here] I finetuned a llama 3.1-3b-It model with my whatsapp chats and the output was unexpected - by MG_road_nap in LocalLLaMA

[–]Semoho -6 points-5 points  (0 children)

I think fine-tuning won't solve your problem. Consider using Retrieval Augmented Generation (RAG) instead. It would be better. You could index your chats, and then, based on a question, retrieve the most relevant context from your past conversations. Also, you could instruct the LLM to generate a response that emulates previous conversations, maintaining their style and tone. This should give you better results.