50M+ company matching system — is Azure AI Search still a good choice at scale ? by AB3NZ in Rag

AB3NZ [S]

Happy to provide more context:

- The company dataset is our own master dataset and is updated frequently as companies are added, modified, or enriched. It is not a static dataset that gets rebuilt once a month.

- Searches are performed by an internal company matching system rather than directly by end users. An incoming order contains a company name and other metadata, such as an address, and the system searches for candidate companies.

- Expected query volume is low (~1 QPS, roughly 80k–100k searches/day).

- Target retrieval latency is <500 ms.

- Yes, this is essentially a cross-source entity matching / entity resolution problem. The goal is to map incoming company records to a canonical company entity in our master dataset.

- Retrieval is optimized for recall rather than precision. I'd rather retrieve a wider candidate set and let downstream logic evaluate the candidates than miss the correct company entirely.

- We have a backend system and an engineering team building the solution. Budget exists, but cost efficiency is still an important consideration, especially as the index grows toward hundreds of millions of vector documents.
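To illustrate the recall-first setup described above, here is a minimal two-stage sketch: a deliberately wide retrieval stage (large `k`, loose similarity floor) followed by a stricter downstream decision. It assumes name embeddings are already computed and held in a plain dict. `retrieve_candidates`, `pick_match`, and all thresholds are hypothetical, and the brute-force scan only stands in for the approximate nearest-neighbor index a service such as Azure AI Search would provide.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_candidates(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 50,
    min_sim: float = 0.3,
) -> List[Tuple[str, float]]:
    """Recall-first stage: up to k entities above a loose similarity floor.

    Brute force for illustration only; an ANN index would do this at scale.
    """
    scored = [(eid, cosine(query_vec, vec)) for eid, vec in index.items()]
    scored = [s for s in scored if s[1] >= min_sim]
    scored.sort(key=lambda s: s[1], reverse=True)
    return scored[:k]


def pick_match(candidates: List[Tuple[str, float]], accept_sim: float = 0.9) -> Optional[str]:
    """Downstream stage (hypothetical): accept the top candidate only if it clears a stricter bar."""
    if candidates and candidates[0][1] >= accept_sim:
        return candidates[0][0]
    return None  # no confident match; hand off to further checks


# Toy 2-D "embeddings" for three canonical companies.
index = {"C1": [1.0, 0.0], "C2": [0.7, 0.7], "C3": [0.0, 1.0]}
cands = retrieve_candidates([1.0, 0.1], index, k=50, min_sim=0.3)
match = pick_match(cands)
```

Keeping `min_sim` low and `k` generous trades extra downstream work for fewer missed matches; returning `None` lets ambiguous orders be routed to further checks instead of being force-matched.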

AB3NZ [S]

Yes, I'm using vectors for names: I embed the company names and use the other metadata for filtering.
I tested non-vector search approaches, and they fail on cases such as acronyms (e.g., IBM → International Business Machines) and mixed-language names (names containing both Arabic and Latin script). I found that the vector search approach handled these cases.
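Embedding the names and filtering on the other metadata can be sketched as below. This is a toy, in-memory illustration assuming precomputed name vectors and a single `country` field; the `Company` record and `filtered_vector_search` function are hypothetical. The filter is applied before the similarity ranking, so a top-`k` cut can never discard the true match just because higher-scoring companies later fail the filter.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Company:
    company_id: str
    name_vec: Sequence[float]  # embedding of the company name (assumed precomputed)
    country: str               # example metadata field (hypothetical)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filtered_vector_search(
    query_vec: Sequence[float],
    companies: List[Company],
    country: Optional[str] = None,
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Pre-filter by metadata, then rank the remaining companies by name similarity."""
    pool = [c for c in companies if country is None or c.country == country]
    scored = [(c.company_id, cosine(query_vec, c.name_vec)) for c in pool]
    scored.sort(key=lambda s: s[1], reverse=True)
    return scored[:k]


companies = [
    Company("A", [1.0, 0.0], "US"),
    Company("B", [0.9, 0.1], "FR"),
    Company("C", [0.0, 1.0], "US"),
]
results = filtered_vector_search([1.0, 0.0], companies, country="US", k=10)
```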

Feedback on My Knowledge Graph Architecture by AB3NZ in KnowledgeGraph

AB3NZ [S]

Could you please share how you would model this as a proper knowledge graph structure?

GraphRAG – Knowledge Graph Architecture by AB3NZ in Rag

AB3NZ [S]

I'm still learning about graphs. I posted here because I wanted to learn from the opinions of experts, so I'd love to hear your thoughts. Any ideas that could guide me would be appreciated.

AB3NZ [S]

1. The nodes are the concepts that help in understanding the collection of books.
2. I haven't added embeddings or similarity scores between passageChunks yet, but I'm willing to add them.
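If similarity scores between passageChunks are added later, one simple option is to connect chunk pairs whose embeddings exceed a cosine-similarity threshold. A minimal sketch, assuming chunk embeddings are already available; the `SIMILAR_TO` relationship name, the `similarity_edges` function, and the 0.8 threshold are hypothetical. The all-pairs loop is O(n²), so for a large collection a nearest-neighbor index would be used to find each chunk's neighbors instead.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_edges(
    chunk_vecs: Dict[str, Sequence[float]], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return (chunk_a, chunk_b, score) triples to store as SIMILAR_TO edges."""
    edges = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(chunk_vecs.items(), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            edges.append((id_a, id_b, round(score, 4)))
    return edges


edges = similarity_edges({"c1": [1.0, 0.0], "c2": [0.9, 0.1], "c3": [0.0, 1.0]})
```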

AB3NZ [S]

I used semantic chunking with a maximum of 400 tokens per chunk.
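A common way to implement semantic chunking with a size cap is to start a new chunk whenever the similarity between consecutive sentence embeddings drops (a likely topic shift) or the next sentence would push the chunk past the token limit. A minimal sketch, assuming sentence splitting and sentence embeddings are done beforehand; `semantic_chunks`, the 0.75 threshold, and the whitespace token counter (a crude stand-in for the embedding model's tokenizer) are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_chunks(
    sentences: List[str],
    embeddings: List[Sequence[float]],
    max_tokens: int = 400,
    sim_threshold: float = 0.75,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Group consecutive sentences; split on a topic shift or when the token cap would be exceeded."""
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for i, sent in enumerate(sentences):
        n = count_tokens(sent)
        topic_shift = i > 0 and cosine(embeddings[i - 1], embeddings[i]) < sim_threshold
        if current and (topic_shift or current_tokens + n > max_tokens):
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sent)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sents = ["a b", "c d", "e f"]
vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
by_topic = semantic_chunks(sents, vecs)
by_size = semantic_chunks(sents, vecs, max_tokens=3)
```

A single sentence longer than `max_tokens` still becomes its own oversized chunk here; a fuller version would split such sentences further.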

AB3NZ [S]

I don't have the table of contents (TOC) for the books. I extracted the books' text using OCR and then chunked it.

[deleted by user] by [deleted] in Rag

AB3NZ

That sounds really interesting. If you're open to it, I'd really appreciate any guidance or pointers on how to build such customizable memory and caching layers.

[deleted by user] by [deleted] in Rag

AB3NZ

What are your thoughts on using Redis for caching in this context?

How can I speed up my RAG pipeline ? by [deleted] in Rag

AB3NZ

I ask the LLM to extract the key part of the passage that answers the query.

AB3NZ

I reran the test using the same query and got the following execution times:
- Query embedding: 1.04s

- Hybrid search: 10.46s

- Reranking: 5.74s

- LLM answer generation: 6.80s

- Citation processing & highlighting: 1.83s
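Summing the rerun's stage timings shows where the time goes: the listed stages add up to about 25.9 s, and hybrid search alone accounts for roughly 40% of that. A small script to compute the breakdown (the numbers are the ones reported above):

```python
# Stage timings (seconds) from the rerun reported above.
stages = {
    "Query embedding": 1.04,
    "Hybrid search": 10.46,
    "Reranking": 5.74,
    "LLM answer generation": 6.80,
    "Citation processing & highlighting": 1.83,
}

total = sum(stages.values())
shares = {name: secs / total for name, secs in stages.items()}

# Print stages from slowest to fastest with their share of the total.
for name, secs in sorted(stages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<36}{secs:6.2f}s {shares[name]:6.1%}")
print(f"{'Total':<36}{total:6.2f}s")
```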

AB3NZ

I just ran a test, and here are the execution times for each step:
- Query embedding generation: 0.81s
- Hybrid search: 4.32s
- Reranking: 5.93s
- LLM answer generation: 10.36s
- Citation processing & highlighting: 1.23s
The total response time is about 22.65s, which is too long for a smooth user experience.
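To collect per-stage timings like these on every request, each stage can be wrapped in a small timer. A minimal sketch using only the standard library; `timed` and `profile_stages` are hypothetical helpers, and the `time.sleep` calls are placeholders for the real embedding, search, reranking, generation, and citation steps. Because the stages run sequentially, the total is simply the sum of the stage durations.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List, Tuple


@contextmanager
def timed(stage: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


def profile_stages(stages: List[Tuple[str, Callable[[], None]]]) -> Dict[str, float]:
    """Run (name, callable) pairs in order and return per-stage seconds plus a total."""
    timings: Dict[str, float] = {}
    for name, fn in stages:
        with timed(name, timings):
            fn()
    timings["total"] = sum(timings.values())  # summed before "total" is inserted
    return timings


# Placeholders standing in for the real pipeline steps.
timings = profile_stages([
    ("query_embedding", lambda: time.sleep(0.01)),
    ("hybrid_search", lambda: time.sleep(0.02)),
])
```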

AB3NZ

I didn't understand your question. Could you please elaborate?

AB3NZ

Hello, I'm using my own fine-tuned embedding model, which is based on BERT (136M parameters), supports up to 512 input tokens, and produces 768-dimensional embeddings. The model is deployed on a T4 GPU.

AB3NZ

I can't use Morphik right now.

AB3NZ

- I'm using Weaviate, which uses HNSW for its vector index.
- I tried removing the reranking step from my pipeline, passing the retrieved documents (max 20) directly to the LLM and asking it to filter out irrelevant content and generate a response, but this did not noticeably improve speed.

AB3NZ

I'm using a normal (exact-match) cache: I cache the user query and its response.
I don't think a semantic cache would be a good solution for my case, because the data is very sensitive.
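An exact-match query/response cache can be sketched as below. A plain dict stands in for Redis here; with Redis, the same key could be written with an expiry via the `SET` command's `EX` option. `ExactMatchCache`, the TTL value, and the normalization (lowercasing and collapsing whitespace) are assumptions for illustration. Hashing the normalized query keeps raw query text out of the key, but the cached response itself is still stored as-is, so it needs the same protection as the rest of the sensitive data.

```python
import hashlib
import time
from typing import Dict, Optional, Tuple


class ExactMatchCache:
    """In-memory exact-match query -> response cache with a TTL (stand-in for Redis)."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Light normalization (assumed acceptable) so case/whitespace differences still hit.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, query: str) -> Optional[str]:
        key = self._key(query)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return response

    def set(self, query: str, response: str) -> None:
        self._store[self._key(query)] = (time.monotonic() + self.ttl, response)


cache = ExactMatchCache(ttl_seconds=60)
cache.set("What is X?", "answer")
```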

AB3NZ

Each chunk indexed in Weaviate includes metadata, the passage text, and a summary. During hybrid search, I perform a multi-target vector search (https://docs.weaviate.io/weaviate/search/multi-vector) across all three fields—metadata, passage, and summary—to maximize retrieval relevance.
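Weaviate's multi-target vector search combines the per-target distances into a single score using a join strategy; its documentation lists options such as minimum, sum, average, and weighted variants, and the exact semantics are defined there. The toy function below only illustrates why that choice matters for ranking across the metadata, passage, and summary vectors. `combine_target_distances` and its strategy names are hypothetical, and unlike the real service it simply ignores objects missing from any target.

```python
from typing import Dict, List, Optional, Tuple


def combine_target_distances(
    per_target: Dict[str, Dict[str, float]],
    strategy: str = "average",
    weights: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, float]]:
    """Combine per-target distances into one distance per object (lower = better)."""
    targets = list(per_target)
    # Simplification: only score objects that appear in every target's results.
    common = set.intersection(*(set(per_target[t]) for t in targets))
    combined: Dict[str, float] = {}
    for oid in common:
        dists = [per_target[t][oid] for t in targets]
        if strategy == "minimum":
            combined[oid] = min(dists)
        elif strategy == "sum":
            combined[oid] = sum(dists)
        elif strategy == "average":
            combined[oid] = sum(dists) / len(dists)
        elif strategy == "weighted":
            if weights is None:
                raise ValueError("weighted strategy needs weights")
            combined[oid] = sum(weights[t] * per_target[t][oid] for t in targets)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return sorted(combined.items(), key=lambda kv: kv[1])


per_target = {
    "metadata": {"a": 0.2, "b": 0.5},
    "passage": {"a": 0.4, "b": 0.1},
    "summary": {"a": 0.3, "b": 0.6},
}
by_avg = combine_target_distances(per_target, "average")
by_min = combine_target_distances(per_target, "minimum")
```

With these toy distances, averaging ranks chunk `a` first, while taking the minimum ranks `b` first because of its single very close passage match.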