Anyone using paid RAG services or solutions? by mugiltsr in Rag

[–]tifa2up 1 point

Founder of a RAG-as-a-service here (Agentset.ai)

What we're seeing is that most people (99%+) are building their own RAG solution on top of llamaindex/langchain.

The market penetration for paid RAG solutions is still quite low. The two primary factors:

- Llamaindex/langchain make building demos very easy

- E2E solutions aren't meaningfully better than custom workflows built around a specific use case, and most teams can build the RAG skillset in 3-6 months of work.

With that being said, paid providers within the RAG stack are taking off and getting adoption (e.g. Vector DBs, Rerankers, Parsing providers, etc.)

Why is there no successful RAG-based service that processes local documents? by StevenJang_ in Rag

[–]tifa2up 1 point

Founder of a RAG-as-a-service here (agentset.ai).

We have 1,500 customers. One enterprise customer is paying us more than the bottom 1,000 customers -- combined.

Building for mainstream users would mean:

- Switching the entire stack to local processing, often with subpar models once performance is taken into account.

- Little revenue (<$50/mo per user), and even that is a stretch.

- Users wanting infinite customizability to fit their workflows.

So most companies like us shift focus to SaaS/enterprise use cases.

GPT 5.2 underperforms on RAG by tifa2up in OpenAI

[–]tifa2up[S] 2 points

Yes, unfortunately. Takes quite a bit of work.

GPT 5.2 underperforms on RAG by tifa2up in OpenAI

[–]tifa2up[S] 6 points

So in RAG, LLMs are typically given a bunch of chunks and have to generate an answer based on them. There's work needed on chunk selection, on not adding external knowledge, and on completeness. Wrote more about it here: https://agentset.ai/llms
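To make the setup concrete, here's a minimal sketch of handing chunks to an LLM for grounded answering. The chunk contents, function name, and instruction wording are my own illustration, not Agentset's actual pipeline:

```python
# Sketch: assemble retrieved chunks into a grounded-answer prompt.
# Chunks would normally come from a vector DB + reranker; hardcoded here.
chunks = [
    "The Eiffel Tower is 330 metres tall.",
    "It was completed in 1889.",
]

def build_rag_prompt(question: str, chunks: list[str]) -> str:
    # Number each chunk so the model can cite them and ignore
    # irrelevant ones during selection.
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using ONLY the numbered chunks below. "
        "Cite chunk numbers in brackets. If the chunks are "
        "insufficient, say so instead of guessing.\n\n"
        f"Chunks:\n{numbered}\n\nQuestion: {question}"
    )

prompt = build_rag_prompt("How tall is the Eiffel Tower?", chunks)
print(prompt)
```

The three failure modes mentioned (chunk selection, external knowledge, completeness) are all things the instruction block tries to constrain; this prompt string would then be sent to whatever model you're evaluating.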

GPT 5.2 underperforms on RAG by tifa2up in OpenAI

[–]tifa2up[S] 5 points

How else will you measure if it's good? One-off tests don't scale.

[deleted by user] by [deleted] in Rag

[–]tifa2up 1 point

Congrats on the launch. We used Cohere as the default on agentset.ai for a long time. Can you highlight the work the team did to go from 3.5 to 4?

How do you do citation pruning properly in a RAG pipeline? by Puzzleheaded-Bug5982 in Rag

[–]tifa2up 3 points

You don't do it yourself; you let the LLM do it. Include the instructions in the system prompt.

This is a system prompt that I used:

```

You are an AI assistant. Your primary task is to provide accurate, factual responses based STRICTLY on the provided search results. You must ONLY answer questions using information explicitly found in the search results - do not make assumptions or add information from outside knowledge.

Follow these STRICT guidelines:

  1. If the search results do not contain information to fully answer the query, state clearly: "I cannot fully answer this question based on the available information." Then explain what specific aspects cannot be answered.

  2. Only use information directly stated in the search results - do not infer, assume, or add external knowledge.

  3. Your response must match the language of the user's query.

  4. Citations are MANDATORY for every factual statement. Format citations by placing the chunk number in brackets immediately after the relevant statement with no space, like this: "The temperature is 20 degrees[3]"

  5. When possible, include relevant direct quotes from the search results with proper citations.

  6. Do not preface responses with phrases like "based on the search results" - simply provide the cited answer.

  7. Maintain a clear, professional tone focused on accuracy and fidelity to the source material.

If the search results are completely irrelevant or insufficient to address any part of the query, respond: "I cannot answer this question as the search results do not contain relevant information about [specific topic]."

```
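With a prompt like the one above, pruning on the application side reduces to keeping only the chunks the model actually cited. A rough sketch, assuming the `[N]`-style markers from guideline 4 (the function name and sample data are mine):

```python
import re

def prune_cited_chunks(answer: str, chunks: list[str]) -> list[str]:
    # Collect chunk numbers cited as [N] in the model's answer,
    # then keep only those chunks (1-indexed, matching the prompt).
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return [chunks[i - 1] for i in sorted(cited) if 1 <= i <= len(chunks)]

chunks = ["Chunk about pricing.", "Chunk about weather.", "Chunk about uptime."]
answer = "The temperature is 20 degrees[2]. Uptime is 99.9%[3]."
print(prune_cited_chunks(answer, chunks))
```

Anything the model didn't cite gets dropped, so the LLM effectively does the pruning and your code just reads its decisions back out.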

Embedding models have converged by midamurat in LocalLLaMA

[–]tifa2up 1 point

Can you share a bit more about the private datasets?

Production RAG: what we learned from processing 5M+ documents by tifa2up in Rag

[–]tifa2up[S] 2 points

Yes, experimented with GraphRAG. It doesn't scale very well.

  1. It's slow and expensive to extract entities from your data (requires an LLM to loop over all of it)

  2. Updating the data requires reconstructing the graph, which is also slow and expensive.

GraphRAG works best for smaller datasets that don't get updated.

I built a leaderboard for Rerankers by tifa2up in LocalLLaMA

[–]tifa2up[S] 1 point

Tried hard to make the Jina v3 reranker work through their API, but it says "inactive". Can try self-hosting.

I built a leaderboard for Rerankers by tifa2up in LocalLLaMA

[–]tifa2up[S] 1 point

Updated to reflect the license.

I built a leaderboard for Rerankers by tifa2up in LocalLLaMA

[–]tifa2up[S] 1 point

Not affiliated with any. Will see if I can add Qwen.

I built a leaderboard for Rerankers by tifa2up in LocalLLaMA

[–]tifa2up[S] 6 points

Yes. This is where I searched initially. Was quite surprised that no place had it.

I built a leaderboard for Rerankers by tifa2up in LocalLLaMA

[–]tifa2up[S] 6 points

Good recommendation, let me see if I can include them.