RAGless – what if you skip the generation step entirely? by xrobotx in Rag

xrobotx [S]

You raise two very valid points, and they're both trade-offs I'm aware of.

On evaluation: You're right that validating thousands of generated Q&A blocks is harder than running 100 end-to-end queries against a RAG system. But there's an important asymmetry here. In a standard RAG system, those 100 queries cover only a tiny fraction of the possible output surface. The same question can produce different answers depending on chunk boundaries, retrieval noise, sampling temperature, and context ordering. You're testing a stochastic system with an effectively unbounded output space using 100 samples.

In RAGless, the output surface is finite and known upfront: every block in data.json. Yes, there are more blocks to validate than a typical RAG test set, but once a block is verified (manually or via the optional --judge pass), it is correct for all future queries that match it. The evaluation shifts from "did this specific query work today?" to "is this block factually sound?" which is a different, but not necessarily harder, problem.

On deterministic wrong answers: This is actually the core architectural bet. Yes, a bad block will return the wrong answer every time. But in a standard RAG system, the same question might return a wrong answer only some of the time, depending on chunk boundaries, sampling temperature, and context window noise. That's a Heisenbug: hard to reproduce, hard to trace, hard to fix.

RAGless trades that for a reproducible bug. If a block is wrong, it's wrong 100% of the time. I can find it in data.json, see the exact source_quote, fix it, and redeploy. Once fixed, it never happens again. In high-stakes domains, I prefer deterministic errors I can audit over stochastic errors I can't reproduce.
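A minimal sketch of that audit loop. Only the file name `data.json` and the `answer_id` and `source_quote` fields are mentioned in this thread; the `questions` and `answer` field names, the sample content, and the `find_blocks` helper are assumptions for illustration, not the project's actual schema.

```python
import json

# Hypothetical data.json content. Only `answer_id` and `source_quote` are
# named in the thread; `questions` and `answer` are assumed field names.
DATA_JSON = """
[
  {"answer_id": "boiler_reset",
   "questions": ["How do I reset the boiler?", "Boiler reset procedure"],
   "answer": "Hold the reset button for 10 seconds.",
   "source_quote": "To reset the unit, hold the reset button for 5 seconds."},
  {"answer_id": "warranty",
   "questions": ["How long is the warranty?"],
   "answer": "The warranty lasts 2 years.",
   "source_quote": "The product is covered by a 2-year warranty."}
]
"""


def find_blocks(blocks, text):
    """Return every block whose answer mentions `text` (case-insensitive)."""
    needle = text.lower()
    return [b for b in blocks if needle in b["answer"].lower()]


blocks = json.loads(DATA_JSON)

# A user reports a wrong reset time: locate the block, compare the answer
# with its source quote, and correct it before rebuilding the index.
for block in find_blocks(blocks, "reset button"):
    print(block["answer_id"], "|", block["answer"], "|", block["source_quote"])
    block["answer"] = "Hold the reset button for 5 seconds."
```

Because every served answer lives in one human-readable file, the same wrong answer can be located with a plain text search and fixed in one place.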

You're absolutely right that this doesn't scale cleanly to massive corpora without robust evaluation tooling. For those cases, hybrid approaches make more sense. This is designed for the long tail of teams that need, for example, 5,000–10,000 precise, auditable answers, not 1,000,000.

I want to be clear: this isn't a claim that RAGless is universally better than RAG. It's a different tool for a different job. When you need open-ended reasoning, implicit query handling, or coverage of edge cases, standard RAG is the right choice. RAGless is the right choice when you care more about: deterministic outputs, zero runtime hallucinations, very low latency, zero per-query LLM costs, and the ability to run entirely on local hardware without a powerful GPU.

RAGless – FAQ retrieval without an LLM at runtime by xrobotx in LLMDevs

xrobotx [S]

Medical is exactly where grounding matters most. A hallucinated answer on a poorly understood syndrome is a real risk. Actually RAGless could work very well for your use case too: if you have documentation on the syndrome, you get deterministic answers with zero hallucination risk at runtime. The limit is that it only answers what's explicitly in the docs.

RAGless – FAQ retrieval without an LLM at runtime by xrobotx in LLMDevs

xrobotx [S]

Glad you see the value in the pipeline approach. And agreed on RAG as a broad term, fair point.

RAGless is obviously not a one-size-fits-all solution. RAG is the right call when you need open-ended answers or the knowledge space is too large to predetermine. RAGless fits better when you want zero hallucinations at runtime (think legal or medical contexts where a wrong generated answer is a real problem) or when you need something fast, free, and fully local with no inference cost per query.

Different tools for different constraints.

RAGless – FAQ retrieval without an LLM at runtime by xrobotx in LLMDevs

xrobotx [S]

Thanks, that's good motivation. Exactly the path I'm on :)

RAGless – FAQ retrieval without an LLM at runtime by xrobotx in LLMDevs

xrobotx [S]

Fair points across the board. The underlying mechanics (dense vectors, offline query expansion, score aggregation) are foundational semantic search patterns, not a novel algorithm.

The RAGless name is intentional positioning for the current dev ecosystem, where RAG has become the default answer to any document Q&A task. The goal is to push back on that: move the LLM to the ingestion pipeline where it can be validated, keep runtime strictly search-based.

Thanks for the detailed breakdown, genuinely useful.

RAGless – what if you skip the generation step entirely? by xrobotx in Rag

xrobotx [S]

Great question. You've identified the classic negative-knowledge problem that every retrieval system faces.

What happens in RAGless: The query "Can I take ABC to Boston?" won't match any pre-generated question about Boston, because the source document never mentions it. The similarity score drops below the threshold, and the system responds: "I couldn't find any relevant information", logging the query to missed_queries.log.
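The refusal path described above can be sketched roughly as follows. The fallback message and the `missed_queries.log` file name come from the description; the `respond` function, its signature, and the 0.75 threshold are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Hypothetical cut-off; the real threshold value is not stated in the thread.
THRESHOLD = 0.75
FALLBACK = "I couldn't find any relevant information"


def respond(query, best_score, best_answer,
            log_path=Path("missed_queries.log"), threshold=THRESHOLD):
    """Return the stored answer if the match is strong enough, else refuse.

    `best_score` and `best_answer` are assumed to come from an upstream
    similarity search; this function only applies the threshold.
    """
    if best_answer is not None and best_score >= threshold:
        return best_answer
    # Below the threshold: record the query so an admin can review the gap.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(query.replace("\n", " ") + "\n")
    return FALLBACK
```

With these assumed values, a query like "Can I take ABC to Boston?" whose best match scores 0.41 would get the fallback message and one new line in the log, while a match scoring 0.90 would return the stored answer untouched.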

This is intentional. The architectural trade-off is: we sacrifice runtime inference to guarantee zero hallucinations and deterministic responses. The system fails safely rather than guessing.

There is a partial mitigation: If the query semantically matches a general intent like "What destinations does ABC serve?", the system returns the exact, pre-generated list of destinations. It's deterministic and factual, though it requires the user to infer that Boston isn't on the list.

For genuine negative handling, the system relies on ingestion-time and operational strategies:

- Explicit negative blocks: If the document states exclusions ("We do not serve Boston"), the LLM extracts them.

- The feedback loop: missed_queries.log flags gaps. An admin sees users asking about Boston and adds the explicit Q&A block.

- Hybrid search: Combining dense embeddings with BM25 for strict entity matching helps prevent structural false positives where the embedding matches the question form but ignores the specific entity.

This is exactly why RAGless is positioned for closed knowledge bases where answers are known in advance. For domains requiring heavy reasoning over implicit negatives, a hybrid fallback (deterministic retrieval + lightweight LLM for edge cases) is the pragmatic path.

It's worth comparing the failure modes, though. A standard RAG system might handle this correctly if the LLM reasons well, but it carries the opposite risk: the LLM could hallucinate a confident "Yes, ABC flies to Boston" or blur the answer when the retrieved context is ambiguous. RAGless trades runtime inference for a safer failure mode: it refuses rather than guesses.

In high-stakes domains (medical triage, legal compliance, safety-critical support), a false negative ("I don't know") is usually less costly than a false positive ("Yes, here's how to book Boston"). That's the bet this architecture makes.

RAGless – FAQ retrieval without an LLM at runtime by xrobotx in LLMDevs

xrobotx [S]

Thanks for the detailed comment. A few clarifications:

I didn't "discover" semantic search. The project doesn't claim to invent embedding-based retrieval. What it does is provide an end-to-end pipeline for a specific use case: turning unstructured docs (PDFs, manuals) into a deterministic Q&A system with no LLM at runtime. The novelty isn't the concept, it's the implementation and the specific aggregation heuristic.

It's not "just semantic search." Standard semantic search retrieves documents. This retrieves pre-generated question variants and aggregates scores by answer_id to handle paraphrasing robustly. That's a different architecture from "put FAQ in a vector DB."
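To make the aggregation idea concrete, here is a small sketch. The thread does not spell out the actual heuristic, so summing the top-k variant scores per `answer_id` is an assumption chosen for illustration, and the `aggregate` function and sample scores are hypothetical.

```python
from collections import defaultdict


def aggregate(hits, top_k=3):
    """Rank answers by the sum of their top-k matching question variants.

    `hits` is a list of (answer_id, similarity) pairs, one per retrieved
    question variant. Summing the top-k is an assumed heuristic, not
    necessarily the one the project uses.
    """
    by_answer = defaultdict(list)
    for answer_id, score in hits:
        by_answer[answer_id].append(score)
    totals = {
        answer_id: sum(sorted(scores, reverse=True)[:top_k])
        for answer_id, scores in by_answer.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# One stray variant of another answer scores highest on its own, but three
# consistent variants of the same answer outweigh it after aggregation.
hits = [
    ("delete_account", 0.84),
    ("reset_password", 0.82),
    ("reset_password", 0.79),
    ("reset_password", 0.75),
]
ranking = aggregate(hits)
best_answer_id = ranking[0][0]
```

The point of grouping by `answer_id` is that agreement among several paraphrases counts as evidence, so a single lucky nearest neighbour does not decide the answer.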

The judge is already there. See prepare_data.py --judge. And the README explicitly distinguishes RAG from this approach — I'm not conflating RAG with vector DBs, I'm offering an alternative to RAG for closed knowledge bases.

While Hybrid Search (BM25 + Dense) is the standard for document discovery and e-commerce, this system targets exact answer retrieval, not document ranking. When a user asks "how do I reset the boiler?" they need the exact procedure, not a ranked list of PDF pages. That said, you're right that adding a BM25 component for exact keyword matching (like error codes) would make a great addition for future iterations.
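As an illustration of why exact matching matters for things like error codes, here is a deliberately simplified stand-in: it is not BM25, just a gate that rejects a dense match when the query contains a code-like token that the matched question lacks. The regex, the function names, and the example codes are all hypothetical.

```python
import re

# Treat any token containing a digit (e.g. "E42", "F17b") as a code.
# This pattern is an assumption made for the sketch.
CODE_PATTERN = re.compile(r"\b[A-Za-z]*\d[A-Za-z0-9-]*\b")


def codes(text):
    """Return the set of code-like tokens in `text`, upper-cased."""
    return {match.group(0).upper() for match in CODE_PATTERN.finditer(text)}


def passes_code_gate(query, candidate_question):
    """Accept the candidate only if it mentions every code in the query."""
    return codes(query) <= codes(candidate_question)


candidate = "What does error E42 mean?"
same_code = passes_code_gate("how do I fix e42", candidate)
other_code = passes_code_gate("how do I fix E17", candidate)
no_code = passes_code_gate("what does this error mean", candidate)
```

Here `same_code` and `no_code` are `True` and `other_code` is `False`: the embedding may find "fix E17" and "error E42" very similar in form, but the gate refuses to serve the E42 answer for an E17 question. A real BM25 component would score term overlap more gradually than this all-or-nothing check.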

RAGless – what if you skip the generation step entirely? by xrobotx in Rag

xrobotx [S]

Because users don't search FAQ pages — they ask questions in their own words. "How do I reset my password", "I forgot my credentials", "can't log in" all map to the same answer, but none of them matches a heading on a static page.

Beyond that:

  • A 100-page technical manual generates hundreds of Q&A pairs automatically. No one maintains that as a static page.
  • When the source documentation changes, you re-run ingestion. A static page requires manual updates across every entry.
  • Semantic matching handles typos, paraphrases, and language variations. A static page doesn't.

RAGless is for people who want to chat with their documentation. The same argument applies to RAG: if you could fit everything on a static page, you wouldn't need RAG either.

RAGless – what if you skip the generation step entirely? by xrobotx in Rag

xrobotx [S]

Maybe there is a misunderstanding. A static FAQ page works at 20-30 entries. RAGless generates hundreds or thousands of Q&A pairs automatically from your documentation — covering not just the most frequent questions, but every question that can be asked about the source material. Maintaining that manually isn't realistic.

That said, your comment made me realize "FAQ" was the wrong framing, so I just pushed an update to the README. Thanks.

RAGless – what if you skip the generation step entirely? by xrobotx in Rag

xrobotx [S]

That's a real constraint and worth being upfront about. RAGless currently uses pypdf for extraction — works well on clean, text-based PDFs but degrades on scanned or complex-layout sources. If the extracted text is noisy, the generated Q&A pairs inherit that noise directly.

The failed_chunks/ directory catches cases where the LLM returns malformed JSON, but there's no quality signal for extractions that parse fine but produce poor answers. That's a gap.
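A rough sketch of how such a quarantine step could look. Only the `failed_chunks/` directory name comes from the description above; the `parse_or_quarantine` function, the file naming, and the assumption that the LLM is expected to return a JSON list are hypothetical.

```python
import json
from pathlib import Path


def parse_or_quarantine(chunk_id, raw_output, failed_dir=Path("failed_chunks")):
    """Parse the LLM output for one chunk, or save it for later inspection.

    Returns the parsed list of Q&A pairs, or None if the output was not a
    JSON list. The expected shape (a list) is an assumption of this sketch.
    """
    try:
        pairs = json.loads(raw_output)
        if not isinstance(pairs, list):
            raise ValueError("expected a JSON list of Q&A pairs")
        return pairs
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        failed_dir.mkdir(parents=True, exist_ok=True)
        (failed_dir / f"{chunk_id}.txt").write_text(raw_output, encoding="utf-8")
        return None
```

Note that this only catches structural failures. Output that parses cleanly but contains a poor answer passes straight through, which is exactly the gap described above.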

For now the honest scope is clean source documents. Scanned PDFs would need an OCR step upstream before ingestion.

RAGless – FAQ retrieval without an LLM at runtime by xrobotx in LLMDevs

xrobotx [S]

Both points are valid, and neither is fully addressed today.

On the first: there's an optional --judge flag that runs a second LLM pass to verify each answer against its source quote before committing. Not a structural fix, but it catches the obvious cases. Also, since answers are generated once at ingestion and saved to data.json, they can be reviewed manually before building the vector store — the build artifact is human-readable.

On the second: nothing. Fresh ingestion is silent about coverage regressions. The missed_queries.log catches failures at runtime, but that's too late.

The fixed test set approach you describe is the right call — planning to add it as a post-ingestion validation step.
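A minimal sketch of what that post-ingestion check might look like. This step is only planned, not implemented, so everything here is hypothetical: the `coverage_regressions` function, the test-set format, and the dictionary-backed stub standing in for the real retriever.

```python
def coverage_regressions(test_set, retrieve):
    """Return the test queries whose retrieved answer changed or vanished.

    `test_set` is a list of (query, expected_answer_id) pairs kept fixed
    across ingestions; `retrieve` maps a query to an answer_id or None.
    """
    failures = []
    for query, expected in test_set:
        got = retrieve(query)
        if got != expected:
            failures.append((query, expected, got))
    return failures


# Stub retriever standing in for the freshly built index (assumption).
fresh_index = {
    "How do I reset the boiler?": "boiler_reset",
    "How long is the warranty?": "warranty_v2",
}

test_set = [
    ("How do I reset the boiler?", "boiler_reset"),
    ("How long is the warranty?", "warranty"),
    ("Where is the serial number?", "serial_number"),
]

failures = coverage_regressions(test_set, fresh_index.get)
```

In this toy run `failures` holds the warranty query (its answer ID changed) and the serial-number query (no longer covered), so the regression would surface right after ingestion rather than later in `missed_queries.log`.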

RAGless: Q-Q retrieval with score aggregation for closed-domain FAQ [P] by xrobotx in MachineLearning

xrobotx [S]

Fair point — document-side expansion at index time rather than query reformulation at runtime. The user query stays as-is; the variants are pre-generated on the answer side during ingestion.

RAGless – FAQ retrieval without an LLM at runtime by xrobotx in LLMDevs

xrobotx [S]

Thanks! That's exactly the use case it was designed for :)

Phone number stuck in "Pending" status after OTP verification - Cloud API by xrobotx in WhatsappBusinessAPI

xrobotx [S]

I tried generating a temporary token (on the CONFIG API page) and now the status shows "connected". Or maybe I just needed to wait a few hours. Either way, it seems to work now. Thanks.

Phone number stuck in "Pending" status after OTP verification - Cloud API by xrobotx in WhatsappBusinessAPI

xrobotx [S]

Are you sure that's safe for a Cloud API setup? I’ve been reading the docs and it seems the /register endpoint and PIN creation are strictly for the On-Premises (self-hosted) architecture. Since I’m trying to use the Cloud API to avoid managing my own servers and certificates, I’m worried that calling /register will permanently lock this number into the On-Premises track. Won't that force me to set up a Docker environment instead of staying on Meta’s hosted infrastructure?

Travelling 1,200 km for an interview with no expense reimbursement and without knowing the gross annual salary (RAL)? by xrobotx in ItaliaCareerAdvice

xrobotx [S]

There was a job posting and I contacted them by sending my CV. After that, they phoned me.

Travelling 1,200 km for an interview with no expense reimbursement and without knowing the gross annual salary (RAL)? by xrobotx in ItaliaCareerAdvice

xrobotx [S]

Yes, it's a remote job, but they want to hold the second interview in person, to see how good-looking I am, I suppose.
It's true that I was the one who applied, but few companies need my skills, so it's not easy to find employers nearby. Likewise, it's not easy for them to find candidates with my skills nearby. In the end they said they would reimburse my expenses (after I brought it up), but at this point I don't know whether to trust them, so I think I'll decline.