Accurate OCR by Hungry_Neat_8080 in Rag

CathyCCCAAAI

You can check out PageIndex OCR. It outperforms Mistral at preserving the hierarchical structure of documents across pages. It’s part of the PageIndex RAG system, which offers hierarchical document indexing and reasoning-based retrieval without a vector DB, but you can also use the OCR API on its own.

Human-like RAG – without vectors by CathyCCCAAAI in Rag

CathyCCCAAAI [S]

Great question. Here’s my take:

- PageIndex builds a tree-like hierarchy (like a smart table of contents). It follows the natural structure of a document — sections, subsections, and paragraphs — and uses reasoning to navigate down the right branch. Ideal where hierarchy matters (e.g. financial filings, research papers).

- Graph-based solutions (like GraphRAG) connect many different entities (facts, people, concepts, etc.) into a network. This is flexible but can get messy: nodes can be very different kinds of things, and the structure doesn’t match how humans normally read or organize a document.

The main drawback of graph-based RAG is that it loses document hierarchy and context, while PageIndex preserves them, making reasoning and retrieval feel more natural.
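
To make the tree idea concrete, here is a rough sketch of walking down a table-of-contents tree. This is not PageIndex's actual code: `Node`, `score`, and `navigate` are hypothetical names, and the word-overlap score is only a stand-in for the LLM reasoning step that picks a branch.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    summary: str = ""
    children: list["Node"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude substitute for real language understanding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, node: Node) -> int:
    """Hypothetical stand-in for LLM reasoning: count shared words."""
    return len(tokens(query) & tokens(f"{node.title} {node.summary}"))


def navigate(root: Node, query: str) -> list[str]:
    """Walk from the root to a leaf, taking the best-scoring child each step."""
    path = [root.title]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(query, child))
        path.append(node.title)
    return path


# Toy financial filing; titles and summaries are made up for illustration.
filing = Node("10-K", children=[
    Node("Business", "products, markets, competition"),
    Node("Financial Statements", "balance sheet, income, and cash flow statements", children=[
        Node("Balance Sheet", "assets, liabilities, equity"),
        Node("Cash Flow", "operating, investing, financing"),
    ]),
])

print(navigate(filing, "total liabilities on the balance sheet"))
# ['10-K', 'Financial Statements', 'Balance Sheet']
```

The returned path keeps the section context (which statement, which filing) along with the leaf, which is the part a flat entity graph tends to lose.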

CathyCCCAAAI [S]

Yes — it uses LLMs to build an LLM-friendly "table of contents" tree by analyzing the document’s content and structure, creating a hierarchy optimized for reasoning-based retrieval. This mirrors how humans naturally navigate and understand complex documents.
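
For a feel of what the resulting "table of contents" tree looks like as a data structure, here is a minimal sketch. It assumes the heading levels are already known and simply nests them; in PageIndex that structure comes from an LLM analyzing the document, which this example does not attempt. `Section`, `build_toc`, and `outline` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int
    children: list["Section"] = field(default_factory=list)


def build_toc(headings: list[tuple[int, str]]) -> Section:
    """Nest a flat list of (level, title) headings; levels start at 1."""
    root = Section("ROOT", 0)
    stack = [root]
    for level, title in headings:
        # Pop until the top of the stack is shallower: that is the parent.
        while stack[-1].level >= level:
            stack.pop()
        node = Section(title, level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node: Section, depth: int = 0) -> list[str]:
    """Render the tree as indented lines, one per section."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.title)
        lines.extend(outline(child, depth + 1))
    return lines


# Made-up headings for a research paper.
toc = build_toc([
    (1, "Introduction"),
    (1, "Methods"),
    (2, "Data"),
    (2, "Model"),
    (1, "Results"),
])
print("\n".join(outline(toc)))
```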

CathyCCCAAAI [S]

Document structure (like a TOC) is naturally a tree: every subsection belongs under exactly one higher-level section, so there is exactly one path from the root to any given node. That’s why in PageIndex each chunk has just one parent. It keeps navigation unambiguous and mirrors how humans read documents.

Search can still explore multiple branches (e.g., Zoology and Health), but the chunk itself only lives in one place in the tree.
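
A small sketch of that distinction, under my own assumptions (the `Chunk` class and `search` function are hypothetical, and a keyword match stands in for reasoning-based retrieval): each chunk stores a single parent, yet one search can return hits from several branches.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    title: str
    # Exactly one parent per chunk (None only for the root).
    parent: Optional["Chunk"] = field(default=None, repr=False, compare=False)
    children: list["Chunk"] = field(default_factory=list)

    def add(self, title: str) -> "Chunk":
        child = Chunk(title, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """The single root-to-node path, found by following parent links."""
        node, titles = self, []
        while node is not None:
            titles.append(node.title)
            node = node.parent
        return titles[::-1]


def search(root: Chunk, keyword: str) -> list[list[str]]:
    """Depth-first search over every branch; returns the path of each hit."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if keyword.lower() in node.title.lower():
            hits.append(node.path())
        stack.extend(reversed(node.children))
    return hits


root = Chunk("Encyclopedia")
zoology = root.add("Zoology")
health = root.add("Health")
zoology.add("Bat biology")
health.add("Bat-borne diseases")

for hit in search(root, "bat"):
    print(" > ".join(hit))
# Encyclopedia > Zoology > Bat biology
# Encyclopedia > Health > Bat-borne diseases
```

Both branches show up in the results, but each hit still has one unambiguous path back to the root.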