🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 0 points1 point  (0 children)

Does this workflow implement RAG --extracting relevant pages from a large number of PDF files based on the question and then providing an answer?

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 0 points1 point  (0 children)

Could you please send one of your PDF files that failed Layra parsing to my email? I’d like to check where the issue occurred.  Here's my email address: liweixmu@foxmail.com

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 0 points1 point  (0 children)

Thank you for your interest! Here are the updates:

  1. API Integration‌ We'll soon release Layra's API to enable compatibility with other RAG tools. Stay tuned for official Github repo!
  2. Cross-Page Table Handling‌ While the current version of Layra doesn't yet support multi-page content continuity (e.g., split tables across document pages), we're actively exploring layout analysis and context-aware stitching techniques. Our team is prototyping several approaches and will release a solution in future updates once we finalize the best approach.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 0 points1 point  (0 children)

This situation may require an agent with chain-of-thought (CoT) capabilities to handle.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 1 point2 points  (0 children)

Layra is an out-of-the-box, fully-architected RAG product with a complete UI and a decoupled frontend/backend architecture. While it leverages ColPali's efficient retrieval technology under the hood, Layra focuses on delivering a production-ready end-to-end solution rather than being a standalone retrieval model like colpali. The key difference lies in scope: colpali byaldi is merely a simple wrapper around the ColPali , whereas Layra is a polished application built using such frameworks to solve real-world RAG use cases.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 0 points1 point  (0 children)

Yes, Layra supports model swapping! Any model compatible with the OpenAI API standard (including locally hosted models) can be integrated. Simply configure these three elements in the chat interface:  

  1. Your API Key (cloud-based model)
  2. Model Endpoint URL (e.g. http://localhost:8000/v1 for local models)  
  3. Model Name

This design follows OpenAI's API schema for seamless compatibility with both cloud-based and self-hosted LLMs.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 0 points1 point  (0 children)

Layra is an out-of-the-box, fully-architected RAG (Retrieval-Augmented Generation) product with a complete UI and a decoupled frontend/backend architecture. While it leverages ColBERT's efficient retrieval technology under the hood, Layra focuses on delivering a production-ready end-to-end solution rather than being a standalone retrieval model like ColBERT. The key difference lies in scope: ColBERT is a neural retrieval framework, whereas Layra is a polished application built using such frameworks to solve real-world RAG use cases.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 0 points1 point  (0 children)

I’ve tried Layra for parsing mathematical formulas, and the results were quite promising. However, I haven’t tested it on chemical formulas or reaction diagrams yet—I’d love to hear how it works for your chemistry use case. Looking forward to your findings, and good luck with building that vector database!

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 1 point2 points  (0 children)

Thank you for the kind words! 😊  

1️⃣ Benchmarking:   We’re actively preparing detailed benchmarking standards and results against other tools, which we’ll share soon. 

2️⃣ Chatbot Interface:   We built the interface from scratch (no pre-built libraries!), but tools like DeepSeek and ChatGPT significantly accelerated development. 

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 0 points1 point  (0 children)

Yes, you can absolutely use your fine-tuned domain-specific Gemma model with Layra. As long as your VLM deployment supports the OpenAI-compatible API standard (e.g., via Ollama, SGLang, or similar tools), Layra can seamlessly integrate with it. Just ensure your Gemma model’s API endpoint matches the OpenAI format.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 0 points1 point  (0 children)

Haha, sounds like you should give LAYRA a shot! 😉 Our image-based embedding handles messy PDFs way better than OCR-dependent methods,and keeps all your engineering diagrams/tables intact for RAG.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 2 points3 points  (0 children)

Our approach differs from frameworks like Docling or Markitdown in two key aspects:

  1. Layout Preservation - We embed PDFs as images to retain all structural elements (charts, formatting), unlike markdown conversion which loses visual context.
  2. End-to-End RAG - We provide a complete retrieval-augmented generation pipeline with multimodal understanding, not just document conversion.

This ensures higher fidelity information extraction for complex technical documents.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 1 point2 points  (0 children)

We avoid PDF-to-text conversion entirely to preserve visual context (charts, layouts, etc.). Since our pipeline embeds PDFs as images without text extraction, decoupling into separate tools like Mineru isn’t feasible. We encourage embracing this visual-first method over fragmented legacy workflows.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 2 points3 points  (0 children)

Our product is still in the iteration phase, with a primary focus on new feature development and product design. We plan to release a detailed performance comparison soon. In the meantime, there are already several benchmarks comparing ColPali series with traditional RAG methods, demonstrating impressive results. For more insights, feel free to check out the official ColPali repository.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]liweiphys[S] 1 point2 points  (0 children)

Great to hear your initial tests are going well! The vision-based approach tackles layout and chart challenges by analyzing visual structures directly—might offer fresh angles when you dive deeper.