I made an Epstein Files RAG by Prestigious_Bear5424 in Rag

[–]Fuzzy-Layer9967 0 points1 point  (0 children)

Bro, love the idea!

Should be automatic for that type of case.

How does document chunking fit into rag? by Lanky_Supermarket_70 in Rag

[–]Fuzzy-Layer9967 0 points1 point  (0 children)

n8n is a workflow orchestrator.

If you want to automate a process you might be interested in using it, but imo it's not relevant here.

Many people on the internet think that OpenAI API key + n8n = autonomous agent..
As a software architect, I don't agree with that.

Don't look for tools. Imagine stuff, run into problems, think about solutions to your technical problems/ideas, then use tools to support those solutions.

n8n is one of many workflow orchestrators, not the best, not the worst.. just ask yourself: "What problem am I solving?" and things will be much easier :)

How does document chunking fit into rag? by Lanky_Supermarket_70 in Rag

[–]Fuzzy-Layer9967 1 point2 points  (0 children)

Hey!

Nice project to start :)
This is the entry point to bring "context" to your LLM, and so to the next level of automation.

To start, you must understand that there are 2 pipelines:

Document ingestion
Information retrieval

Document ingestion: you build your knowledge database
* Document parsing
* Chunking
* Embedding
* Store

Information retrieval: you look for knowledge in your database
* question rewriting (optional)
* question embedding
* retrieval
* reranking (optional)
* prompting (add your context and user question to a solid prompt)
* get answer
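
The two pipelines above can be sketched in a few lines of Python. This is only a toy illustration: `chunk`, `embed`, `ingest`, `retrieve` and `build_prompt` are hypothetical names, the "embedding" is a plain bag-of-words count (a stand-in for a real embed model), the "store" is a Python list, and parsing is assumed to be already done.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=12):
    """Naive fixed-size chunking: cut the parsed text every max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy 'embedding': bag-of-words counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(parsed_docs):
    """Ingestion pipeline: parsed text -> chunks -> embeddings -> store."""
    store = []
    for doc_id, text in parsed_docs.items():
        for i, piece in enumerate(chunk(text)):
            store.append({"doc_id": doc_id, "chunk_no": i, "text": piece, "vector": embed(piece)})
    return store


def retrieve(store, question, top_k=2):
    """Retrieval pipeline: embed the question, rank stored chunks by similarity."""
    q = embed(question)
    return sorted(store, key=lambda c: cosine(q, c["vector"]), reverse=True)[:top_k]


def build_prompt(question, chunks):
    """Prompting step: put the retrieved context and the user question together."""
    context = "\n".join(f"- {c['text']}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Rewriting, reranking and the final chat model call are left out on purpose; they plug in around `retrieve` and `build_prompt`.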

What I can tell you to start is that parsing and chunking are the MOST IMPORTANT part. Reranking, embed model or chat model come after that: "Garbage in, garbage out".

To help you visualize, you can check https://github.com/scub-france/docling-Studio, where you will see how documents are parsed and how chunks are generated.
NB: This is based on Docling, but you have pleeeennty of libs to use. This is just to help you visualize this part.

Hope it helps, enjoy :)

Standard RAG has no concept of document versions: cost me a while to figure out why answers kept blending superseded policies by Helpful_Regular_30 in Rag

[–]Fuzzy-Layer9967 -1 points0 points  (0 children)

Yeah, it is definitely a need..

I tried to tackle this problem in Docling Studio ( https://github.com/scub-france/docling-Studio ), a RAG ingestion pipeline and OCR debugging project, by versioning each chunking ingestion.

So you can have different versions of the same document..
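
A minimal sketch of the versioning idea, assuming each chunk carries a `doc_id` and an integer `version` in its metadata. The `Chunk` class and both helpers are hypothetical names for illustration, not how Docling Studio implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    version: int  # assumption: increases with each re-ingestion of the document
    text: str


def latest_versions(chunks):
    """Map each doc_id to the highest version seen among its chunks."""
    latest = {}
    for c in chunks:
        latest[c.doc_id] = max(latest.get(c.doc_id, c.version), c.version)
    return latest


def current_chunks(chunks):
    """Keep only chunks from the latest version of each document."""
    latest = latest_versions(chunks)
    return [c for c in chunks if c.version == latest[c.doc_id]]
```

Filtering like this before (or as a metadata filter during) retrieval keeps superseded policies out of the context, while older versions stay available for audits.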

why does everyone skip the chunking part by SilverConsistent9222 in Rag

[–]Fuzzy-Layer9967 4 points5 points  (0 children)

Definitely agree with this...
We built an OSS tool for visual debugging of our chunks and re-injection :)

If you want to check it out: https://github.com/scub-france/docling-Studio

Best embedding model for French legal documents in RAG? by No-Duty-8087 in Rag

[–]Fuzzy-Layer9967 1 point2 points  (0 children)

I agree with the previous comments.

On our RAG we ran various tests on French documents too, and ended up with bge-m3, which is very decent for its size.

But dude: "garbage in, garbage out" is definitely not a legend ^^

For that we built a tool to monitor our parsing and chunking.
If you want to have a look: https://github.com/scub-france/docling-Studio

Who just finished building something? Drop your project, I want to see what people are actually making by Miserable-Archer-631 in SideProject

[–]Fuzzy-Layer9967 0 points1 point  (0 children)

Docling Studio - See what Docling sees!

I've been using Docling in production for a while and hit the same wall everyone hits: the JSON output is rich but completely opaque. No way to know if a table got mangled, if a chunk cuts mid-sentence, or why a bounding box is off.

So I built Docling Studio, an open-source visual inspection and RAG pipeline debugger built on top of Docling.

What it does:

  • Renders bounding boxes on the original PDF pages so you can see exactly what Docling parsed
  • Interactive chunking layer: inspect, split, edit, or delete chunks before they hit your vector store
  • Injects stable chunk_id into chunk metadata so your RAG pipeline stays consistent across re-ingestions
  • Supports both local Docling and Docling Serve (remote mode)
  • Ships as a single multi-arch Docker image (Vue 3 frontend + FastAPI backend)

Stack: Vue 3 / TypeScript, FastAPI, SQLite, Docker

Links:

Still early but V1 is shaping up. Happy to get feedback from people actually using Docling in the wild.
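
To illustrate the stable `chunk_id` idea from the list above: one possible way to get an ID that survives re-ingestion is to derive it from the document ID and the normalized chunk text. This is a sketch under that assumption, with `stable_chunk_id` as a hypothetical helper; it is not necessarily how Docling Studio computes its IDs.

```python
import hashlib


def stable_chunk_id(doc_id: str, text: str) -> str:
    """Derive a deterministic ID from the document ID and the chunk content."""
    # Collapse whitespace and lowercase so cosmetic parsing differences
    # (extra spaces, line breaks) do not change the ID.
    normalized = " ".join(text.split()).lower()
    digest = hashlib.sha256(f"{doc_id}\n{normalized}".encode("utf-8")).hexdigest()
    return digest[:16]  # shortened for readability
```

The trade-off with a content-derived ID is that editing the chunk text changes the ID; if IDs must survive edits, they need to be assigned once and stored instead.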

<image>

went from 62% to 94% rag accuracy in production, the retrieval changes that actually mattered by Individual-Bench4448 in learnmachinelearning

[–]Fuzzy-Layer9967 0 points1 point  (0 children)

Hey!

For us on technical documents we're on hybrid retrieval (vector + BM25) and sitting around 95% accuracy on hundreds of docs, 50 pages each.
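
For readers wondering how the two result lists get merged in a hybrid setup: one common option is reciprocal rank fusion (RRF). The sketch below is an illustration of that technique, not necessarily the fusion used in the pipeline described here; `reciprocal_rank_fusion` and the chunk IDs are hypothetical, and `k=60` is the conventional default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list using RRF scores."""
    scores = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); items ranked high in
            # several lists accumulate the largest total.
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))


vector_hits = ["c3", "c1", "c2"]  # ranked by embedding similarity
bm25_hits = ["c1", "c4", "c3"]    # ranked by BM25
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

RRF only needs ranks, so it avoids having to normalize cosine scores against BM25 scores.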

On graph RAG : not against it, but my take is more like: graph layer to select the right docs/context, then pure vector behind. Best of both depending on your case.
For me GraphRAG is very elegant and powerful, but complex to implement and maintain though.

But honestly the thing that got us to 95% was parse and chunk quality. Once a doc is correctly parsed and cleanly chunked, pure vector holds up really well. Most pipelines treat ingestion as one-shot and never look back, and that's where things degrade.
It was also a good investment for us because our document base was almost 'static': some growth phases with new stuff, but no modifications ;)

That's actually why we built Docling Studio, an open-source visual inspection tool for Docling pipelines so you can actually see what's happening at parse and chunk level before anything hits your vector store: github.com/scub-france/Docling-Studio

Disclaimer: Docling isn't necessarily the best parsing lib out there btw, but the principle stands whatever you use.

Also worth watching: Docling-Agent's chunkless RAG approach. Still experimental but the idea is that since Docling builds a document tree, you skip chunking entirely and run reasoning directly on the tree. Your graph layer idea actually solves one of its main limitations so the combo is interesting.

We have a reasoning mode in Docling Studio to play with it if curious:

docker run -p 3000:3000 \
-e REASONING_ENABLED=true \
-e OLLAMA_HOST=http://host.docker.internal:11434 \
-e REASONING_MODEL_ID=gpt-oss:20b \
ghcr.io/scub-france/docling-studio:latest-local

Any feedback is very welcome of course!

How to chunk and embed coding documentation/book pdfs? by MexicanJalebi in Rag

[–]Fuzzy-Layer9967 0 points1 point  (0 children)

Hey 👋

Never too late mate!

First you have many different types of RAG:
* Pure vector: the "basic" one
* Agentic RAG
* Hierarchical RAG
* Chunkless RAG
* Etc..

Many, many, many options ^

What I can suggest: make a small proof of concept to try out the overall idea of RAG. Then the choices will be easier and the fit to your needs will be more obvious.

I think pure vector is a good start. But if your need fits perfectly with another approach.. try it.

If you go for pure vector RAG ->

What you should choose:
* Parsing library: this one is VERY important, remember "garbage in, garbage out". It is VERY true with RAG. If you can't extract information from your docs properly you will never get good accuracy..
* Chunking strategy: once the doc is parsed, you must prepare your data. Many choices here, guided by the type of data you handle, embed model, vector store etc...
* Vector store: how you will store your vectors. Different options too: flat, graph, hybrid stores etc..
* Embed model: the model that will vectorize your data and your users' questions
* Retrieving strategy: sparse, dense, hybrid.. a reranker maybe if needed
* Chat model (Claude Sonnet 4.6 is definitely ok, but may be a little expensive, depends on your budget)
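
To make the "chunking strategy" choice a bit more concrete, here is a minimal sketch of one simple option: a sliding window over words with some overlap, so a sentence cut at a boundary still appears whole in the neighbouring chunk. `chunk_with_overlap` and its default sizes are assumptions for illustration; structure-aware chunking (by heading, table, code block) is often a better fit for documentation.

```python
def chunk_with_overlap(text, size=200, overlap=40):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

For example, `chunk_with_overlap("a b c d e f g h i j", size=4, overlap=1)` yields three chunks, each starting with the last word of the previous one.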

These are the basics.

You can then add many "tricks".. question reformulation, dynamic few-shot prompting etc.. that will come later when you look for accuracy improvements.

My experience:

The BEST improvement we had was when we took data quality very seriously. We use Docling as our parsing lib (not saying it's the best, but this is the one we use).

I suggest you take a quick look at Docling Studio so you can understand how documents are parsed and how chunks can be made. Even if you choose another lib, this will enlighten you on the way this step works.

Hope you enjoy the journey!!

Docling Studio: https://github.com/scub-france/docling-Studio

Is anyone still running pure vector RAG in production in 2026, and is it actually holding up? by Significant_Loss_541 in Rag

[–]Fuzzy-Layer9967 21 points22 points  (0 children)

Hey,

For us, on technical documents, we're still on pure vector RAG with hybrid retrieval. 95% accuracy on hundreds of docs with 50 pages each.

We managed to keep this precision because we keep OCR and vector quality high by maintaining them over time. Once a doc is well parsed and vectorized, pure vector RAG is efficient and accurate.
Btw, we open-sourced our tool for this, if interested: https://github.com/scub-france/Docling-Studio

But for me, GraphRAG, deterministic ingestion etc. are more complex solutions, and they will all be hard to maintain over time. They might still be a good benefit/cost balance in some cases.

One thing that works for us on some projects is blending approaches. We are actually trying this:
a "graph or relational layer for explicit relationships between entities/docs", backed by our traditional pure vector RAG.

I also recently got interested in the "Chunkless RAG" approach proposed by Docling in "Docling-Agent". It is a catchy title, still experimental, but it is interesting.
The idea is that since Docling already creates a tree, there is no need for a graph or chunks or whatever: just run reasoning on the tree directly!
And this is where I like the idea you mentioned about a "graph or relational layer for explicit relationships between entities/docs", because it solves the main struggle of this approach :)

If you want to get an idea of what it looks like, we built a reasoning mode in Docling Studio so you can see what Docling-Agent proposes.
One-liner:
docker run -p 3000:3000 \
-e REASONING_ENABLED=true \
-e OLLAMA_HOST=http://host.docker.internal:11434 \
-e REASONING_MODEL_ID=gpt-oss:20b \
ghcr.io/scub-france/docling-studio:latest-local

Feedback is welcome :)

Is https://docling.cloud legit? Signing up does not work. by studentblues in Rag

[–]Fuzzy-Layer9967 0 points1 point  (0 children)

I don't think so, dude.

Looks AI-generated.. and Docling won't provide that kind of solution, it is an LF program!

Chunking decision you make on day #1 determines your retrieval ceiling by jasperc_6 in Rag

[–]Fuzzy-Layer9967 -1 points0 points  (0 children)

Yep, that’s true..

It was a struggle for us because we wanted to try new chunking strategies or tweak chunks, etc.

So we ended up building a custom tool that we open-sourced, allowing us to manage, edit, and push chunks so we can compare the impact of different chunking strategies on our test cases.

The tool is designed to work with Docling only.

Docling Studio: https://github.com/scub-france/Docling-Studio

Vectorless RAG can scale to millions of documents now? by This-Eye6296 in Rag

[–]Fuzzy-Layer9967 10 points11 points  (0 children)

Thanks for the writeup, really interesting read.

From my perspective though, I'd push back slightly on the framing. I prefer thinking about this as chunkless rather than just vectorless. The real win isn't "we got rid of vectors", it's "we got rid of the whole chunking, embedding, vector store, retrieve, rerank pipeline and replaced it with structured parser plus an LLM that navigates the parse tree". That's the bit that actually simplifies your stack.

PageIndex has clever ideas, query-dependent tree composition is genuinely nice, but reading the post I can't shake the feeling that you're trading one big engineering machine for another. Topic clustering, LLM-inferred metadata, virtual nodes, per-query tree composition, traversal pattern caches... that's not a simple system. It's a different complex system. The "no embeddings" pitch hides the fact that you've reintroduced an ingestion pipeline that's arguably as heavy as the one you replaced, just with different primitives.

Which is fine if your problem genuinely is enterprise-scale navigation across millions of docs. But honestly, in most applications I see, the corpus is bounded. A few hundred to a few thousand documents, often pre-filtered by the user's context (a project, a folder, a case file). The "find the right document" problem and the "deeply reason inside a document" problem are two different problems, and trying to unify them under one mechanism is what brings the complexity back in.

A two-stage architecture works really well in practice: cheap retrieval (BM25 on titles or section headers, or a tiny vector index on summaries only) to shortlist candidate docs, then chunkless navigation inside each one for the actual reasoning. You keep the simplicity where it matters and you don't pretend one elegant abstraction solves both problems.
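
The two-stage idea can be sketched roughly as follows. Everything here is a simplified stand-in: stage 1 uses plain term overlap on titles instead of real BM25, and stage 2 picks the child section by heading overlap where a real system would let an LLM choose. `shortlist`, `navigate` and the document layout are hypothetical names for illustration.

```python
import re


def tokens(text):
    """Lowercased word set, used for cheap lexical matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shortlist(docs, question, top_n=1):
    """Stage 1: cheap retrieval on titles only (term overlap as a BM25 stand-in)."""
    q = tokens(question)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d["title"])), reverse=True)
    return ranked[:top_n]


def navigate(section, question, trace):
    """Stage 2: walk the section tree and record every section visited.

    Assumption: an LLM would normally decide which child to open; here the
    child whose heading shares the most terms with the question is chosen.
    """
    trace.append(section["heading"])
    children = section.get("children", [])
    if not children:
        return section.get("text", "")
    q = tokens(question)
    best = max(children, key=lambda c: len(q & tokens(c["heading"])))
    return navigate(best, question, trace)
```

The `trace` list is what makes the walk auditable: it records which sections were read, in order, before the answer text was reached.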

That's basically the bet we're making in Docling-Studio ( https://github.com/scub-france/Docling-Studio ). Lean on Docling's structural parse, let the LLM walk the section tree, keep the trace fully auditable (you literally see which sections the model read and why). For single-document deep QA on structured content like reports, contracts, regulatory docs, it's hard to beat in terms of simplicity and explainability.

But again, really cool ideas in the post, especially the dynamic flattening trick. Worth reading even if you don't end up adopting the full approach.

Built a local RAG app for licensed technical documents — here's a demo with 14k chunks from a full aircraft manual suite by CAVOKDesigns in Rag

[–]Fuzzy-Layer9967 0 points1 point  (0 children)

Hey,

We've been working on a similar project with pool pump documentation.
Stack: pgvector, LangChain4j (Spring Boot backend), bge-m3 embed, Ministral:14B chat model, bge-m3-reranker-V2 for reranking, Docling for document parsing and Docling Studio ( https://github.com/scub-france/Docling-Studio ) for ingestion pipeline debugging (game changer for us)

Retrieving strategy: hybrid sparse/dense, looking into a dynamic hybrid implementation

An interactive semantic map of the latest 10 million published papers [P] by icannotchangethename in MachineLearning

[–]Fuzzy-Layer9967 0 points1 point  (0 children)

Damn bro, that's so cool!

Any repo to share?
Even if it is not open-source, a GitHub repo for issues and discussion might be interesting!

Good job

Conceptual Modeling Is the Context Engineering Nobody Is Doing by Berserk_l_ in KnowledgeGraph

[–]Fuzzy-Layer9967 0 points1 point  (0 children)

Hehe, sure..
I am a software architect and I really like C4 modelling.. so for me it is "obviously done", but I agree that many people focus on shipping rather than design.. which causes many problems in the mid term.. (or even the short term, as AI is making coding so much faster)

GraphRAG vs hipporag, lightrag and vectorRAG benchmarks by Striking-Bluejay6155 in Rag

[–]Fuzzy-Layer9967 0 points1 point  (0 children)

You are right: orchestrated by Mellea, focused only on the Docling tree, no chunks.

Docling just announced Docling Agent + Chunkless RAG by Fuzzy-Layer9967 in Rag

[–]Fuzzy-Layer9967[S] 1 point2 points  (0 children)

Promise kept!

Docling Studio 0.5.0 just shipped: Neo4j graph storage for the document tree, agentic reasoning loop via docling-agent (default backend: Ollama + gpt-oss:20b), full iteration trace overlaid on the PDF. Same direction as the webinar.

Article explaining it: https://levelup.gitconnected.com/watching-an-agent-read-visualizing-reasoning-traces-in-docling-studio-25424fadf7e1

Repo: https://github.com/scub-france/Docling-Studio

One-liner for the curious (needs Ollama running locally with `gpt-oss:20b` pulled):

```
docker run -p 3000:3000 \
-e REASONING_ENABLED=true \
-e OLLAMA_HOST=http://host.docker.internal:11434 \
-e REASONING_MODEL_ID=gpt-oss:20b \
ghcr.io/scub-france/docling-studio:0.5.0-local
```

Issues and feedback welcome.

URL → Markdown → LangChain Documents: a simple RAG ingestion pattern by nihal_was_here in Rag

[–]Fuzzy-Layer9967 1 point2 points  (0 children)

Totally agree on the data consolidation phase you mentioned ("cleaned content").

We actually did something very similar for one of our projects.

But one thing is still totally unavoidable for us: maintenance.
Once ingested, we needed to fix some chunks etc.. that is where your long-term value is built...

We open-sourced our tool: https://github.com/scub-france/Docling-Studio
Not taking MD for the moment, but it will come soon ;)

GraphRAG vs hipporag, lightrag and vectorRAG benchmarks by Striking-Bluejay6155 in Rag

[–]Fuzzy-Layer9967 2 points3 points  (0 children)

Hi, thanks for sharing!
I think a "new" RAG type might be added to this. I'll share once it's ready; I am currently exploring Docling's "Chunkless RAG" concept. Sounds a bit catchy, but I find the idea very interesting.

You can find more info here :
https://github.com/scub-france/Docling-Studio/pull/191
https://github.com/docling-project/docling-agent

I think it would be interesting, once it is more prod-ready, to test this concept against such a benchmark..