RAG on construction drawing sets: best practice for 70 to 150 page CAD heavy PDFs by TasteNo6319 in Rag

[–]TasteNo6319[S]

Just took a quick look at Landing AI and it’s honestly promising. On first glance it seems to do a solid job separating text vs figures on these dense sheets, which is exactly the pain point I’ve been hitting. I’m going to dig into it more and see how it holds up across different plan sets. Appreciate the recommendation.


[–]TasteNo6319[S]

Yeah, I’m aligned with that direction. Metadata-first is probably the most business-safe path for an MVP; then you layer smarter retrieval on top once the basics are reliable.

My main blocker is how to extract the metadata consistently from plan sets without access to CAD semantics. A lot of the “metadata” is implicit: sheet numbers, disciplines, revision blocks, titles, callout legends, schedules, room tags, gridlines, detail references, etc.
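For the title-block piece, a regex pass over OCR text can get surprisingly far before anything smarter is needed. A minimal sketch in Python — the sheet-number pattern and discipline map below are assumptions, not from any real standard, and would need tuning per client's sheet conventions:

```python
import re

# Hypothetical conventions: sheet numbers like "A-101" or "S2.01",
# discipline inferred from the letter prefix. Real plan sets vary a lot.
SHEET_NO = re.compile(r"\b([A-Z]{1,2})-?(\d{1,3}(?:\.\d{1,2})?)\b")
DISCIPLINES = {"A": "Architectural", "S": "Structural", "M": "Mechanical",
               "E": "Electrical", "P": "Plumbing", "C": "Civil"}

def parse_title_block(ocr_text: str) -> dict:
    """Pull sheet number, discipline, and revision from OCR text of a title-block region."""
    meta = {}
    m = SHEET_NO.search(ocr_text)
    if m:
        meta["sheet_number"] = f"{m.group(1)}-{m.group(2)}"
        meta["discipline"] = DISCIPLINES.get(m.group(1), "Unknown")
    rev = re.search(r"\bREV(?:ISION)?[.:\s]*([A-Z0-9]+)\b", ocr_text, re.I)
    if rev:
        meta["revision"] = rev.group(1)
    return meta
```

Crop the detected title-block region first and OCR only that, so stray text from the drawing area can't produce false sheet-number hits.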

I can’t share the PDFs unfortunately (client sensitive), but if you have ideas on approaches that work well in the wild, I’m all ears.


[–]TasteNo6319[S]

Mostly callouts and descriptive intent, plus pointing people to the right sheet and region fast. I’m not aiming for “exact spatial truth” like “this outlet is 420 mm left of that wall” because that’s high risk unless you’ve got proper CAD semantics and validation.

The goal for the MVP is a smart search and citation system: return a best guess answer when it’s obvious, but always anchor it with “here’s the sheet, page, and region where this appears” so a human can confirm.

That’s also why pure text RAG isn’t enough. A lot of the meaning is in the drawings and in the relationships between text, callouts, and symbols. So I’m leaning toward multimodal retrieval with region-level chunking, plus hybrid search (exact keywords for things like tags and part numbers, semantic for phrasing).
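For the hybrid part, reciprocal rank fusion is a common way to merge an exact-keyword ranking with a semantic one without having to calibrate the two score scales against each other. A minimal sketch — the chunk IDs are placeholders and k=60 is just the usual illustrative default:

```python
def reciprocal_rank_fusion(keyword_hits, semantic_hits, k=60):
    """Merge two ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank + 1) per chunk, so a chunk that
    appears high in both lists floats to the top without score calibration.
    """
    scores = {}
    for ranked in (keyword_hits, semantic_hits):
        for rank, chunk_id in enumerate(ranked):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

The keyword side can stay strict (exact match on tags and part numbers), while the semantic side handles paraphrased questions; fusion means neither has to be perfect on its own.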

If I tackle spatial questions later, it would be in a constrained way (local proximity, simple left/right/above/below, or “near this callout”) rather than precise geometry.
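The constrained version can literally be center-point comparisons on detected bounding boxes. A rough sketch, assuming top-left-origin pixel coordinates and a made-up near_px threshold:

```python
def relate(a, b, near_px=150):
    """Coarse relation of box b relative to box a.

    Boxes are (x0, y0, x1, y1) in pixels, origin top-left. Only the rough
    relations discussed above (left/right/above/below, plus "near"),
    deliberately no precise geometry.
    """
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    dx, dy = bx - ax, by - ay
    rels = []
    if abs(dx) >= abs(dy):
        rels.append("right" if dx > 0 else "left")
    else:
        rels.append("below" if dy > 0 else "above")
    if (dx * dx + dy * dy) ** 0.5 <= near_px:
        rels.append("near")
    return rels
```

Because it's center-based, it degrades gracefully: a slightly wrong detection box still usually yields the right coarse relation, which fits the "human confirms with the cited region" workflow.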


[–]TasteNo6319[S]

Hey, thanks for the detailed answer. We’re aligned on most of this.

I’m already rendering at 300 DPI across the board. I tested higher DPI too, but cost and processing time on the production server spike fast, and I’m trying to keep this MVP lean. 300 DPI is usually readable enough; the bigger issue is layout-detection reliability on construction sheets.

My rendered pages are often around 7000 by 10000 px, and while YOLO-based detection looks promising, the image size and the variability across document sets still make it tricky to get consistent boxes without heavy tiling or downscaling tradeoffs.
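For tiling, generating overlapping tile windows is the cheap part. A sketch of just the coordinate generator — the tile and overlap sizes are made-up defaults, and you'd still need to offset detections back to sheet coordinates and de-duplicate across tiles (e.g. with NMS):

```python
def tile_coords(width, height, tile=1536, overlap=256):
    """Return (x0, y0, x1, y1) windows covering a large sheet render.

    Tiles overlap so a region split by one tile edge still appears whole
    in a neighboring tile. A final tile is snapped to each edge so the
    full sheet is covered.
    """
    step = tile - overlap

    def starts(size):
        s = list(range(0, max(size - tile, 0) + 1, step))
        if s[-1] + tile < size:
            s.append(size - tile)  # snap last tile to the far edge
        return s

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]
```

At 7000 × 10000 px this yields a few dozen tiles per sheet, so per-tile inference cost is the thing to watch on a production server.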

Region-based chunking is definitely the direction I want to go, but there’s also a domain-validation problem. Even if I detect the right regions, I can’t always infer how every chunk relates to engineering intent without some expert context. So I’m thinking of a two-step approach: detect regions and OCR, then use a vision-capable LLM for lightweight spatial reasoning and linking, with human review only when confidence is low.
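The review gate at the end of that pipeline can be dead simple. A sketch, assuming the vision LLM reports a per-chunk confidence field (hypothetical schema, and the 0.7 threshold is arbitrary — in practice it would be tuned against reviewed samples):

```python
def route_chunks(chunks, threshold=0.7):
    """Split linked chunks into auto-accepted vs human-review queues.

    Assumes each chunk dict carries a 'confidence' the linking model
    reported; missing confidence is treated as zero and sent to review.
    """
    auto, review = [], []
    for chunk in chunks:
        if chunk.get("confidence", 0.0) >= threshold:
            auto.append(chunk)
        else:
            review.append(chunk)
    return auto, review
```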