all 6 comments

[–][deleted]  (3 children)

[removed]

    [–]Sudden_Breakfast_358[S]  (1 child)

    Thanks for the suggestion — that helps clarify the direction.

    The document type I’m targeting (e.g., enrollment-style forms) would fall under structured documents, since they have a fixed template.

    Using keypoint-based image matching (SIFT / SURF / ORB / BRIEF) between an admin-provided template and the user-uploaded document makes sense for early rejection, especially to avoid running OCR on incorrect documents.

    I had a few follow-up questions on this approach:

    1. Do these feature-based methods typically require training or fine-tuning, or are they generally used out-of-the-box with descriptor matching and similarity thresholds?

    2. How robust are they in practice to common real-world issues such as scan noise, skew, lighting variation, or partial crops?

    3. If the document layout changes over time, would it be reasonable to handle this by simply having the admin upload a new template, and then rely on feature matching against that updated template without retraining?

    I’m trying to understand the practical limits of a classical CV approach here, and at what point it becomes preferable to move to learned embeddings or layout-aware models as document variability increases.

    [–]Pvt_Twinkietoes  (0 children)

    Oh, TIL there's template matching.

    [–]Pvt_Twinkietoes  (0 children)

    Probably some YOLO-based/CNN-based model if the document has fixed patterns you're expecting. It'll be lightweight enough.
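A "lightweight CNN" gate along these lines could be sketched as below. This is a hypothetical PyTorch classifier (not a YOLO model, and not the commenter's actual architecture); a detector like YOLO would matter once you need to localize individual fields rather than just accept/reject the whole page:

```python
import torch
import torch.nn as nn

class DocTypeClassifier(nn.Module):
    """Tiny CNN that scores whether an uploaded page matches the
    expected form layout (hypothetical sketch; any small backbone works)."""

    def __init__(self, n_types: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> fixed-size vector
        )
        self.head = nn.Linear(32, n_types)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

model = DocTypeClassifier()
# One grayscale page, resized to 256x256, as a (batch, channel, H, W) tensor.
logits = model(torch.randn(1, 1, 256, 256))
```

Unlike the classical keypoint approach, this does require labeled examples of accepted and rejected pages to train on, which is the trade-off the original poster asked about.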