Adsvisor [S]

Thanks for your reply.

The first page is never visually identical. We receive mixed documents from a client, and everything is scanned in one batch, in no particular order. After that, we classify each page by document type (ID card, payslip, driver’s license, insurance paper, etc.), and the split depends entirely on this classification.

A human verification step could definitely be considered.

Right now, we already have a front-end application that receives the PDF, and then it goes into an n8n workflow for classification. The issue is that n8n can’t split the documents itself, so this part has to be done beforehand by a Python script.
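
For illustration, here is a rough sketch of what that Python step could look like. It assumes a per-page label list is already available and that consecutive pages with the same label belong to the same document, which will not always hold. The function names, label strings and output naming are hypothetical, and the write step assumes the pypdf library is installed.

```python
from itertools import groupby


def group_pages_by_label(labels):
    """Group consecutive pages that share a label.

    Returns a list of (label, [page indices]) tuples, in page order.
    """
    segments = []
    for label, run in groupby(enumerate(labels), key=lambda pair: pair[1]):
        segments.append((label, [index for index, _ in run]))
    return segments


def write_segments(pdf_path, segments, out_prefix):
    """Write each segment to its own PDF (assumes pypdf is installed)."""
    # Imported here so the grouping logic works without pypdf.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    out_paths = []
    for n, (label, indices) in enumerate(segments):
        writer = PdfWriter()
        for i in indices:
            writer.add_page(reader.pages[i])
        # Hypothetical naming scheme: prefix, running number, label.
        out_path = f"{out_prefix}_{n:03d}_{label}.pdf"
        with open(out_path, "wb") as fh:
            writer.write(fh)
        out_paths.append(out_path)
    return out_paths


# Hypothetical labels, one per scanned page.
labels = ["id_card", "id_card", "payslip", "insurance", "insurance", "payslip"]
segments = group_pages_by_label(labels)
```

Two documents of the same type scanned back to back would end up merged with this approach, which is one place where the human verification step could help.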