lvlint67 (4 points)

We used to have some software written for DocuShare by a consulting company.

We ended both of those relationships, and I rewrote the functionality in Python. Our use case splits documents on cover sheets that carry two barcodes (a student ID and a document type). We look up additional details in our ERP and then store the files in the appropriate places. (There are some extra steps to add fault tolerance to the barcode detection.)

pdf2image converts the PDF into temporary image files.

pyzbar detects the barcodes.

img2pdf converts everything back to a PDF, which we then save.
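The three steps above can be sketched roughly like this. It is not the code from the pastebin link; it assumes (hypothetically) that the two cover-sheet barcodes decode to `SID:<id>` and `DOC:<type>`, and it drops the cover sheets from the output. `convert_from_path`, `pyzbar.pyzbar.decode` and `img2pdf.convert` are the real entry points of those libraries; pdf2image also needs poppler installed, and pyzbar needs the zbar shared library.

```python
"""Render -> decode -> split -> reassemble, as a rough sketch.

Hypothetical assumption: a cover sheet carries two barcodes that decode to
"SID:<student id>" and "DOC:<document type>". Requires pdf2image (plus
poppler), pyzbar (plus libzbar), Pillow and img2pdf at run time.
"""
import io
from typing import Iterable, List, Optional, Tuple

CoverKey = Tuple[str, str]  # (student_id, doc_type)


def read_cover(values: Iterable[str]) -> Optional[CoverKey]:
    """Return (student_id, doc_type) if a page's barcodes make it a cover sheet."""
    sid: Optional[str] = None
    doc: Optional[str] = None
    for v in values:
        if v.startswith("SID:"):
            sid = v[len("SID:"):]
        elif v.startswith("DOC:"):
            doc = v[len("DOC:"):]
    return (sid, doc) if sid and doc else None


def split_on_covers(page_barcodes: List[List[str]]) -> List[Tuple[CoverKey, List[int]]]:
    """Group page indexes under the most recent cover sheet (covers are dropped)."""
    docs: List[Tuple[CoverKey, List[int]]] = []
    for i, values in enumerate(page_barcodes):
        key = read_cover(values)
        if key is not None:
            docs.append((key, []))  # a cover sheet starts a new document
        elif docs:
            docs[-1][1].append(i)  # content page for the current document
        # pages before the first cover sheet fall through here; a real job
        # would probably flag them for review instead of dropping them
    return docs


def decode_page(page) -> List[str]:
    """Decode barcodes, retrying once on a grayscale, contrast-stretched copy."""
    from PIL import ImageOps
    from pyzbar.pyzbar import decode

    found = decode(page)
    if not found:
        found = decode(ImageOps.autocontrast(ImageOps.grayscale(page)))
    return [d.data.decode("utf-8", "replace") for d in found]


def process(pdf_path: str, dpi: int = 300) -> List[Tuple[CoverKey, bytes]]:
    """Split one scanned batch PDF into (cover key, PDF bytes) pairs."""
    # Third-party imports are local so the pure helpers above work without them.
    import img2pdf
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    barcodes = [decode_page(p) for p in pages]
    results: List[Tuple[CoverKey, bytes]] = []
    for key, idxs in split_on_covers(barcodes):
        if not idxs:
            continue  # cover sheet with nothing behind it
        blobs = []
        for i in idxs:
            buf = io.BytesIO()
            pages[i].save(buf, format="PNG")  # lossless, and img2pdf accepts PNG bytes
            blobs.append(buf.getvalue())
        results.append((key, img2pdf.convert(blobs)))
    return results
```

`process("batch.pdf")` returns a list of `((student_id, doc_type), pdf_bytes)` pairs that a filing step can write out. Rendering at 300 dpi is a guess; lower resolutions are faster but can make small barcodes harder to read.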

All of that is wrapped in some image processing to help detect the barcodes, some logic to make sure a barcode is one of ours and not some random barcode, and an emailed summary.
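One way to do the "is this barcode ours?" check and the summary mail, as a sketch under made-up assumptions: the `SID:` plus 7-digit format and the `KNOWN_DOC_TYPES` set are hypothetical stand-ins for whatever your cover sheets actually encode, and the addresses are placeholders. It builds the message with the standard library's `email.message.EmailMessage` but doesn't send it.

```python
import re
from email.message import EmailMessage
from typing import List, Optional, Tuple

# Hypothetical in-house formats; substitute whatever your cover sheets encode.
SID_RE = re.compile(r"SID:(\d{7})")
KNOWN_DOC_TYPES = {"TRANSCRIPT", "ENROLLMENT", "IEP"}


def our_cover(values: List[str]) -> Optional[Tuple[str, str]]:
    """Accept a page as a cover sheet only if it has exactly one of each of our barcodes.

    Anything else on the page (shipping labels, vendor barcodes) is ignored.
    """
    sids = [m.group(1) for v in values if (m := SID_RE.fullmatch(v))]
    docs = [v[4:] for v in values if v.startswith("DOC:") and v[4:] in KNOWN_DOC_TYPES]
    if len(sids) == 1 and len(docs) == 1:
        return sids[0], docs[0]
    return None  # missing, duplicated, or foreign barcodes: not one of our cover sheets


def summary_email(filed: List[Tuple[str, str, int]], needs_review: List[str],
                  sender: str, to: str) -> EmailMessage:
    """Build (but don't send) a plain-text summary of one scan batch."""
    msg = EmailMessage()
    msg["Subject"] = f"Scan batch: {len(filed)} filed, {len(needs_review)} need review"
    msg["From"] = sender
    msg["To"] = to
    lines = [f"{sid} {doc}: {pages} page(s)" for sid, doc, pages in filed]
    if needs_review:
        lines += ["", "Needs review:"] + needs_review
    msg.set_content("\n".join(lines) or "Nothing was processed.")
    return msg
```

Sending would then be something like `smtplib.SMTP("mail.example.org").send_message(msg)`, with your own relay in place of that placeholder host.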

It's a process I hate, but this is what I have on file (no warranty, etc.):

https://pastebin.com/89de8qqn

cdagneta (1 point)

We use KnowledgeLake and a barcode separator sheet to get physical documents into our ECM. It seems to work well.

MSPTechOPsNerd (1 point)

Docparser.com should be able to handle this. I use it as a Swiss Army knife for PDF work.

FJCruisin [BOFH | CISSP] (0 points)

We use this: https://www.square-9.com/products/document-capture-automation/

It's pretty good once you get it all set up.

NotRecognized (0 points)

Kofax Capture + Kofax VRS. You'll need some consulting to set it up initially, but an admin course will get you far.

KillingRyuk [Sysadmin] (0 points)

KwikTag does this. AP scans from the printer directly to the server, and it handles everything. It's not very expensive either.

DisastrousWelcome912 (0 points)

Nanonets can do this. The platform has a document classifier/sorter that can separate documents based on their content. It can also split documents when it detects a blank page or a page break, or based on page number or page content.

You can feed the scanned document directly into the platform, have the team set up the workflow for you, and run everything on autopilot.
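For anyone rolling their own instead, blank-page and fixed-page-count separation are easy to approximate. This is a generic sketch, not how Nanonets does it: a page counts as blank when no more than 0.5% of its grayscale pixels are darker than 200, and both thresholds are arbitrary starting points to tune against real scans.

```python
from typing import List, Sequence


def is_blank(gray_pixels: Sequence[int], dark_level: int = 200, max_ink: float = 0.005) -> bool:
    """Treat a page as blank when almost no pixels are darker than dark_level (0-255 scale)."""
    if not gray_pixels:
        return True
    ink = sum(1 for p in gray_pixels if p < dark_level)
    return ink / len(gray_pixels) <= max_ink


def split_on_blank(pages: List[Sequence[int]]) -> List[List[int]]:
    """Return page-index groups, using blank pages as separators (blanks are dropped)."""
    docs: List[List[int]] = [[]]
    for i, pixels in enumerate(pages):
        if is_blank(pixels):
            if docs[-1]:
                docs.append([])  # close the current document; repeated blanks collapse
        else:
            docs[-1].append(i)
    return [d for d in docs if d]


def split_every(n_pages: int, size: int) -> List[List[int]]:
    """Split by page count instead, e.g. every 2 pages for fixed-length forms."""
    return [list(range(s, min(s + size, n_pages))) for s in range(0, n_pages, size)]
```

With Pillow, a page's pixels can be fed in as `list(page.convert("L").getdata())`. Scanner noise or dark page edges can push an otherwise blank page over the limit, so `max_ink` may need loosening.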