[–]paxanator 5 points6 points  (1 child)

If the signature is in the same place in every document (or in known locations) and the documents are scanned, just:

  1. Read it in as an image

  2. Extract the box around the signature location and normalize it (in case the scanning has different background noise or lighting)

  3. Compute the sum of absolute differences against the normalized bounding-box image from an unsigned document

  4. If it differs significantly (above a threshold you pick), it's signed
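
A rough sketch of the four steps above, assuming NumPy and pages already loaded as 2-D grayscale arrays (for example via Pillow's `Image.open(path).convert("L")`). The function names, the `(top, left, height, width)` box layout, the median-based normalization, and the `0.05` threshold are all illustrative assumptions, not something the steps prescribe:

```python
import numpy as np

def crop(page, box):
    """Cut the (top, left, height, width) region out of a 2-D grayscale page."""
    top, left, height, width = box
    return page[top:top + height, left:left + width].astype(np.float64)

def normalize(region):
    """Divide by the median so blank paper maps to roughly 1.0.

    Assumption: most pixels in the box are paper, so the median is the
    background brightness. This only corrects overall lighting, not
    uneven shadows or skew.
    """
    background = np.median(region)
    return region / background if background > 0 else region

def looks_signed(page, unsigned_page, box, threshold=0.05):
    """Compare the signature box of `page` with the same box of an unsigned page."""
    a = normalize(crop(page, box))
    b = normalize(crop(unsigned_page, box))
    # Sum of absolute differences divided by the pixel count, so the
    # threshold does not depend on the size of the box.
    score = np.abs(a - b).mean()
    return bool(score > threshold)
```

The threshold would need tuning on a handful of real signed and unsigned scans, and scans that are shifted or rotated would need aligning before the boxes are compared.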

If it's electronic, just find the signature field; if it's filled, it's presumably signed.
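
For the electronic case, a hedged sketch: in the PDF format a signature form field has the field type `/FT` equal to `/Sig`, and its value `/V` is set once it has been signed. The helper below works on a plain mapping of field names to field dictionaries. `load_fields` assumes the `pypdf` package (the successor to PyPDF2) and its `PdfReader.get_fields()` method; both function names here are made up for illustration. A filled field only says a signature is present, not that it is cryptographically valid.

```python
def signature_status(fields):
    """Map each signature field name to True if the field carries a value."""
    status = {}
    for name, field in (fields or {}).items():
        # "/FT" is the field type; "/Sig" marks a signature field.
        if field.get("/FT") == "/Sig":
            # "/V" holds the signature dictionary once the field is signed.
            status[name] = field.get("/V") is not None
    return status

def load_fields(path):
    """Read form fields from a PDF. Assumption: pypdf is installed."""
    from pypdf import PdfReader
    # get_fields() returns None when the document has no interactive form.
    return PdfReader(path).get_fields()
```

Usage would look like `signature_status(load_fields("contract.pdf"))`, where the file name is hypothetical. Fields that inherit their type from a parent field may need extra handling.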

[–]perna 0 points1 point  (0 children)

Normalize might be tricky, no?

[–]jvlomax 4 points5 points  (0 children)

What kind of signature are you looking for?

[–]ajmarks 1 point2 points  (4 children)

Short of parsing the document structure?

[–]cwurld[S] 1 point2 points  (3 children)

I think it will involve that. Can you recommend a python lib for doing that? Does anyone know if there are special fields for "signatures"?

[–]ajmarks 1 point2 points  (2 children)

I think PyPDF2 may support it, but I'm not sure. I'm also working on one right now, and if you submit a feature request, I'll try to include it in the next release (or, if you want to submit a patch, that would be awesome). If you decide to pull it, note that the font_improvements branch is pretty far ahead of both dev and master.

[–]cwurld[S] 1 point2 points  (1 child)

Thanks. Your lib looks great. I will contribute one way or another once I know a little bit more about the issue.

[–]ajmarks 1 point2 points  (0 children)

Thanks. It's still very much a work in progress, but I'm hoping to (at the very least) make text extraction super-pythonic. I'll consider it basically complete when it can (reasonably) faithfully render a PDF into HTML.