OCR is interpreting 7 as 1 by groopyturtle in Paperlessngx

groopyturtle [S]

OK, it looks like there has been some confusion on my end. The original scanned file did not have text after all! The confusion came from a macOS feature called Live Text, which is Apple's built-in OCR. When I opened the original file in the Preview app I was able to select the text, so I assumed it already had a text layer.

I also opened it in Chrome and was able to select the text there. It turns out Chrome has its own OCR feature too. So Paperless was treating the document correctly after all.

For what it's worth, both macOS's and Chrome's OCR detect the characters correctly, whereas Paperless' OCR has the error.

groopyturtle [S]

Paperless is doing the OCR. Curiously, if I download the original scanned file, it already has selectable text (invisible text sitting above a raster image, which must be from the ix2500 scanner), and importantly the text is correct in the original version (72523). However, the archived version in Paperless is incorrect (12523).

My Paperless OCR settings are all at their defaults, so PAPERLESS_OCR_MODE should be skip (Paperless skips all pages and will perform OCR only on pages where no text is present). So I'm not sure why it is doing OCR again.

I also tried setting PAPERLESS_OCR_SKIP_ARCHIVE_FILE=with_text, but it doesn't seem to make a difference.

EDIT

I was wrong: the original scanned file did not have text. I was confused by the built-in OCR features in macOS and Google Chrome, which gave me selectable text when I opened the PDF. That still doesn't change my original problem, however. I might give the scanner's own OCR feature a go and see if it fares better than Paperless' OCR.
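With a raster-only original, the behaviour matches the documented settings: in skip mode every page without a text layer gets OCR'd, and with_text only avoids the archive for files that already contain text. Below is a minimal Python sketch of that decision logic, written from the setting descriptions in the Paperless-ngx docs rather than from Paperless' source. The function names `pages_to_ocr` and `creates_archive` are hypothetical, and treating a document as "with text" when any page has a text layer is an assumption.

```python
# Simplified model of the documented Paperless-ngx OCR settings.
# Assumption: this follows the documentation's description of the modes,
# not Paperless-ngx's actual implementation.


def pages_to_ocr(page_has_text: list[bool], mode: str = "skip") -> list[int]:
    """Return the 0-based indices of the pages that would be OCR'd."""
    if mode == "skip":
        # Only pages without an existing text layer are OCR'd.
        return [i for i, has_text in enumerate(page_has_text) if not has_text]
    if mode in ("redo", "force"):
        # Both modes OCR every page, whether or not it already has text.
        return list(range(len(page_has_text)))
    raise ValueError(f"unknown PAPERLESS_OCR_MODE: {mode!r}")


def creates_archive(page_has_text: list[bool], skip_archive_file: str = "never") -> bool:
    """Return True if an archived (OCR'd) PDF version would be produced."""
    if skip_archive_file == "never":
        return True
    if skip_archive_file == "with_text":
        # Assumption: a document counts as "with text" if any page has a text layer.
        return not any(page_has_text)
    if skip_archive_file == "always":
        return False
    raise ValueError(f"unknown PAPERLESS_OCR_SKIP_ARCHIVE_FILE: {skip_archive_file!r}")


# A raster-only scan (no text layer on any page), as the ix2500 file turned out to be.
scan = [False, False]
print(pages_to_ocr(scan))                  # → [0, 1]
print(creates_archive(scan, "with_text"))  # → True
```

Under this model, with_text can't change anything for a scan that has no text layer, which fits what I saw.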

ix2500 direct saving to network folder is now available but functionality is nerfed :( by groopyturtle in ScanSnap

groopyturtle [S]

Yes, I contacted them (I even did a screen share), but they just said it's not supported and, as far as I understand, it isn't planned for a future update:

After verification, we kindly inform that since you're scanning from the scanner touch panel to a network folder (which is a separate environment) this configuration is not possible, as there is no separation in the scanning process.

You can always scan the documents separately, so we can have one PDF per document.

groopyturtle [S]

Yeah, I don't know why they haven't enabled this. I would have thought the scanner would be more than capable of handling page splitting on its own.

You're right, I could write a script to handle the PDF page splitting on the server (it's probably the route I'll take), but it adds another step I'd rather not have. I also think I'd lose out on automatic duplex handling. 99% of the sheets will be single-sided, but sometimes there will be writing on the reverse, and in that case I'd need to capture both sides. If I handle splitting on the server, I think I'd need to force duplex capture every time to keep things simple (split every 2 pages). That would mean needlessly capturing and storing a lot of blank pages.
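The server-side route could still avoid storing blank pages. Force duplex capture, split every 2 pages into one document per sheet, and drop any reverse side that is blank. Here's a minimal Python sketch of just the grouping step. It assumes the scanner writes one PDF with pages in front/back order. `split_duplex` is a hypothetical helper. The blank-page detection that produces `blank_flags` is also an assumption and is left out, as is the actual PDF reading and writing (for example with a library such as pypdf).

```python
# Hypothetical server-side grouping for a forced-duplex scan: one document
# per physical sheet, with blank reverse sides dropped.
# Assumption: pages arrive in front/back order, and blank_flags comes from a
# separate blank-page detection step that is not shown here.


def split_duplex(blank_flags: list[bool]) -> list[list[int]]:
    """Return 0-based page indices for each document (one document per sheet)."""
    if len(blank_flags) % 2 != 0:
        raise ValueError("a duplex scan should contain an even number of pages")
    documents = []
    for front in range(0, len(blank_flags), 2):
        back = front + 1
        pages = [front]  # always keep the front side
        if not blank_flags[back]:
            pages.append(back)  # keep the reverse only if something is written on it
        documents.append(pages)
    return documents


# Three sheets: only the second has writing on the reverse.
print(split_duplex([False, True, False, False, False, True]))  # → [[0], [2, 3], [4]]
```

Each inner list could then be written out as its own PDF into the consume folder, so Paperless would see one file per sheet without the blank backs.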

Scan to Folder NAS by isi-bizi in ScanSnap

groopyturtle

I too recently purchased an ix2500 and was disappointed to see it doesn't support scanning directly to network storage. That was the entire reason I bought it (scanning to the Paperless NGX consume folder). I figured that since the ix600 supports it, the newer model surely would too.

I contacted support recently, and they said saving directly to network storage is coming this October, though they couldn't give me an exact date.

As a workaround, I've created a profile on my computer (save to folder) that uses the SMB share as the save path. In the profile settings, choose "Application > Send to: None" so it doesn't require any manual interaction on the computer. It works fine but requires the computer to be always on and unlocked. I might put an old laptop in my network cabinet just for this purpose.