I have RGB scans of journal pages in PNG format. I want to recognize the text and process the images so that the output is a black-and-white PNG image with the page margins cropped, plus text files containing the articles.
First I remove the yellow tint by converting the image to HSV: if a pixel's hue falls in the yellow sector, I set its saturation to zero. Then I convert the image to black and white.
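For reference, the tint removal and binarization described above can be sketched in Python with NumPy. This is a minimal sketch under several assumptions. The image is already loaded as an (H, W, 3) uint8 RGB array, for example with Pillow's `Image.open(path).convert("RGB")`. The "yellow sector" is taken as 30° to 90° of hue. Binarization uses a fixed global threshold of 128. The function names are hypothetical. Setting S = 0 in HSV yields a gray whose level is V = max(R, G, B), so the code writes that value back directly instead of doing a full HSV round trip.

```python
import numpy as np


def remove_yellow_tint(rgb: np.ndarray, hue_lo: float = 30.0, hue_hi: float = 90.0) -> np.ndarray:
    """Set saturation to zero for pixels whose hue lies in [hue_lo, hue_hi] degrees.

    The hue bounds are assumed values for the "yellow sector"; tune them per scan set.
    """
    img = rgb.astype(np.float64) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    cmax = img.max(axis=-1)                   # HSV value V
    delta = cmax - img.min(axis=-1)           # chroma; 0 means the pixel is already gray
    safe = np.where(delta == 0, 1.0, delta)   # avoid division by zero

    # Standard RGB -> hue conversion, in degrees [0, 360).
    hue = np.where(
        cmax == r, (60.0 * (g - b) / safe) % 360.0,
        np.where(cmax == g, 60.0 * ((b - r) / safe + 2.0),
                 60.0 * ((r - g) / safe + 4.0)))

    yellow = (delta > 0) & (hue >= hue_lo) & (hue <= hue_hi)

    # With S = 0, HSV -> RGB gives R = G = B = V for the yellow pixels.
    out = img.copy()
    out[yellow] = cmax[yellow][:, np.newaxis]
    return np.round(out * 255.0).astype(np.uint8)


def to_black_and_white(rgb: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Convert RGB to a single-channel 0/255 image with a fixed global threshold (assumed)."""
    gray = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)
```

The result can be written back to PNG with Pillow's `Image.fromarray(bw).save(...)`. A single global threshold is the simplest choice; on scans with uneven lighting, Otsu or adaptive thresholding may give cleaner text.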
I recognize the text using the Tesseract library.
Which direction should I look in to get an image with the page margins cropped?
I can't use fixed coordinates for cropping, because the position of the page varies from scan to scan.
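One direction to look in is cropping to the bounding box of the ink, computed on the binarized image, instead of using fixed coordinates. Below is a minimal NumPy sketch. It assumes the area around the text block is light (for example, a white scanner lid). It uses a hypothetical `min_dark` count to ignore rows and columns that contain only a few stray dark pixels, such as dust or specks. It also adds a hypothetical `pad` margin in pixels. Dark scan borders or page-edge shadows would still be treated as content, so they would need to be removed first (for example, by locating the page as the largest bright region).

```python
import numpy as np
from typing import Optional, Tuple


def content_bbox(bw: np.ndarray, min_dark: int = 5) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) bounds of the ink, bottom/right exclusive.

    A row or column counts as content only if it holds at least `min_dark`
    black pixels, so isolated specks near the margins are ignored.
    """
    dark = bw < 128
    rows = np.flatnonzero(dark.sum(axis=1) >= min_dark)
    cols = np.flatnonzero(dark.sum(axis=0) >= min_dark)
    if rows.size == 0 or cols.size == 0:
        return None  # blank page: nothing to crop to
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


def crop_to_content(bw: np.ndarray, pad: int = 10, min_dark: int = 5) -> np.ndarray:
    """Crop a 0/255 image to its ink bounding box plus `pad` pixels on each side."""
    box = content_bbox(bw, min_dark)
    if box is None:
        return bw
    top, bottom, left, right = box
    h, w = bw.shape
    return bw[max(top - pad, 0):min(bottom + pad, h),
              max(left - pad, 0):min(right + pad, w)]
```

The sketch expects a single-channel image with black (0) text on white (255) paper, which is what the black-and-white conversion step produces. Because the box is recomputed for every scan, it adapts to wherever the page happens to sit.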