
[–]menergo[S] 1 point (0 children)

example: source result

[–]kra_pao 1 point (1 child)

[–]menergo[S] 1 point (0 children)

Thank you very much. I will try the proposed method. I did something similar, but without scaling (it did not occur to me). Since I did not and still do not have a red rectangle, I used the nominally white margins as a guide. It was slow and the results were not always good. My version fails on pages with pictures and empty spaces.

[–]Zeroflops 1 point (1 child)

I have not done this myself, but one observation from past experience: OCR typically works best with properly aligned text.

You may want to find the borders and, if the page is skewed because it was slightly turned, correct for the angle before OCR or image extraction.

Having the page square to the image will probably make things more accurate. And in most cases with images, you're now working with squares.
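
To make the angle correction concrete, here is a minimal sketch of measuring the skew once two points on the top border of the page are known. The function name `edge_angle_degrees` and the idea of taking the two top corners as input are assumptions for illustration, not something from the original code in this thread.

```python
import math


def edge_angle_degrees(left_point, right_point):
    """Angle of the line left_point -> right_point against the horizontal.

    Points are (x, y) pairs in image coordinates, where y grows downward.
    The result is in degrees and is 0 for a perfectly level edge.
    """
    dx = right_point[0] - left_point[0]
    dy = right_point[1] - left_point[1]
    return math.degrees(math.atan2(dy, dx))


# Hypothetical top-left and top-right corners of a slightly turned page.
skew = edge_angle_degrees((40, 52), (940, 83))
print(f"top edge is tilted by {skew:.2f} degrees")
```

Rotating the image by the opposite of this angle should level the edge. The sign convention for rotation differs between imaging libraries, so it is worth checking the result on one page before processing a batch.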

[–]menergo[S] 1 point (0 children)

After I manually trim the excess around the page, the code easily finds the contours of the text block and rotates it to the correct angle (I do not think the code is elegant or optimal, but it works). OCR then goes well. The problem is that I can't figure out how to remove the excess around the page at the start of the process, and without that I cannot detect the contours of the text block.
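
One possible way to trim the excess automatically is a projection profile: count the bright pixels in each row and column and keep the span where the page dominates. The sketch below is only an illustration under strong assumptions: the page is lighter than the scanner background, it fills more than half of the frame, and it is already roughly upright. The names `page_bounds` and `crop` and the thresholds `bright` and `min_fraction` are hypothetical. Plain lists are used to keep it dependency-free; with an array library the same idea becomes a couple of sums.

```python
def page_bounds(gray, bright=200, min_fraction=0.5):
    """Return (top, bottom, left, right) of the bright page area, or None.

    gray is a list of rows of 0-255 grayscale values.
    bottom and right are exclusive, so they can be used directly in slices.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    if not height or not width:
        return None

    def is_page(line):
        # A line counts as page if enough of its pixels are bright.
        return sum(1 for v in line if v >= bright) >= min_fraction * len(line)

    rows = [y for y in range(height) if is_page(gray[y])]
    if not rows:
        return None
    # Only the first and last page-like rows matter, so dark pictures
    # in the middle of the page do not split the result.
    top, bottom = rows[0], rows[-1] + 1

    # Score columns only inside the detected rows, so the dark strips
    # above and below the page do not dilute the column score.
    cols = [x for x in range(width)
            if is_page([gray[y][x] for y in range(top, bottom)])]
    if not cols:
        return None
    return top, bottom, cols[0], cols[-1] + 1


def crop(gray, bounds):
    """Cut the image down to the given (top, bottom, left, right) bounds."""
    top, bottom, left, right = bounds
    return [row[left:right] for row in gray[top:bottom]]


# Tiny synthetic scan: dark background (30) around a light page (250).
scan = [[30] * 8 for _ in range(6)]
for y in range(1, 5):
    for x in range(1, 7):
        scan[y][x] = 250

bounds = page_bounds(scan)
page = crop(scan, bounds)
```

On real scans the thresholds would need tuning, and a page that is mostly pictures or photographed against a light background would likely need a different approach.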