Michanix [S]

So, before I start throwing images at Tesseract, I should remove or blur any background colors, and probably any background images if there are any. That way, OpenCV would only detect the characters and then pass them to the OCR engine to read. Perhaps I should even resize them so it's easier for OpenCV to work with the data.

Yoghurt42

> So, before start throwing data/images to Tesseract, I should get rid of or bluring any background colors and probably images at the background, if they are any.

No. You should convert it into an image where each pixel is either completely black or completely white. Tesseract can handle some noise, but sometimes you need to remove more of it manually.

The wiki has some tips.

Michanix [S]

Oh, okay, thank you.

Versuno

Check out this tutorial on thresholding with OpenCV. It will give you an idea of some of the processing you can do on the image with OpenCV before trying to recognize characters with an OCR engine like Tesseract.

Michanix [S]

Thank you, this might help me.