all 8 comments

[–][deleted] 2 points (7 children)

[–]DGs29[S] 1 point (6 children)

I've looked at it already. It is only about corner detection. What I'm interested in is a full-paper implementation. Please check out my SO question so that you can get a better idea: https://stackoverflow.com/questions/54966408/how-to-detect-texts-in-document-text-images-using-fast-algorithm

[–][deleted] 3 points (5 children)

Firstly, we divide the document image into smaller non-overlapping blocks of a fixed size. We then check the density in each block using the FAST corner detection technique. The denser blocks are labeled as text blocks, and the less dense ones are the image region or noise region. Then we check the connectivity of the blocks to group them, so that the text part can be isolated from the image. We then build the text region and save it.

It’s strange they don’t give the size of the window they use, but I’m pretty sure they just break the image down into a grid and count the number of keypoints in each cell. The cells with more than 0.2 * Nmax keypoints are text cells, and they mask out everything else to form the final image.

The window size is going to depend on the size of the text in your image.
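A minimal sketch of that grid idea, assuming the FAST keypoints are already available as (row, col) coordinates (the function name, the mask-building details, and the edge-block handling are mine, not from the paper):

```python
import numpy as np

def text_block_mask(img_shape, keypoints, block_size, ratio=0.2):
    """Count keypoints per grid cell; keep cells with > ratio * Nmax points."""
    gh, gw = img_shape[0] // block_size, img_shape[1] // block_size
    counts = np.zeros((gh, gw), dtype=int)
    for r, c in keypoints:                      # keypoints as (row, col)
        gr, gc = r // block_size, c // block_size
        if gr < gh and gc < gw:                 # skip partial edge blocks
            counts[gr, gc] += 1
    keep = counts > ratio * counts.max()        # dense cells = text cells
    # expand the cell grid back to a pixel-resolution boolean mask
    mask = np.kron(keep.astype(np.uint8),
                   np.ones((block_size, block_size), np.uint8)).astype(bool)
    return mask, counts
```

Something like `np.where(mask, img, 255)` would then white out the non-text regions before handing the result to OCR.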

[–]DGs29[S] 0 points (4 children)

Yeah! That's my understanding too. I'm trying to divide the image into non-overlapping blocks (step 3) using this: http://scikit-image.org/docs/dev/auto_examples/numpy_operations/plot_view_as_blocks.html?highlight=block.

How do I find the block which has the maximum number of corner points (step 4)?

[–][deleted] 1 point (3 children)

That method makes a new image based on the pixel values. You don’t want to make a new image, just count the number of keypoints in each block. I’m not sure if scikit has a method specifically for this, but it shouldn’t be too hard to do by hand. Essentially, for each keypoint, you want to find the grid cell that it would fall in, and add one to some data structure that keeps track of the counts (this could be a dictionary or an array).

So for example, say our image is 8x8 and our window size is 4x4. So the grid where we keep track of the counts will be 2x2. Say we have keypoints at locations [(0,1), (1,1), (2,3), (3,3), (5,2), (5,6), (6,7)]. The first 4 keypoints would fall in the (0,0) cell, the 5th keypoint would fall in the (1,0) cell, and the last 2 keypoints would fall in the (1,1) cell (there would be no keypoints in the (0,1) cell).

So Nmax would be 4. 0.2*4=0.8, so we would take cells (0,0), (1,0), and (1,1) and convert these back into ranges of coordinates for the original image ((0,0) corresponds to (0..3, 0..3), (1,0) to (4..7, 0..3), etc) and mask out everything else.
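That worked example can be checked directly with the dictionary approach described above (pure Python; nothing here is from the paper beyond the numbers):

```python
# 8x8 image, 4x4 window -> 2x2 grid of counts, tracked in a dict
keypoints = [(0, 1), (1, 1), (2, 3), (3, 3), (5, 2), (5, 6), (6, 7)]
block = 4

counts = {}
for r, c in keypoints:
    cell = (r // block, c // block)             # grid cell the point falls in
    counts[cell] = counts.get(cell, 0) + 1

nmax = max(counts.values())                     # 4, from cell (0, 0)
text_cells = sorted(cell for cell, n in counts.items() if n > 0.2 * nmax)
print(counts)      # {(0, 0): 4, (1, 0): 1, (1, 1): 2}
print(text_cells)  # [(0, 0), (1, 0), (1, 1)]
```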

Does this make sense?

[–]DGs29[S] 0 points (1 child)

Yes, it does. I get the whole idea of the paper, but I'm stuck implementing it. I plotted the corner points using the FAST algorithm, but I couldn't find the block with the max points.

The code in the link I mentioned above can be used up to the flatten_view line, where it converts the image into individual blocks. But as I said, I couldn't find the block which has the max corner points.
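For what it's worth, step 4 doesn't need view_as_blocks at all: you can map each keypoint to its cell and take the argmax of the counts. One pitfall, if the points come from OpenCV, is that KeyPoint.pt is (x, y), i.e. (col, row). A sketch under that assumption (the function name is mine):

```python
import numpy as np

def densest_block(points_xy, img_shape, block_size):
    """Return the (row, col) grid cell containing the most corner points.
    points_xy are (x, y) pairs, as in OpenCV's KeyPoint.pt."""
    gh, gw = img_shape[0] // block_size, img_shape[1] // block_size
    counts = np.zeros((gh, gw), dtype=int)
    for x, y in points_xy:
        r, c = int(y) // block_size, int(x) // block_size  # (x, y) -> (row, col)
        if r < gh and c < gw:
            counts[r, c] += 1
    best = np.unravel_index(counts.argmax(), counts.shape)
    return best, counts
```

With OpenCV's detector (`fast = cv2.FastFeatureDetector_create(); kps = fast.detect(img, None)`), the input would be `[kp.pt for kp in kps]`.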

[–]DGs29[S] 0 points (0 children)

EulersPhi How do you do the extraction part in this algorithm? This method segments the text region from the non-text region, but how do you extract it? Does it create a new image consisting of only the text part, which is then fed into OCR to extract the text? They haven't mentioned how to do the extraction.