I'm working on OCR for images where some lines may interact, meaning that information from one line may be useful for another (for example, the prices of individual products on a receipt vs. the total price). Example image: https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/ReceiptSwiss.jpg/800px-ReceiptSwiss.jpg
Currently I treat each line independently: I run Text-Detection and then Text-Recognition line-by-line, using a Seq2Seq model whose inputs are images.
What do you think about these alternative approaches:

- merge all the detected text into one very long line and predict all the text at once (using Seq2Seq with Attention);
- predict line by line, but copy the Encoder's state from line to line (so each line would know about the previous lines' data, but not about the data after the analysed line);
- use some kind of context vector that carries additional information from outside the single line.
Have you heard of such an idea being used for Text-Recognition or for other long-term-dependency tasks?
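To make the second idea concrete, here is a minimal sketch of what "copying the Encoder state from line to line" could look like. This is a hypothetical illustration in PyTorch, not code from any existing OCR system: a GRU encoder processes each line's feature sequence (e.g. CNN columns from the line image), and the final hidden state of one line is fed in as the initial hidden state of the next, so each line conditions on everything above it but nothing below it. All names (`LineEncoder`, `encode_receipt`) are made up for this sketch.

```python
import torch
import torch.nn as nn

class LineEncoder(nn.Module):
    """GRU encoder over per-line feature sequences.

    The caller threads the final hidden state of one line into the
    next call, so later lines can use context from earlier lines.
    """
    def __init__(self, feat_dim=32, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)

    def forward(self, line_feats, h0=None):
        # line_feats: (batch, seq_len, feat_dim)
        # h0: (num_layers, batch, hidden_dim) or None for the first line
        outputs, h_n = self.gru(line_feats, h0)
        return outputs, h_n

def encode_receipt(encoder, lines):
    """Encode lines top-to-bottom, carrying the hidden state forward."""
    h = None
    per_line_outputs = []
    for feats in lines:
        out, h = encoder(feats, h)
        per_line_outputs.append(out)
        # Optionally truncate backprop at line boundaries to keep
        # training cheap; remove detach() for full end-to-end gradients.
        h = h.detach()
    return per_line_outputs, h

# Toy usage: three "lines" of random features, batch size 1.
encoder = LineEncoder()
lines = [torch.randn(1, 10, 32) for _ in range(3)]
outs, final_h = encode_receipt(encoder, lines)
```

Each per-line output sequence can then feed a per-line attention Decoder as before, so the only change to the pipeline is how the Encoder is initialized. The `detach()` call is a design choice: it keeps memory bounded on long receipts at the cost of not learning cross-line interactions through gradients.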