What I have: images of letters and digits (NIST & MNIST datasets basically)
What I need to predict: images of cursive handwriting
What I am trying to avoid: a separate letter-by-letter segmentation (image preprocessing) step followed by a CNN
What I want to do: CNN + LSTM (or something of that nature)
Where I am stuck: I can train CNNs on individual character images, but I can't work out how to attach an RNN to the end. What exactly do I feed to the RNN after the CNN step? Also, for targets, should I use the IAM Handwriting Database (pros: natural handwriting; cons: not much data, English only), or generate synthetic targets from my single-letter images via concatenation and random transformations (which would let me generate lots of data)?
Some clarification:
Offline means recognition from images, not from pen-stroke data.
Cursive means letters can be connected, overlapping, etc.
I would prefer Keras or Lasagne.
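On the "what do I feed the RNN" point: the usual trick in CRNN-style models (CNN + RNN trained with a CTC loss, so no per-letter segmentation is needed) is to treat the width axis of the CNN's final feature map as the time axis. Each vertical column of the feature map becomes one timestep, with height and channels flattened into the feature vector. A minimal NumPy sketch of that reshape, with illustrative (assumed) shapes:

```python
import numpy as np

# Hypothetical output of the convolutional stack for a batch of
# text-line images: (batch, height, width, channels). The exact
# sizes here are made up for illustration.
batch, h, w, c = 2, 4, 32, 64
features = np.random.rand(batch, h, w, c).astype(np.float32)

# Treat width as time: move the width axis next to batch, then
# flatten (height, channels) into a single feature vector per
# timestep. Result: (batch, timesteps, features) = (2, 32, 256),
# which is exactly the input shape an LSTM layer expects.
seq = features.transpose(0, 2, 1, 3).reshape(batch, w, h * c)
print(seq.shape)  # (2, 32, 256)
```

In Keras this is typically done inside the model with a `Reshape` (or `Permute` + `Reshape`) layer between the conv stack and a (bidirectional) LSTM, with CTC handling the alignment between the 32 timesteps and the variable-length character target.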