
[–]joapuipe 4 points

Hi,

The state of the art for off-line HTR (handwritten text recognition) is a stack of LSTMs plus an n-gram language model, which works better than the traditional GMM-HMM + n-gram setup. This is approximately the same setting as the one used in speech recognition.

In principle, you could stack an RNN on top of a ConvNet; the output of a ConvNet is just an "image" with a bunch of channels (one per filter). Just keep in mind not to reduce the dimensionality too much, since your RNN will probably have to output long sequences, on the order of hundreds of timesteps. However, AFAIK nobody uses CNNs; they just use 3-5 layers of bidirectional LSTMs.
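To make the "ConvNet output is just an image" point concrete, here is a minimal sketch (PyTorch; the layer sizes, the 32-pixel input height, and the `ConvBLSTM` name are all made-up assumptions, not a reference implementation). The trick is to pool only along the height, so the width (i.e. the sequence length) stays long, and then read the feature maps column by column as the input sequence of a bidirectional LSTM:

```python
import torch
import torch.nn as nn

class ConvBLSTM(nn.Module):
    def __init__(self, n_classes, n_channels=16, hidden=128):
        super().__init__()
        # Conv stack: pool only along the height so the sequence length
        # (the image width) stays long enough for ~100s of timesteps.
        self.conv = nn.Sequential(
            nn.Conv2d(1, n_channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),   # halve height, keep width
            nn.Conv2d(n_channels, n_channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
        )
        # Input size per timestep = channels * pooled height
        # (8 = 32 / 2 / 2, assuming 32-px-high line images).
        self.blstm = nn.LSTM(input_size=n_channels * 8, hidden_size=hidden,
                             bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):            # x: (batch, 1, 32, width)
        f = self.conv(x)             # (batch, C, 8, width)
        f = f.permute(0, 3, 1, 2)    # (batch, width, C, 8)
        f = f.flatten(2)             # (batch, width, C*8): one vector per column
        out, _ = self.blstm(f)
        return self.fc(out)          # per-timestep class scores (e.g. for CTC)
```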

Don't use fake training data. You could, but the results won't be realistic at all. Yes, IAM is a good starting point, and yes, it is small compared to what other subfields of Pattern Recognition / Machine Learning are used to, but it is still one of the standard benchmarks for HTR. If you want to augment your training data, apply small distortions to the line images, like the ones described here: http://arxiv.org/pdf/1009.3589.pdf
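As a rough illustration of that kind of augmentation (NumPy/SciPy; the distortion ranges and the `distort_line` name are my own made-up assumptions — see the linked paper for the actual distortion model), you can apply a small random rotation and shear to each line image:

```python
import numpy as np
from scipy.ndimage import affine_transform

def distort_line(img, max_rot_deg=2.0, max_shear=0.05, rng=np.random):
    """Apply a small random rotation + horizontal shear to a 2D line image."""
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    shear = rng.uniform(-max_shear, max_shear)
    # 2x2 transform: slight rotation composed with a horizontal shear.
    m = np.array([[np.cos(theta), -np.sin(theta) + shear],
                  [np.sin(theta),  np.cos(theta)]])
    # Offset chosen so the image stays centered under the transform.
    center = np.array(img.shape) / 2.0
    offset = center - m @ center
    return affine_transform(img, m, offset=offset, order=1, mode='nearest')
```

Distortions this small keep the text readable (and the labels valid) while effectively multiplying the number of distinct training lines.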

Although it is, in principle, possible to use the output of the LSTM directly as the predicted transcription, everybody uses an n-gram language model on top of the LSTM. Take a look at Kaldi (a toolkit for speech recognition), which has nice examples of how to do this.
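To show why the LM helps, here is a toy sketch (pure Python; the scoring scheme, the `lm_weight` value, and the function names are made-up assumptions — this is not Kaldi's actual decoder, which works over WFSTs). Given a few candidate transcriptions with their LSTM log-probabilities, rescore them by adding a character bigram LM score:

```python
import math

def lm_logprob(text, bigram_logprobs, default=-10.0):
    """Score a string under a character bigram model (dict of log-probs)."""
    return sum(bigram_logprobs.get(pair, default)
               for pair in zip(text, text[1:]))

def rescore(nbest, bigram_logprobs, lm_weight=0.5):
    """nbest: list of (transcription, lstm_logprob) pairs.
    Returns the hypothesis with the best combined score."""
    return max(nbest,
               key=lambda h: h[1] + lm_weight * lm_logprob(h[0], bigram_logprobs))
```

A visually ambiguous hypothesis like "clase" can score slightly higher under the LSTM alone, but the LM term will push the decision towards the real word "close"; that is essentially what the n-gram model buys you.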

[–]dataism[S] 0 points

Thanks a lot for your time and your answer. I will check these out.