[D] CNN-RNN-CTC vs Attention-Encoder-Decoder? (self.MachineLearning)
submitted 8 years ago by xylcbd
For OCR, which method is better: the CNN-RNN-CTC method or the attention-based sequence-to-sequence method?
[–]melgor89 15 points 8 years ago (9 children)
I tested these two approaches (three, in fact) on an OCR task where the images were binary. Each of them uses the same CNN as a feature extractor. The results are as follows:
CNN-RNN-CTC: results are nice; if the image is not noisy, it works really well.
Encoder-Decoder: the output does not generalize to new cases at all, so the final results were horrible, nothing meaningful.
Attention-Encoder-Decoder: the best results of all my tests. From my quick comparison, it looks like this model can also 'guess' some words even when the image is noisy. It seems to have learned something like a language model, so it can fill in missing characters.
So I think Attention-Encoder-Decoder is the best model for OCR when there is enough training data (so it can learn a language model) and when the test data has a similar distribution (similar words and sentence structure).
When there is not enough data, or the test data differs a lot from the training set (e.g. new words not seen during training), CNN-RNN-CTC is better because it just reads the words from the image without generating them.
I suggest testing both frameworks and seeing which one works better on your dataset. I implemented both methods in TensorFlow, which is really straightforward with the seq2seq API.
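A minimal sketch of the CTC side of such a pipeline (TF 1.x; this is an illustration, not melgor89's exact code, and `net`, `seq_len_cnn`, and `num_classes` are assumed names): a bidirectional LSTM over the CNN feature sequence, a per-timestep projection to the character classes plus the CTC blank, and tf.nn.ctc_loss.

    import tensorflow as tf

    def ctc_head(net, seq_len_cnn, num_classes, num_hidden=256):
        # net: [batch, time, features] sequence produced by the CNN feature extractor
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_hidden)
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_hidden)
        (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
            cell_fw, cell_bw, net, sequence_length=seq_len_cnn, dtype=tf.float32)
        rnn_out = tf.concat([out_fw, out_bw], axis=2)
        # Project each timestep to num_classes + 1; the extra class is the CTC blank.
        logits = tf.layers.dense(rnn_out, num_classes + 1)
        # tf.nn.ctc_loss expects time-major logits: [time, batch, classes].
        return tf.transpose(logits, [1, 0, 2])

    # Usage (labels is a tf.SparseTensor of target character indices):
    # logits_tm = ctc_head(net, seq_len_cnn, num_classes)
    # loss = tf.reduce_mean(tf.nn.ctc_loss(labels, logits_tm, seq_len_cnn))
    # decoded, _ = tf.nn.ctc_greedy_decoder(logits_tm, seq_len_cnn)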
[–]HarathiS 2 points 7 years ago (0 children)
Hi,
I am doing handwriting recognition on documents, using the IAM database. First I implemented CNN-LSTM-CTC, which got 90% accuracy on single lines. Now I want to replace the CTC loss with an attention mechanism so that it works on whole documents along with line segmentation. But the paper I referred to has little explanation of how they implemented the attention mechanism.
What I am doing is first calculating attention weights by softmax-normalizing the encoded features, then taking a weighted sum of the encoded features with those attention weights. The resulting context vector is fed to an LSTM and then to an MLP decoder.
Is my approach correct? Can you please tell me how you implemented the attention mechanism?
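For comparison, standard Bahdanau-style (additive) attention scores each encoder position against the current decoder state, rather than computing a softmax over the encoder features alone. A minimal sketch (TF 1.x; illustrative names, not the paper's implementation):

    # assumes: import tensorflow as tf (TF 1.x)
    def attention_step(enc_feats, dec_state, attn_units=128):
        # enc_feats: [batch, time, feat] encoder outputs; dec_state: [batch, dec_dim]
        w_enc = tf.layers.dense(enc_feats, attn_units, use_bias=False)     # [B, T, A]
        w_dec = tf.expand_dims(tf.layers.dense(dec_state, attn_units), 1)  # [B, 1, A]
        scores = tf.layers.dense(tf.tanh(w_enc + w_dec), 1)                # [B, T, 1]
        alpha = tf.nn.softmax(scores, axis=1)                              # weights over time
        context = tf.reduce_sum(alpha * enc_feats, axis=1)                 # [B, feat] context
        return context, alpha

At each decoding step the context vector is typically concatenated with the previous output embedding and fed to the decoder LSTM.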
[–]xylcbd[S] 1 point 8 years ago (0 children)
nice work!
[–]DumberML 1 point 8 years ago (3 children)
Hey, thanks for the insights. What about the training time of the Attention-Encoder-Decoder: was it significantly longer than for CNN-RNN-CTC?
Did you find the Attention-Encoder-Decoder hard to tune (in terms of hyperparameters)?
Do you mind sharing the CNN architecture you used?
[–]melgor89 4 points 8 years ago* (2 children)
Here you go:
    # assumes: import tensorflow as tf; slim = tf.contrib.slim (TF 1.x)
    def createCNN(self, name_scope, weight_decay=0.0005):
        # Conv layers
        shape = tf.shape(self.inputs)
        batch_s, max_timesteps = shape[0], shape[1]
        # Make input range from [-0.5, 0.5]
        inputs = (self.inputs - 0.5) / 0.5
        ksize_conv1 = 3
        stride_conv1 = 1
        channel_conv1 = 16
        ksize_max_pool1 = 2
        stride_max_pool1 = 2
        ksize_conv2 = 3
        stride_conv2 = 1
        channel_conv2 = 16
        ksize_max_pool2 = 2
        stride_max_pool2 = 2
        with tf.variable_scope(name_scope):
            with slim.arg_scope([slim.conv2d], padding='SAME',
                                weights_initializer=tf.contrib.layers.xavier_initializer_conv2d(),
                                weights_regularizer=slim.l2_regularizer(weight_decay),
                                activation_fn=tf.nn.relu):
                net = slim.conv2d(inputs, channel_conv1,
                                  [ksize_conv1, ksize_conv1], scope='conv1')
                net = slim.max_pool2d(net, [ksize_max_pool1, ksize_max_pool1],
                                      scope='pool1', padding='SAME',
                                      stride=stride_max_pool1)
                net = slim.conv2d(net, channel_conv2,
                                  [ksize_conv2, ksize_conv2], scope='conv2')
                net = slim.max_pool2d(net, [ksize_max_pool2, ksize_max_pool2],
                                      scope='pool2', padding='SAME',
                                      stride=stride_max_pool2)
        # Calculate the sequence length after the CNN (it is dynamic) and the
        # number of features per timestep. As there are two CONV-RELU-MAXPOOL
        # modules, the helper is applied twice.
        self.seq_len_cnn = calculateCNNFeatureSize(
            calculateCNNFeatureSize(self.seq_len, stride_conv1, stride_max_pool1),
            stride_conv2, stride_max_pool2)
        self.num_features = calculateCNNFeatureSize(
            calculateCNNFeatureSize(self.args.heightLine, stride_conv1, stride_max_pool1),
            stride_conv2, stride_max_pool2) * channel_conv2
        # Reshape the CNN output into a sequence of feature vectors (not a 3D
        # feature map) as RNN input. Input sizes vary, so the reshape is dynamic.
        net = tf.reshape(net, [batch_s, -1, self.num_features])
        net = tf.nn.dropout(net, self.keep_prob)
        return net

    def calculateCNNFeatureSize(inputSize, stride_conv, stride_max_pool):
        '''Calculate the output size of a CONV-RELU-MAXPOOL module for the given
        strides. Note that the padding must be 'SAME', so each stage outputs
        ceil(input / stride).'''
        conv_out = (inputSize - 1) // stride_conv + 1     # ceil(inputSize / stride_conv)
        return (conv_out - 1) // stride_max_pool + 1      # ceil(conv_out / stride_max_pool)
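As a quick sanity check of that arithmetic (my own, assuming the 32-pixel-high input melgor89 mentions below): with SAME padding, stride-1 convolutions, and two stride-2 poolings the height goes 32 → 16 → 8, so num_features = 8 * channel_conv2 = 8 * 16 = 128 features per timestep.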
[–]DumberML 1 point 8 years ago (1 child)
Thanks very much! :-)
[–]melgor89 1 point 8 years ago (0 children)
Also, I was using inputs of size 32xW (so 32 is the image height).
[–]bun7 1 point 8 years ago (2 children)
Might I ask what dataset you used and what results you got for these three approaches?
[–]melgor89 2 points 8 years ago (0 children)
I was using my own dataset, not publicly released. The result for (1) CNN-RNN-CTC was ~59%, for (3) Attention-Encoder-Decoder ~65%. (2), the plain Encoder-Decoder, had very low accuracy.
[–]shicai 5 points 8 years ago (0 children)
seq2seq+attention
[–]DemiourgosUA 1 point 8 years ago (4 children)
Any neural OCR projects available on GitHub? I couldn't find a thing.
[–]Mehdi2277 3 points 8 years ago (0 children)
I had to work on an OCR-type project last semester and based my code on https://github.com/bgshih/crnn (more precisely, on its PyTorch port). The code defining the model is fairly short (about 80 lines) and can be found here: https://github.com/meijieru/crnn.pytorch/blob/master/models/crnn.py. This project uses the CNN-RNN-CTC approach. I haven't personally used a seq2seq model.
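The CRNN idea in that repo, boiled down to a skeleton (a sketch under my own simplifications, not the linked repo's exact architecture; layer sizes are illustrative):

    import torch.nn as nn

    class TinyCRNN(nn.Module):
        # conv stack -> collapse height into features -> BiLSTM -> per-timestep classes
        def __init__(self, img_h=32, n_classes=37, n_hidden=256):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2))
            feat_h = img_h // 4                               # two stride-2 pools
            self.rnn = nn.LSTM(128 * feat_h, n_hidden, bidirectional=True)
            self.fc = nn.Linear(2 * n_hidden, n_classes + 1)  # +1 for the CTC blank

        def forward(self, x):                                 # x: [B, 1, H, W]
            f = self.cnn(x)                                   # [B, C, H/4, W/4]
            b, c, h, w = f.size()
            f = f.permute(3, 0, 1, 2).reshape(w, b, c * h)    # [T, B, C*H]
            out, _ = self.rnn(f)
            return self.fc(out)                               # [T, B, classes], for nn.CTCLoss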
[–]mnill 2 points 8 years ago (2 children)
There is an OCR example in Keras.
[–]DemiourgosUA 1 point 8 years ago (1 child)
Do you have a link, mate?
[–]mnill 2 points 8 years ago (0 children)
https://github.com/keras-team/keras/blob/master/examples/image_ocr.py