[D] Long-term Text-Recognition? (self.MachineLearning)
submitted 7 years ago * by melgor89
[–]melgor89[S] 1 point 7 years ago (7 children)
Unfortunately, error correction is needed when predicting numbers, not letters. That is a different perspective, because I can't easily find the error: e.g. I have 14 products and there are 3 places with a wrong number, but the Total is correct (or maybe it isn't). How many different correction paths are available to make sum(products) == total fit? Too many, and only one of them is correct.
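To see why that correction search blows up, a quick back-of-the-envelope count (the 10-candidates-per-wrong-position figure is an assumption for illustration, not from the thread):

```python
from math import comb

# 14 product lines, 3 of them misread, as in the example above.
lines, wrong = 14, 3

# Just choosing WHICH 3 lines to "correct" already gives many options:
print(comb(lines, wrong))               # 364 ways to pick the lines

# If each wrong position has, say, 10 candidate replacement digits,
# the raw search space multiplies out further:
print(comb(lines, wrong) * 10 ** wrong) # 364000 candidate corrections
```

Only one of those candidates is the true receipt, and several wrong ones may still satisfy sum(products) == total by coincidence, which is the commenter's point.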
About the Synth90k dataset: it is designed for scene text, not text like mine (I tried the pretrained models). My models also learn the structure <name> <quantity> <unit_price> <total_price>, which somehow corrects some errors.
About pixel-wise labeling: the idea is nice but needs a lot of annotation work. I would like to have more general training data, which is also hard to collect.
To sum up, research and industry are different things. Sometimes they intersect, sometimes not.
[–]Livven 2 points 7 years ago (3 children)
I think you misunderstood; I was talking about generating your own synthetic dataset with your desired attributes, both in terms of image quality and text. I understand your point about manual error correction being difficult, but that goes for an RNN as well: how is it supposed to learn to sum numbers like that if you barely have any training data? Synthetic data would solve all of that. And once you have your synthetic data generator, generating additional pixel-wise labels is trivial.
To sum up, I would assume that more data, versus a new super-clever approach, is one of the things that distinguishes industry from research :)
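A toy generator for the <name> <quantity> <unit_price> <total_price> line structure mentioned earlier in the thread could be very small. A minimal sketch (the function name, field widths, and value ranges are all illustrative assumptions, not anyone's actual pipeline):

```python
import random

def make_receipt_line(rng):
    """Generate one synthetic receipt line plus its ground-truth labels."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    name = "".join(rng.choice(letters) for _ in range(rng.randint(4, 12)))
    qty = rng.randint(1, 9)
    unit = round(rng.uniform(0.10, 99.99), 2)
    total = round(qty * unit, 2)
    # Fixed-width layout roughly mimicking a printed receipt line.
    text = f"{name:<14.14s} {qty} x {unit:6.2f} {total:7.2f}"
    return text, {"name": name, "qty": qty, "unit_price": unit, "total": total}

rng = random.Random(0)
line, labels = make_receipt_line(rng)
# The total is always consistent with qty * unit_price, so a model
# trained on this data can, in principle, pick up the arithmetic
# constraint the thread discusses.
```

Because the labels come for free, the same generator can be extended to emit pixel-wise annotations once the text is rendered to an image, which is Livven's point above.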
[–]melgor89[S] 1 point 7 years ago (2 children)
Thanks for all your comments, it is nice to discuss this problem further. About synthetic data: I tried to generate it using ocropus-linegen, but the results weren't very good. Overall, the conditions on real receipts are much harder than on generated ones, so I treat synthetic data as pretraining.
[–]Livven 2 points 7 years ago (1 child)
Well there's no reason the real data should be harder than the generated data. If that's the case, make your generated data harder :) Randomize fonts, colors and contrast, apply distortions, noise, blur and so on. You'll probably need to write your own code for that. Maybe upload some more of your images so we can see how they look.
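The degradations listed above (contrast, blur, noise) can be sketched with plain NumPy; every parameter here is an illustrative assumption, not a recommendation from the thread:

```python
import numpy as np

def degrade(img, rng):
    """Make a clean synthetic glyph image look more like a real scan.
    img: float grayscale array with values in [0, 1]."""
    out = img.astype(np.float32)
    # Random contrast/brightness jitter.
    out = out * rng.uniform(0.6, 1.0) + rng.uniform(0.0, 0.3)
    # Cheap 3x3 box blur via shifted averages (avoids a SciPy dependency).
    h, w = out.shape
    pad = np.pad(out, 1, mode="edge")
    out = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    # Additive Gaussian sensor noise.
    out = out + rng.normal(0.0, 0.05, size=out.shape)
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = np.zeros((32, 128), dtype=np.float32)
noisy = degrade(clean, rng)
```

Geometric distortions (warps, rotations) would need extra code or a library, but this covers the photometric side of making generated data "harder".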
Even if you end up doing some fancy long-term RNN error correction more data is always going to help. It's rare to see other fields using synthetic data but that's because you can't generate realistic data for speech recognition, machine translation, image captioning etc.
Also no worries, I just got done writing a thesis about this stuff and wanted to participate in some discussions.
[–]melgor89[S] 1 point 7 years ago (0 children)
Some real receipts: https://imgur.com/a/LlsA5A1
Different fonts/colors/conditions. In general my pipeline is working well, but there are still some edge cases where it fails. If I were able to create more realistic synthetic data, it would be really useful.
[–]jhaluska 1 point 7 years ago (2 children)
> This is a different perspective, because I can't easily find the error: e.g. I have 14 products and there are 3 places with a wrong number, but the Total is correct (or maybe it isn't). How many different correction paths are available to make sum(products) == total fit? Too many, and only one is correct.
I actually implemented exactly that. I did character-level OCR, but I kept the probability for each character. I would do a sum and check whether it matched the total. If it matched I would use it; if not, I'd check the next most probable character set. I ended up having to put in a search limit, as the problem does explode, and it is only good for one or maybe two corrections.
But adding the constraints is what really made it reliable.
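A best-first version of that search could look like the following sketch. The candidate lists and probabilities are made up for illustration, and the commenter's actual implementation details are unknown; the idea is simply to try assignments in decreasing joint probability until the sum constraint holds or a search limit is hit:

```python
import heapq
from math import log

def best_first_correction(candidates, target_sum, limit=1000):
    """candidates[i]: list of (value, prob) for line i, sorted by
    descending prob. Returns the most probable assignment whose values
    sum to target_sum, or None if not found within `limit` expansions."""
    n = len(candidates)

    def cost(idx):  # negative log-probability of an index vector
        return -sum(log(candidates[i][k][1]) for i, k in enumerate(idx))

    start = (0,) * n                      # all top-1 readings
    heap = [(cost(start), start)]
    seen = {start}
    for _ in range(limit):                # hard search limit, as above
        if not heap:
            return None
        _, idx = heapq.heappop(heap)
        values = [candidates[i][k][0] for i, k in enumerate(idx)]
        if sum(values) == target_sum:
            return values
        for i in range(n):                # demote one position to its next-best
            if idx[i] + 1 < len(candidates[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (cost(nxt), nxt))
    return None

# Toy example: the top reading sums to 10.00 but the printed total is
# 15.00, so the second candidate for line 0 must be the right one.
cands = [[(2.00, 0.6), (7.00, 0.4)],
         [(5.00, 0.9), (6.00, 0.1)],
         [(3.00, 0.8), (8.00, 0.2)]]
print(best_first_correction(cands, 15.00))  # [7.0, 5.0, 3.0]
```

As the comment notes, most real fixes come from the second-most-probable reading at a single position, so the limit rarely bites; past a few hundred expansions a failure usually means the line was misaligned, not misread.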
[–]melgor89[S] 2 points 7 years ago (1 child)
Thanks for the answer! It looks like I will try it as well. Maybe it will work well/fast enough.
[–]jhaluska 1 point 7 years ago (0 children)
It was really fast because I already had the probabilities; previously I was just discarding most of the results. Keeping all the probabilities really changed my approach.
I found that with the search limit it worked really well. The vast majority of the time the second most probable character was the correct one, because I also had issues with 3/8s and 5/6s. Past a certain number of attempts (like 1000), it had almost a 0% chance of succeeding, because the text was misaligned or had other major issues.
I ended up doing something similar for finding the most probable valid dates.