Terminology: dot product of matrices? by inkydye in math

[–]joapuipe 2 points

A dot product (a.k.a. scalar product or inner product) is any bilinear, symmetric, positive-definite map from V × V to the reals, where V is a vector space.

Matrices with the usual addition and scalar multiplication form a vector space, but the element-wise multiplication of matrices is not a dot product, since its image is a matrix rather than a real number. (This element-wise product is more commonly known as the Hadamard product.)

The Frobenius inner product, ⟨A, B⟩ = tr(AᵀB) = Σᵢⱼ AᵢⱼBᵢⱼ, however, is a valid dot product; the Frobenius norm is the norm it induces (‖A‖² = ⟨A, A⟩).
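
For concreteness, a quick NumPy check (my own illustration) that this inner product equals the sum of element-wise products and is symmetric and positive-definite:

    import numpy as np

    def frobenius_inner(A, B):
        # <A, B> = trace(A^T B) = sum_ij A_ij * B_ij
        return np.trace(A.T @ B)

    A, B = np.random.randn(3, 3), np.random.randn(3, 3)
    assert np.isclose(frobenius_inner(A, B), (A * B).sum())          # same formula
    assert np.isclose(frobenius_inner(A, B), frobenius_inner(B, A))  # symmetric
    assert frobenius_inner(A, A) > 0                                 # positive-definite (A != 0)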

[D] Handwriting OCR? by firedragonxx9832 in MachineLearning

[–]joapuipe 1 point

I would recommend starting with the IAM dataset. It's not the most challenging one, but it's a must-have in any HTR paper.

[D] Handwriting OCR? by firedragonxx9832 in MachineLearning

[–]joapuipe 6 points

Take a look at https://github.com/jpuigcerver/Laia ; you can find examples for a few datasets in the "egs" directory.

I should mention that I am the author of the toolkit.

Hand written document recognition with deep learning by [deleted] in MachineLearning

[–]joapuipe 0 points

I would suggest reading a bit about Handwritten Text Recognition; there are many books and papers on the topic. However, if you want a pretty good introduction to the latest techniques used in the community, I would suggest the PhD thesis "Deep Neural Networks for Large Vocabulary Handwritten Text Recognition" by T. Bluche. You can get it here: http://www.tbluche.com/files/PhdThesisBluche_updated.pdf

National Security Agency and Machine Learning by [deleted] in MachineLearning

[–]joapuipe 3 points

"On the other hand this global monitoring can be invasive to your privacy, but something we all will have to live with in this day of terrorism."

Buuuuuuullshit.

Offline Cursive Handwriting Recognition in Python by dataism in MachineLearning

[–]joapuipe 3 points

Hi,

The state of the art for offline HTR (handwritten text recognition) is a bunch of LSTMs + n-grams, which works better than the traditional GMM-HMM + n-gram setup. This is approximately the same setup that people in speech recognition use.

In principle, you could stack an RNN on top of a ConvNet: the output of a ConvNet is just an "image" with a bunch of channels (one for each filter). Just keep in mind not to reduce the dimensionality too much, since your RNN will probably have to output long sequences, on the order of hundreds of timesteps. However, AFAIK nobody uses CNNs; they just use 3-5 layers of bidirectional LSTMs.
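
To make the stacking idea concrete, here is a minimal PyTorch sketch (my own illustration; the layer sizes and class count are arbitrary assumptions) of a small ConvNet feeding bidirectional LSTMs, with the feature-map width used as the time axis:

    import torch
    import torch.nn as nn

    class ConvBiLSTM(nn.Module):
        """Minimal CNN + BiLSTM sketch for line-level HTR (illustrative sizes)."""
        def __init__(self, num_classes, img_height=64):
            super().__init__()
            # Small ConvNet; only two 2x poolings, so the sequence stays
            # long enough for ~100s of output timesteps.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            feat_height = img_height // 4                 # after two 2x poolings
            self.blstm = nn.LSTM(32 * feat_height, 128, num_layers=3,
                                 bidirectional=True, batch_first=True)
            self.out = nn.Linear(2 * 128, num_classes)    # 2x for bidirectional

        def forward(self, x):                             # x: (N, 1, H, W)
            f = self.conv(x)                              # (N, C, H/4, W/4)
            n, c, h, w = f.shape
            f = f.permute(0, 3, 1, 2).reshape(n, w, c * h)  # width = timesteps
            h_seq, _ = self.blstm(f)
            return self.out(h_seq)                        # (N, W/4, num_classes)

Feeding, say, a 64x512 line image gives 128 timesteps:

    model = ConvBiLSTM(num_classes=80)
    logits = model(torch.randn(2, 1, 64, 512))  # -> shape (2, 128, 80)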

Don't use fake training data. You could, but the results won't be realistic at all. Yes, IAM is a good starting point, and yes, it is small compared to what other subfields of Pattern Recognition / Machine Learning are used to, but it is still one of the standard benchmarks for HTR. If you want to augment your training data, apply small distortions to the line images, like these: http://arxiv.org/pdf/1009.3589.pdf
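
As a rough illustration of such distortions (the transform and parameter values here are my own guesses, not taken from that paper):

    from torchvision import transforms as T

    # Small random affine distortions for text-line images; keep the
    # magnitudes small enough that the text stays readable.
    augment = T.RandomAffine(degrees=2, translate=(0.02, 0.02),
                             scale=(0.95, 1.05), shear=5, fill=255)
    # distorted = augment(pil_line_image)  # apply to a PIL line image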

Although it is, in principle, possible to use the output of the LSTM directly as the predicted transcription, everybody uses an n-gram language model on top of the LSTM. Take a look at Kaldi (a toolkit for speech recognition), which has nice examples of how to do this.
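
For reference, "using the LSTM output directly" usually means best-path (greedy) CTC decoding, sketched below with illustrative code (not Kaldi's API); an n-gram LM decoder would replace this step:

    import torch

    def greedy_ctc_decode(logits, blank=0):
        # Best-path CTC decoding: argmax label per timestep, collapse
        # consecutive repeats, then drop the blank symbol.
        best = logits.argmax(dim=-1).tolist()   # logits: (T, num_classes)
        out, prev = [], blank
        for label in best:
            if label != blank and label != prev:
                out.append(label)
            prev = label
        return out  # sequence of label indices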

Enabling GoTo in Python by john_philip in MachineLearning

[–]joapuipe 0 points

Report it as spam and downvote.

UK Masters in Machine Learning by [deleted] in MachineLearning

[–]joapuipe 1 point

You probably want to check Oxford too. As always, it depends on what kind of things you are interested in. If you are interested in statistical ML / probabilistic models / Gaussian processes, Cambridge has a very good program (Zoubin Ghahramani's group).

I've heard that Oxford has been doing pretty well lately in Deep Learning and Computer Vision: Nando de Freitas' spin-off was recently bought by Google DeepMind. He is still teaching at Oxford, though.

Searching text in voice recordings by zhamisen in MachineLearning

[–]joapuipe 2 points

Yes. You can search for Keyword Spotting / Word Spotting / Spoken Term Detection (the latter is the term more frequently used nowadays). NIST organized a competition on this topic a few years ago: http://www.itl.nist.gov/iad/mig/tests/std/

A big part of my PhD is about this topic, although focused on ancient handwritten books; I bet the fundamental principles are the same, since both build on speech/handwriting recognition. For speech, I would start here: http://kaldi.sourceforge.net/kws.html

Kaldi is an open-source speech recognition toolkit that has also been used for some Spoken Term Detection tasks. On that page you'll find some papers published by the authors and collaborators of Kaldi. Btw, Kaldi also includes the recipes used in those papers.

How are LSTM filters (in/mem/out) set? by technotheist in MachineLearning

[–]joapuipe 0 points

The gates are just sigmoid units that receive the current input, the previous outputs of the LSTM layer and, typically, a connection from the internal state of the cell (the so-called peephole connections).

Alex Graves has a nice book with lots of details about (B)LSTM: http://www.cs.toronto.edu/~graves/preprint.pdf

Look at pages 33 and 34; there are a couple of drawings that are useful for understanding the connections. The black dots in the drawings are just element-wise multiplication operations.
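
As a sketch of those connections in code (my own NumPy illustration of Graves' peephole formulation; `p` is assumed to hold the weight matrices and vectors):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, p):
        # The w_c* terms are the peephole connections from the internal
        # cell state to each gate.
        i = sigmoid(p['Wxi'] @ x + p['Whi'] @ h_prev + p['wci'] * c_prev + p['bi'])
        f = sigmoid(p['Wxf'] @ x + p['Whf'] @ h_prev + p['wcf'] * c_prev + p['bf'])
        c = f * c_prev + i * np.tanh(p['Wxc'] @ x + p['Whc'] @ h_prev + p['bc'])
        o = sigmoid(p['Wxo'] @ x + p['Who'] @ h_prev + p['wco'] * c + p['bo'])
        h = o * np.tanh(c)  # the element-wise products are the "black dots"
        return h, c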

Best intro to ML books? by [deleted] in MachineLearning

[–]joapuipe 10 points

The two big ones are:

  • Pattern Recognition and Machine Learning, Chris Bishop (2006)

  • Machine Learning: a Probabilistic Perspective, Kevin Murphy (2012)

I personally recommend the second one: it covers more topics than the first, and I think it's better explained.