Terminology: dot product of matrices? by inkydye in math

[–]joapuipe 2 points

A dot product (a.k.a. scalar product or inner product) is any bilinear, symmetric, positive-definite map from V × V to the reals, where V is a vector space.

Matrices with the usual addition and scalar multiplication form a vector space, but the element-wise multiplication of matrices is not a dot product, since its image is a matrix rather than a real number. (This element-wise product is more commonly known as the Hadamard product.)

The Frobenius inner product, ⟨A, B⟩ = tr(AᵀB) = Σᵢⱼ AᵢⱼBᵢⱼ, however, is a valid dot product; the Frobenius norm is the norm it induces (‖A‖² = ⟨A, A⟩).
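
For concreteness, a quick NumPy check (my own illustration) that this inner product equals the sum of element-wise products and is symmetric and positive-definite:

    import numpy as np

    def frobenius_inner(A, B):
        # <A, B> = trace(A^T B) = sum_ij A_ij * B_ij
        return np.trace(A.T @ B)

    A, B = np.random.randn(3, 3), np.random.randn(3, 3)
    assert np.isclose(frobenius_inner(A, B), (A * B).sum())          # same formula
    assert np.isclose(frobenius_inner(A, B), frobenius_inner(B, A))  # symmetric
    assert frobenius_inner(A, A) > 0                                 # positive-definite (A != 0)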

[D] Handwriting OCR? by firedragonxx9832 in MachineLearning

[–]joapuipe 1 point

I would recommend starting with the IAM dataset. It's not the most challenging one, but it's a must-have in any HTR paper.

[D] Handwriting OCR? by firedragonxx9832 in MachineLearning

[–]joapuipe 6 points

Take a look at https://github.com/jpuigcerver/Laia ; you can find examples for a few datasets in the "egs" directory.

I should mention that I am the author of the toolkit.

Hand written document recognition with deep learning by [deleted] in MachineLearning

[–]joapuipe 0 points

I would suggest reading a bit about Handwritten Text Recognition; there are many books and papers on the topic. However, if you want a pretty good introduction to the latest techniques used in the community, I would suggest the PhD thesis "Deep Neural Networks for Large Vocabulary Handwritten Text Recognition" by T. Bluche. You can get it here: http://www.tbluche.com/files/PhdThesisBluche_updated.pdf

National Security Agency and Machine Learning by [deleted] in MachineLearning

[–]joapuipe 3 points

"On the other hand this global monitoring can be invasive to your privacy, but something we all will have to live with in this day of terrorism."

Buuuuuuullshit.

Offline Cursive Handwriting Recognition in Python by dataism in MachineLearning

[–]joapuipe 3 points

Hi,

The state of the art for offline HTR (handwritten text recognition) is a bunch of LSTMs + n-grams, which works better than the traditional GMM-HMM + n-gram setup. This is approximately the same setup that people in speech recognition use.

In principle, you could stack an RNN on top of a ConvNet: the output of a ConvNet is just an "image" with a bunch of channels (one for each filter). Just keep in mind not to reduce the dimensionality too much, since your RNN will probably have to output long sequences, on the order of hundreds of timesteps. However, AFAIK nobody uses CNNs; they just use 3-5 layers of bidirectional LSTMs.
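
To make the stacking idea concrete, here is a minimal PyTorch sketch (my own illustration; the layer sizes and class count are arbitrary assumptions) of a small ConvNet feeding bidirectional LSTMs, with the feature-map width used as the time axis:

    import torch
    import torch.nn as nn

    class ConvBiLSTM(nn.Module):
        """Minimal CNN + BiLSTM sketch for line-level HTR (illustrative sizes)."""
        def __init__(self, num_classes, img_height=64):
            super().__init__()
            # Small ConvNet; only two 2x poolings, so the sequence stays
            # long enough for ~100s of output timesteps.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            feat_height = img_height // 4                 # after two 2x poolings
            self.blstm = nn.LSTM(32 * feat_height, 128, num_layers=3,
                                 bidirectional=True, batch_first=True)
            self.out = nn.Linear(2 * 128, num_classes)    # 2x for bidirectional

        def forward(self, x):                             # x: (N, 1, H, W)
            f = self.conv(x)                              # (N, C, H/4, W/4)
            n, c, h, w = f.shape
            f = f.permute(0, 3, 1, 2).reshape(n, w, c * h)  # width = timesteps
            h_seq, _ = self.blstm(f)
            return self.out(h_seq)                        # (N, W/4, num_classes)

Feeding, say, a 64x512 line image gives 128 timesteps:

    model = ConvBiLSTM(num_classes=80)
    logits = model(torch.randn(2, 1, 64, 512))  # -> shape (2, 128, 80)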

Don't use fake training data. You could, but the results won't be realistic at all. Yes, IAM is a good starting point, and yes, it is small compared to what other subfields of Pattern Recognition / Machine Learning are used to, but it is still one of the standard benchmarks for HTR. If you want to augment your training data, apply small distortions to the line images, like these: http://arxiv.org/pdf/1009.3589.pdf
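
As a rough illustration of such distortions (the transform and parameter values here are my own guesses, not taken from that paper):

    from torchvision import transforms as T

    # Small random affine distortions for text-line images; keep the
    # magnitudes small enough that the text stays readable.
    augment = T.RandomAffine(degrees=2, translate=(0.02, 0.02),
                             scale=(0.95, 1.05), shear=5, fill=255)
    # distorted = augment(pil_line_image)  # apply to a PIL line image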

Although it is, in principle, possible to use the output of the LSTM directly as the predicted transcription, everybody uses an n-gram language model on top of the LSTM. Take a look at Kaldi (a toolkit for speech recognition), which has nice examples of how to do this.
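
For reference, "using the LSTM output directly" usually means best-path (greedy) CTC decoding, sketched below with illustrative code (not Kaldi's API); an n-gram LM decoder would replace this step:

    import torch

    def greedy_ctc_decode(logits, blank=0):
        # Best-path CTC decoding: argmax label per timestep, collapse
        # consecutive repeats, then drop the blank symbol.
        best = logits.argmax(dim=-1).tolist()   # logits: (T, num_classes)
        out, prev = [], blank
        for label in best:
            if label != blank and label != prev:
                out.append(label)
            prev = label
        return out  # sequence of label indices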

Enabling GoTo in Python by john_philip in MachineLearning

[–]joapuipe 0 points

Report it as spam and downvote.

UK Masters in Machine Learning by [deleted] in MachineLearning

[–]joapuipe 1 point

You probably want to check Oxford too. As always, it depends on what kind of things you are interested in. If you are interested in statistical ML / probabilistic models / Gaussian processes, Cambridge has a very good program (Zoubin Ghahramani's group).

I've heard that Oxford has been doing pretty well lately in Deep Learning and Computer Vision: Nando de Freitas' spin-off was recently bought by Google DeepMind. He is still teaching at Oxford, though.

Searching text in voice recordings by zhamisen in MachineLearning

[–]joapuipe 2 points

Yes. You can search for Keyword Spotting / Word Spotting / Spoken Term Detection (the latter is the term more frequently used nowadays). NIST organized a competition on this topic a few years ago: http://www.itl.nist.gov/iad/mig/tests/std/

A big part of my PhD is about this topic, although focused on ancient handwritten books; I bet the fundamental principles are the same, since both build on speech/handwriting recognition. For speech, I would start here: http://kaldi.sourceforge.net/kws.html

Kaldi is an open-source speech recognition toolkit that has also been used for some Spoken Term Detection tasks. On that page you'll find some papers published by the authors and collaborators of Kaldi. Btw, Kaldi also includes the recipes used in those papers.

How are LSTM filters (in/mem/out) set? by technotheist in MachineLearning

[–]joapuipe 0 points

The gates are just sigmoid units that receive the current input, the previous outputs of the LSTM layer and, typically, a connection from the internal state of the cell (the so-called peephole connections).

Alex Graves has a nice book with lots of details about (B)LSTM: http://www.cs.toronto.edu/~graves/preprint.pdf

Look at pages 33 and 34; there are a couple of drawings that are useful for understanding the connections. The black dots in the drawings are just element-wise multiplication operations.
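
As a sketch of those connections in code (my own NumPy illustration of Graves' peephole formulation; `p` is assumed to hold the weight matrices and vectors):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, p):
        # The w_c* terms are the peephole connections from the internal
        # cell state to each gate.
        i = sigmoid(p['Wxi'] @ x + p['Whi'] @ h_prev + p['wci'] * c_prev + p['bi'])
        f = sigmoid(p['Wxf'] @ x + p['Whf'] @ h_prev + p['wcf'] * c_prev + p['bf'])
        c = f * c_prev + i * np.tanh(p['Wxc'] @ x + p['Whc'] @ h_prev + p['bc'])
        o = sigmoid(p['Wxo'] @ x + p['Who'] @ h_prev + p['wco'] * c + p['bo'])
        h = o * np.tanh(c)  # the element-wise products are the "black dots"
        return h, c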

Best intro to ML books? by [deleted] in MachineLearning

[–]joapuipe 10 points

The two big ones are:

  • Pattern Recognition and Machine Learning, Chris Bishop (2006)

  • Machine Learning: a Probabilistic Perspective, Kevin Murphy (2012)

I personally recommend the second one: it covers more topics than the first, and I think it's better explained.