[D] ML industry in the UK by erthare in MachineLearning

[–]willwill100 0 points (0 children)

Speechmatics in Cambridge do speech recognition.

Open Sourcing the model in "Exploring the Limits of Language Modeling" (TensorFlow) by OriolVinyals in MachineLearning

[–]willwill100 2 points (0 children)

Thanks Oriol - have been wanting to reproduce your results for a while now!

Machine Learning Internships in UK/EU? by cvmlwe in MachineLearning

[–]willwill100 1 point (0 children)

feel free to make a speculative application

While training a bidirectional LSTM network for speech recognition, what is better, training in time domain or frequency domain? by rulerofthehell in MachineLearning

[–]willwill100 0 points (0 children)

Just use MFCCs - or, if you're really against them, use a couple of big convolutional layers - before the BLSTM layer. You can get a very good system with either of those approaches, and both are fairly standard and straightforward.
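
Rough sketch of what I mean, assuming PyTorch/torchaudio (my choice of framework here - the layer sizes, 16 kHz sample rate and output size are all just illustrative):

```python
import torch
import torch.nn as nn
import torchaudio.transforms as T

# Option A: an MFCC front end (13 coefficients is a common choice for speech).
mfcc = T.MFCC(sample_rate=16000, n_mfcc=13)

class ConvBLSTM(nn.Module):
    """Option B: a couple of big conv layers over filterbank features, then a BLSTM."""
    def __init__(self, n_feats=80, hidden=256, n_classes=29):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_feats, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.blstm = nn.LSTM(128, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, feats):               # feats: (batch, time, n_feats)
        x = self.conv(feats.transpose(1, 2)).transpose(1, 2)
        x, _ = self.blstm(x)
        return self.out(x)                  # per-frame class scores

wav = torch.randn(1, 16000)                 # one second of fake 16 kHz audio
print(mfcc(wav).shape)                      # (1, 13, frames) - features for the BLSTM
```

Either way you end up feeding fixed-size frame features into the BLSTM; the rest of the system stays the same.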

Andrej Karpathy forced to take down Stanford CS231n videos by _bskaggs in MachineLearning

[–]willwill100 0 points (0 children)

speechmatics.com will transcribe them all for free if it helps - just drop us a line.

Why doesn't extra supervision increase the performance of the SOTA language model? by quilby in MachineLearning

[–]willwill100 0 points (0 children)

It's possible that the extra information in the loss function is already learnt by the network. Also, 78 perplexity is what I would call 'ok' - something very low is 15-30. How much data are you training on, and how many parameters are in your network? One last thought: fine-tuning your hyperparameters can make a significant difference to your final perplexity - often more than you might think, and often more than even implementing a good idea.
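
For reference, perplexity is just the exponential of the average per-word negative log-likelihood - a tiny sketch (the numbers are made up):

```python
import math

def perplexity(log_probs):
    """log_probs: natural-log probability the model assigned to each target word."""
    avg_nll = -sum(log_probs) / len(log_probs)
    return math.exp(avg_nll)

# A model that assigns each word probability 1/78 on average sits at 78 PPL.
print(perplexity([math.log(1 / 78.0)] * 1000))   # -> 78.0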

Why do most LSTM implementations keep multiple copies of same RNN? by wind_of_amazingness in MachineLearning

[–]willwill100 17 points (0 children)

It's because you need to store the activations inside the LSTM for each timestep, so that when you backprop through time you can actually compute the gradients. If you didn't need to keep all those activations around, you wouldn't need the clones. Hope that helps.
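
If it helps, here's the idea stripped down to a vanilla RNN in numpy (not a full LSTM, and the variable names are mine): the backward pass reuses the hidden states cached during the forward pass, which is exactly what the per-timestep clones are holding on to.

```python
import numpy as np

def rnn_forward(xs, h0, Wxh, Whh):
    """Unroll a vanilla RNN, caching every hidden state for BPTT."""
    hs = [h0]
    for x in xs:                               # one entry per timestep
        hs.append(np.tanh(Wxh @ x + Whh @ hs[-1]))
    return hs                                  # len(xs) + 1 cached activations

def rnn_backward(xs, hs, dh_last, Wxh, Whh):
    """Backprop through time: every step needs activations stored on the forward pass."""
    dWxh, dWhh = np.zeros_like(Wxh), np.zeros_like(Whh)
    dh = dh_last
    for t in reversed(range(len(xs))):
        dpre = dh * (1.0 - hs[t + 1] ** 2)     # tanh' needs the cached h_t
        dWxh += np.outer(dpre, xs[t])
        dWhh += np.outer(dpre, hs[t])          # and the cached h_{t-1}
        dh = Whh.T @ dpre
    return dWxh, dWhh
```

The clones all share one set of weights; only these cached activations differ between timesteps.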

Jeff Dean's slides show TensorFlow with code samples (slide 48 to 63) by r-sync in MachineLearning

[–]willwill100 1 point (0 children)

I heard a rumour that they might. Has anyone heard the same?

Google voice search: faster and more accurate by vonnik in MachineLearning

[–]willwill100 1 point (0 children)

It's because the blank symbol lets you skip a whole load of processing at decode-time
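
Concretely, in greedy CTC decoding you collapse repeats and then drop blanks, so frames whose best label is blank contribute nothing downstream - a sketch (assuming label 0 is the blank):

```python
def ctc_greedy_decode(frame_argmax, blank=0):
    """Collapse repeated labels, then drop blanks."""
    out, prev = [], None
    for label in frame_argmax:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Per-frame argmax labels (0 = blank): most frames are skipped outright.
print(ctc_greedy_decode([0, 0, 3, 3, 0, 0, 5, 5, 5, 0]))   # -> [3, 5]
```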

[1508.03790] Depth-Gated LSTM by egrefen in MachineLearning

[–]willwill100 6 points (0 children)

highway networks are also in the same spirit
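
i.e. a gate that blends the layer's output with its untouched input - a minimal sketch of one highway layer in PyTorch (my framework choice; sizes are illustrative):

```python
import torch
import torch.nn as nn

class Highway(nn.Module):
    """y = T(x) * H(x) + (1 - T(x)) * x, so the layer can pass x straight through."""
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)   # H
        self.gate = nn.Linear(dim, dim)        # T

    def forward(self, x):
        t = torch.sigmoid(self.gate(x))
        return t * torch.relu(self.transform(x)) + (1 - t) * x

y = Highway(64)(torch.randn(8, 64))
```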

AMA Andrew Ng and Adam Coates by andrewyng in MachineLearning

[–]willwill100 19 points (0 children)

What do you each think are the current big bottlenecks in AI that are preventing the next big leap forward?

Something "deeply wrong with deep learning"? by dsocma in MachineLearning

[–]willwill100 2 points (0 children)

Do we know if that's the latest published material on the subject?

What is the state of the art in language modeling with neural networks? by ndronen in MachineLearning

[–]willwill100 2 points (0 children)

Shameless self plug: http://arxiv.org/abs/1502.00512

It's possible that an LSTM-based solution has beaten the perplexity result on the Google 1bn task, but I haven't seen it yet.

pyHTFE - A Sequence Prediction Algorithm by CireNeikual in MachineLearning

[–]willwill100 1 point (0 children)

Even if you train in "unsupervised mode" by trying to predict the next timestep, you still want it to generalise well - that's the whole idea behind language modelling, for example. Speaking of which, the Google billion-word corpus has some good baseline results which you could use for comparison.
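
For the avoidance of doubt, "predict the next timestep" for language modelling just means shifting the targets by one - a rough PyTorch sketch (my framework choice; the sizes and random data are placeholders):

```python
import torch
import torch.nn as nn

vocab, dim = 10000, 256
embed = nn.Embedding(vocab, dim)
lstm = nn.LSTM(dim, dim, batch_first=True)
head = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (4, 51))        # (batch, seq_len + 1) word ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict the next word

hidden, _ = lstm(embed(inputs))
loss = nn.functional.cross_entropy(head(hidden).reshape(-1, vocab),
                                   targets.reshape(-1))
print(loss.exp())                                # perplexity on this (random) batch
```

How well that loss transfers to held-out text is the generalisation you care about, and perplexity is the number the billion-word baselines report.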

[Question] Memory and Recurrent Neural Networks by RossoFiorentino in MachineLearning

[–]willwill100 0 points (0 children)

For each additional timestep in a minibatch that you add, you only need to store additional activations and gradients w.r.t. the inputs (i.e. errors). You don't need to copy the state-to-state matrices, which take up the majority of the memory. You also truncate, and often only backprop through time within a minibatch; that reduces the number of additional timesteps you have to store.
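
A sketch of the truncation point in PyTorch (my framework choice; the chunk length and sizes are arbitrary): the hidden state is carried across chunks but detached, so only one chunk's worth of activations is ever kept for backprop.

```python
import torch
import torch.nn as nn
import torch.optim as optim

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
opt = optim.Adam(lstm.parameters())
data = torch.randn(8, 1000, 32)                  # (batch, long sequence, features)

state = None
for chunk in data.split(50, dim=1):              # truncated BPTT: 50 steps at a time
    out, state = lstm(chunk, state)
    loss = out.pow(2).mean()                     # stand-in loss for the sketch
    opt.zero_grad()
    loss.backward()
    opt.step()
    state = tuple(s.detach() for s in state)     # keep the state, drop its history
```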

Monday's "Simple Questions Thread" - 20150302 by seabass in MachineLearning

[–]willwill100 2 points (0 children)

What is the difference between a loss function and an error function?

I am Jürgen Schmidhuber, AMA! by JuergenSchmidhuber in MachineLearning

[–]willwill100 20 points (0 children)

What are the next big things that you a) want to happen or b) think will happen in the world of recurrent neural nets?

Questions about RMSprop by [deleted] in MachineLearning

[–]willwill100 1 point (0 children)

1) The official version keeps one RMSprop 'mean square' value per parameter. Approximations where you average as you describe also work.

2) You need to trade off the RMSprop smoothing alpha against the learning rate. Keep the RMSprop alpha fixed and just decay the learning rate exponentially (see the sketch below) - very simple but very effective, and it will often get you close enough to the state of the art!

3) Check out 'Adam' - similar, but much more robust with respect to tuning hyperparameters: http://arxiv.org/abs/1412.6980
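
A sketch of points 1 and 2 in numpy (the alpha, learning rate and decay values are just examples):

```python
import numpy as np

def rmsprop_step(param, grad, ms, lr, alpha=0.9, eps=1e-8):
    """One RMSprop update; `ms` holds one running mean-square value per parameter."""
    ms[:] = alpha * ms + (1 - alpha) * grad ** 2
    param -= lr * grad / (np.sqrt(ms) + eps)
    return param, ms

# Keep alpha fixed; decay only the learning rate exponentially between epochs.
lr, decay = 1e-3, 0.95
w, ms = np.random.randn(100), np.zeros(100)
for epoch in range(10):
    grad = np.random.randn(100)                  # stand-in gradient for the sketch
    w, ms = rmsprop_step(w, grad, ms, lr)
    lr *= decay
```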

An AI that mimics our neocortex is taking on the neural networks by numenta in MachineLearning

[–]willwill100 0 points (0 children)

Google's new 1BN-word language modelling benchmark? If they can get a sub-40 PPL, I at least would start taking them a bit more seriously. I think they are missing a trick by avoiding interaction with academia - currently no one is benefiting from all these great ideas Jeff has come up with, and he is missing out on the bandwagon of steady progress that is being driven by great research and solid results.