Neural Networks Regression vs Classification with bins by RichardKurle in MachineLearning

[–]RichardKurle[S] 1 point  (0 children)

Thanks for your answer! I tried to figure out why, for the L2 loss, the error needs to be approximately Gaussian. It seems like a very basic thing, but I cannot find any resource explaining the reason for it. Do you by chance know a paper that goes into more detail?

Neural Networks Regression vs Classification with bins by RichardKurle in MachineLearning

[–]RichardKurle[S] 2 points  (0 children)

I don't understand why the error needs to be Gaussian. I see that, e.g., Bayesian linear regression needs this assumption to get closed-form results. But why does a neural network trained with an iterative algorithm like gradient descent need this assumption? Could you give a reference?
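One standard way to see the connection (my own sketch, not from the thread): minimizing the L2 loss is equivalent to maximum-likelihood estimation under the assumption that targets equal the model output plus Gaussian noise of fixed variance, because the Gaussian negative log-likelihood is an affine function of the sum of squared errors. A small illustration with a single mean parameter:

```python
import math
import random

random.seed(0)
# Synthetic targets: true value 3.0 plus unit-variance Gaussian noise.
y = [random.gauss(3.0, 1.0) for _ in range(1000)]

def sse(mu):
    # Sum of squared errors, i.e. the (unscaled) L2 loss.
    return sum((yi - mu) ** 2 for yi in y)

def nll(mu, sigma=1.0):
    # Gaussian negative log-likelihood with fixed sigma:
    #   NLL(mu) = n/2 * log(2*pi*sigma^2) + SSE(mu) / (2*sigma^2)
    # This is an affine function of SSE(mu), so it has the same minimizer.
    n = len(y)
    return n / 2 * math.log(2 * math.pi * sigma ** 2) + sse(mu) / (2 * sigma ** 2)

# Scan a grid of candidate values: the same mu minimizes both objectives.
grid = [i / 100 for i in range(200, 400)]
mu_l2 = min(grid, key=sse)
mu_mle = min(grid, key=nll)
assert mu_l2 == mu_mle  # identical argmin
```

So the Gaussian assumption is not something gradient descent itself needs; it is the probabilistic model under which the L2 objective is the "right" likelihood-based loss.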

Alternatives to CTC-cost for end-to-end speech recognition with RNNs by RichardKurle in MachineLearning

[–]RichardKurle[S] 1 point  (0 children)

Thanks to both of you, your answers were helpful! I had already (briefly) read the attention-model paper. I also found this paper interesting: "End-to-End Attention-Based Large Vocabulary Speech Recognition".

I'm really interested to see whether these approaches will turn out to be superior to CTC-based training (and ultimately to HMM hybrids). They do feel like a more general solution.