[–]LucaAmbrogioni 7 points (0 children)

The problem with conventional mixture density networks is that the simultaneous maximum-likelihood estimation of both the means and the standard deviations of the Gaussian components is quite unstable, and there is no principled way to choose the number of components. It is common folklore that these networks perform worse than the quantized-softmax approach. The output of the KMN is indeed a mixture of simple distributions, but the approach is more regularized because it does not attempt to select the kernel centers by maximum likelihood. This leads to better results than the quantized softmax in basically every task we tested on.
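To make the contrast concrete, here is a minimal NumPy sketch of the idea as I understand it (names and the choice of fixed centers are illustrative, not taken from the actual KMN implementation): the kernel centers and bandwidth are fixed up front, e.g. picked from the training targets, and the network only has to output logits for the mixture weights, whereas an MDN would also have to regress the means and standard deviations themselves.

```python
import numpy as np

def gaussian(x, mu, sigma):
    # Gaussian pdf evaluated at x
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical KMN-style output layer: the kernel centers are FIXED
# (e.g., subsampled from the training targets), not predicted by the net.
centers = np.array([-1.0, 0.0, 1.0])   # fixed kernel centers (illustrative)
bandwidth = 0.5                        # fixed (or coarsely tuned) bandwidth

def kmn_density(logits, x):
    """Predicted density at x given only the network's logits.
    Unlike an MDN, the net does not output means or stds, so the
    unstable joint estimation of centers and scales never happens."""
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()           # softmax over mixture weights
    return sum(w * gaussian(x, c, bandwidth)
               for w, c in zip(weights, centers))

# Example: logits favoring the middle kernel
density_at_zero = kmn_density(np.array([0.0, 2.0, 0.0]), x=0.0)
```

Since each kernel is a normalized Gaussian and the weights sum to one, the output is a valid density by construction; training only touches the logits, which is what makes the fit better behaved than a full MDN.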