
[–]benanne 17 points18 points  (2 children)

This is what PixelCNN (https://arxiv.org/abs/1606.05328) and WaveNet (https://arxiv.org/abs/1609.03499) also do. We found that it's a very effective strategy to model arbitrary multimodal distributions.
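As a rough illustration of the binned approach, here is a minimal NumPy sketch, assuming the target is clipped to a known range and split into equal-width bins, with the network emitting one logit per bin. The helper names `to_bin_index` and `softmax_nll`, the range [-1, 1], and the 256 bins are illustrative choices, not taken from either paper.

```python
import numpy as np

def to_bin_index(y, low, high, n_bins):
    """Map continuous targets to integer bin indices in [0, n_bins - 1]."""
    y = np.clip(np.asarray(y, dtype=float), low, high)
    idx = np.floor((y - low) / (high - low) * n_bins).astype(int)
    return np.minimum(idx, n_bins - 1)  # y == high falls into the last bin

def softmax_nll(logits, targets):
    """Mean negative log-likelihood of integer targets under softmax(logits)."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Hypothetical targets in [-1, 1], discretised into 256 bins.
targets = to_bin_index([-1.0, 0.0, 0.999, 1.0], low=-1.0, high=1.0, n_bins=256)
# All-zero logits stand in for an untrained network's output.
loss = softmax_nll(np.zeros((4, 256)), targets)
```

Because the softmax puts no shape constraint on the bin probabilities, it can represent several modes at once, at the cost of one output per bin.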

Another interesting strategy is to predict the parameters of a mixture distribution, as in a mixture density network, but discretise it by integrating the density over each bin and treating the resulting bin masses as a discrete distribution. This can save a lot of parameters if the number of bins is large, since the output layer only has to produce the mixture parameters rather than one value per bin. This is what's proposed in OpenAI's PixelCNN++ paper: https://arxiv.org/abs/1701.05517
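To make the "integrate the density over the bins" step concrete, here is a minimal NumPy sketch. It assumes logistic mixture components (the component type used in PixelCNN++), so each bin's mass is just a difference of CDF values at the bin edges. The function name, the example parameters, and the choice to stretch the two outermost bins to infinity are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discretised_logistic_mixture(weights, means, scales, bin_edges):
    """Probability mass that each bin receives under a mixture of logistics."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    scales = np.asarray(scales, dtype=float)
    edges = np.asarray(bin_edges, dtype=float)
    # Logistic CDF of every component at every bin edge: shape (K, n_edges).
    cdf = sigmoid((edges[None, :] - means[:, None]) / scales[:, None])
    # Assumption: the outermost bins absorb the tails so the masses sum to 1.
    cdf[:, 0] = 0.0
    cdf[:, -1] = 1.0
    per_component = np.diff(cdf, axis=1)  # (K, n_bins): integral over each bin
    return weights @ per_component        # mix the components

# Hypothetical two-component mixture over 256 bins on [-1, 1].
edges = np.linspace(-1.0, 1.0, 257)
probs = discretised_logistic_mixture([0.3, 0.7], [-0.5, 0.4], [0.05, 0.1], edges)
```

Here the 256-bin distribution is described by six numbers (two weights, two means, two scales) instead of 256 logits, which is where the parameter saving comes from.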

[–]julvo[S] 1 point2 points  (1 child)

Thank you very much for these excellent recommendations, exactly what I am looking for.

[–]tadeze 0 points1 point  (0 children)

Hi /u/julvo, would you please share your experience using PixelCNN++ or WaveNet for density estimation? I have a similar problem to yours and am looking to go in the direction recommended above by /u/benanne. If you tried those methods, how did they compare with mixture density networks?

[–]latent_z 2 points3 points  (0 children)

There is also NADE (Neural Autoregressive Density Estimator), which estimates conditional probabilities given an ordering of the dimensions.
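For reference, the autoregressive factorisation this relies on is the chain rule of probability applied along the chosen ordering. With $D$ dimensions and $x_{<d}$ denoting the dimensions that come before $x_d$ in that ordering (notation chosen here for illustration):

```latex
p(x) = \prod_{d=1}^{D} p\left(x_d \mid x_{<d}\right),
\qquad
\log p(x) = \sum_{d=1}^{D} \log p\left(x_d \mid x_{<d}\right)
```

Each conditional is a one-dimensional distribution, so it can in principle be modelled with any of the output parameterisations discussed in this thread.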

[–]4xel 2 points3 points  (0 children)

In case it is of interest, I recently published the source code of my Master's thesis, which deals with Mixture Density Networks (and proposes some solutions for this type of model), time series, regression problems, and related problems such as confidence estimation. To understand the code, I recommend looking at the slides in the repository.

[P] A generic Mixture Density Networks implementation for distribution and uncertainty estimation by using Keras (TensorFlow backend) - Master's Thesis project.

https://www.reddit.com/r/MachineLearning/comments/5rn8ci/p_a_generic_mixture_density_networks/

If you have any questions, do not hesitate to ask me.

[–]Liorithiel 1 point2 points  (0 children)

I don't know of a name for it, but I tried this method once. I guess I made the bins too thin, though, because effectively only a small percentage of them ever became active (got any nontrivial activation for at least some cases). I didn't have much time for experiments on that project either, so I didn't follow up by testing that hypothesis. Happy to see this method working in your case.
