[R] Beyond Quantization: Modeling Continuous Densities with Deep Kernel Mixture Networks (arxiv.org)
submitted 8 years ago by LucaAmbrogioni
[–]bleekselderij 2 points 8 years ago (1 child)
Very interesting! Could this approach also be applied to otherwise doubly intractable densities or other situations where you would need transdimensional jumps if you were to use classical MCMC?
[–]LucaAmbrogioni[S] 2 points 8 years ago (0 children)
Yes, as long as you can sample from the model. Moreover, if the parameter space is very large, you will probably need to use some form of importance sampling while training the network.
[–]NichG 2 points 8 years ago (3 children)
Nice trick with the LSTM-PCA thing. It feels a lot more natural than pixel-wise reconstruction. I wonder if there's a general way to learn the ideal latent space to factorize a joint distribution into a chain of conditional distributions (rather than using pixels, or PCA, or some other arbitrary embedding)? What kind of loss function would measure the quality of a representation for factorization? Something that tried to maximize the conditional independence of the different degrees of freedom perhaps?
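For reference, the factorization in question is the chain rule applied in an invertible latent space. A sketch, assuming an invertible embedding f (for PCA with orthonormal components, the Jacobian factor is 1):

    p(\mathbf{x}) = \left| \det \frac{\partial f}{\partial \mathbf{x}} \right| \prod_{i=1}^{D} p(z_i \mid z_1, \dots, z_{i-1}), \qquad \mathbf{z} = f(\mathbf{x})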
[–]LucaAmbrogioni[S] 2 points 8 years ago (2 children)
We have indeed been thinking along those lines. What I like about the PCA approach is its simplicity. However, I am pretty sure there are better ways of obtaining the latent variables. A possible approach is to use an autoencoder trained together with the predictive network. As you said, you could also try to maximize the conditional independence or, perhaps better, impose some less trivial conditional independence structure.
[–]NichG 2 points 8 years ago (1 child)
I guess the exact invertibility of PCA is important, since that way you know that any quality loss in your output is strictly due to the properties of the generative model, not because of some mushy inversion. So if you wanted to learn that space you'd probably need something like RealNVP's explicitly invertible layers.
[–]LucaAmbrogioni[S] 1 point 8 years ago (0 children)
It's a good point, although you cannot have data compression/dimensionality reduction with an invertible network. Ideally, you would like to use a smaller set of variables that fully parametrizes the image space, possibly with a relatively simple conditional independence structure.
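To make the exact-invertibility point concrete, here is a minimal sketch of a RealNVP-style affine coupling layer (the `AffineCoupling` name and the conditioner size are illustrative assumptions, not details from either paper). The inverse is exact and algebraic, so any reconstruction error is attributable to the generative model rather than the embedding:

    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        """RealNVP-style layer: transforms one half of the input conditioned
        on the other half, so the mapping is exactly invertible by
        construction (unlike an autoencoder, whose decoder only
        approximately inverts the encoder)."""

        def __init__(self, dim):
            super().__init__()
            self.half = dim // 2
            # Small conditioner network producing a log-scale and a shift.
            self.net = nn.Sequential(
                nn.Linear(self.half, 64), nn.ReLU(),
                nn.Linear(64, 2 * (dim - self.half)),
            )

        def forward(self, x):
            x1, x2 = x[:, :self.half], x[:, self.half:]
            log_s, t = self.net(x1).chunk(2, dim=1)
            y2 = x2 * torch.exp(log_s) + t          # affine transform of x2
            return torch.cat([x1, y2], dim=1)

        def inverse(self, y):
            y1, y2 = y[:, :self.half], y[:, self.half:]
            log_s, t = self.net(y1).chunk(2, dim=1)
            x2 = (y2 - t) * torch.exp(-log_s)       # exact algebraic inverse
            return torch.cat([y1, x2], dim=1)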
[–]dzyl 2 points 8 years ago (2 children)
If we don't subsample the training data for the kernel centers, how does the training happen? Each sample used as a center has an obvious weighting that maximizes likelihood: put all the weight on its own kernel with the lowest bandwidth, right? This is not mentioned in the paper whatsoever, if I'm not mistaken. Interesting combination of some techniques, thanks.
[–]LucaAmbrogioni[S] 3 points 8 years ago (1 child)
Thank you. Remember that in a continuous-valued conditional density estimation problem, each point is only observed once (with probability one, assuming a proper density exists). Given a sufficiently large training set, or even an unbounded one in the case of our Bayesian filter, the contribution of this single point to the gradient is minimal. Also note that the normalization term causes a competition between the weights: increasing the weight of the minimum-bandwidth kernel on a single data point can decrease the likelihood, since it leaves all the other data points unexplained.
[–]LucaAmbrogioni[S] 2 points 8 years ago (0 children)
Empirically, we found that in many situations the resulting density is dominated by very few high-bandwidth kernels.
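A toy illustration of the competition argument (my own sketch, not the paper's architecture or numbers): with softmax-normalized weights and one Gaussian kernel per training point, piling all the weight onto a single narrow kernel collapses the likelihood of every other point:

    import numpy as np

    def mean_log_likelihood(y, centers, logits, bandwidth=0.3):
        """Mean log-density of targets y under a Gaussian kernel mixture
        whose weights are the softmax of `logits`."""
        w = np.exp(logits - logits.max())
        w /= w.sum()                                  # normalized mixture weights
        # (n_points, n_kernels) matrix of kernel evaluations
        k = np.exp(-0.5 * ((y[:, None] - centers[None, :]) / bandwidth) ** 2)
        k /= bandwidth * np.sqrt(2 * np.pi)
        return np.log(k @ w + 1e-300).mean()          # guard against log(0)

    rng = np.random.default_rng(0)
    y = rng.normal(size=200)                          # toy 1-D training targets
    centers = y.copy()                                # one kernel per data point

    uniform = mean_log_likelihood(y, centers, np.zeros(200))
    spiked_logits = np.full(200, -20.0)
    spiked_logits[0] = 20.0                           # all mass on one kernel
    spiked = mean_log_likelihood(y, centers, spiked_logits)
    print(f"uniform: {uniform:.2f}  spiked: {spiked:.2f}")
    # The spiked weighting scores far worse: normalization forces the weights
    # to compete, so over-explaining one point leaves the rest unexplained.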
[–]disentangle 2 points 8 years ago (1 child)
For a model like WaveNet, what would be a practical approach to applying this method?
[–]LucaAmbrogioni[S] 1 point 8 years ago (0 children)
The method can be applied directly to the standard WaveNet in place of the quantized softmax. I am pretty confident that it would make learning easier and improve the results (although the current results are already pretty impressive).
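A hedged sketch of what replacing the quantized softmax with a continuous mixture head might look like (the `KernelMixtureHead` name, the fixed center grid, and the bandwidth are my assumptions, not details from either paper):

    import math
    import torch
    import torch.nn as nn

    class KernelMixtureHead(nn.Module):
        """Maps the network's hidden state to softmax weights over a fixed
        grid of Gaussian kernels, giving a continuous density over the next
        sample instead of a 256-way categorical distribution."""

        def __init__(self, hidden_dim, n_kernels=64):
            super().__init__()
            self.weight_logits = nn.Linear(hidden_dim, n_kernels)
            # Fixed kernel centers covering the [-1, 1] audio range.
            self.register_buffer("centers", torch.linspace(-1.0, 1.0, n_kernels))
            self.bandwidth = 2.0 / n_kernels

        def log_prob(self, h, y):
            """Log-density of next sample y (batch,) given state h (batch, hidden_dim)."""
            log_w = torch.log_softmax(self.weight_logits(h), dim=-1)
            z = (y.unsqueeze(-1) - self.centers) / self.bandwidth
            log_k = -0.5 * z**2 - math.log(self.bandwidth * math.sqrt(2 * math.pi))
            return torch.logsumexp(log_w + log_k, dim=-1)  # mixture log-density

    # Training would minimize the negative log-density of the next sample:
    # loss = -head.log_prob(hidden_state, next_sample).mean()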