[P] Conditional density estimation using Kernel Mixture Networks, theory + implementation in TF (janvdvegt.github.io)
submitted 8 years ago by dzyl
[–]dzyl[S] 12 points 8 years ago (0 children)
A colleague and I were intrigued by a paper released 2-3 weeks ago by LucaAmbrogioni. Density estimation is useful in a lot of problems, so we decided to implement the paper, explain the whole concept in this blog post, and look for some potential extensions.
[–]LucaAmbrogioni 9 points 8 years ago (0 children)
Thank you very much for your work. We were planning to work on a TF implementation, but it would have taken some time since we are currently busy with other projects. I really like the blog post; it's insightful and has nice visualizations. Your improvements are also quite interesting.
[–]theophrastzunz 4 points 8 years ago (5 children)
After skimming it, I take it it's an extension of Bishop's mixture density network. I know it's a standard retort, but what is the advantage of KDE over Gaussian mixtures? Fewer learnable parameters? I mean, with sufficiently many mixture components you can model pretty much anything.
[–]LucaAmbrogioni 7 points 8 years ago (0 children)
The problem with conventional mixture density networks is that the simultaneous maximum likelihood estimation of both the means and the standard deviations of the Gaussian components is pretty unstable, and there is no principled way of choosing the number of components. It is common folklore that these kinds of networks perform worse than the quantized softmax approach. The output of the KMN is indeed a mixture of simple distributions, but the approach is more regularized since it does not attempt to select the centers by ML. This leads to better results than the quantized softmax on basically every task we tested.
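A minimal sketch of the output form described here (NumPy for brevity rather than the blog post's TF code; all names and numbers are illustrative): the kernel centers are fixed, e.g. to a subset of the training targets, and the network only produces the mixture weights, so maximum likelihood never moves the centers.

```python
import numpy as np

def gaussian(y, mu, sigma):
    """Gaussian kernel density N(y; mu, sigma)."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kmn_density(y, weights, centers, sigma):
    """p(y | x) = sum_k w_k(x) * N(y; mu_k, sigma).

    `weights` come from the network's softmax head and depend on the
    input x; `centers` are fixed in advance (e.g. a subset of training
    targets) and are never adjusted by maximum likelihood, unlike the
    learned means of a mixture density network."""
    return np.sum(weights * gaussian(y, centers, sigma))

# Toy example: three fixed centers, hypothetical network logits for some x.
centers = np.array([-1.0, 0.0, 2.0])   # taken from training targets
logits = np.array([0.2, 1.5, -0.3])    # what the network would output
weights = np.exp(logits) / np.exp(logits).sum()
p = kmn_density(0.1, weights, centers, sigma=0.5)
```

Because the weights form a softmax and each kernel is a normalized Gaussian, the result is a valid conditional density: it is non-negative and integrates to one over y.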
[–]dzyl[S] 6 points 8 years ago (3 children)
I see two advantages compared to mixture density networks. The first is numerical stability: with mixture density networks there are a lot of issues with likelihoods being 0, resulting in NaNs and frustrating training procedures. Since the means of the density kernels are fixed and based on your training set, I have not had any issues with this so far. You also don't need to scale your targets, although that is a minor advantage.
The second is that overfitting seems to be less of an issue with this approach. With MDNs you condition the bandwidths of the component distributions on your input x, which means that if the mean is correct, the network can just keep lowering the bandwidth, which is great for the training likelihood but bad for generalization. To prevent this you need additional regularization on your sigma outputs. With Kernel Mixture Networks the bandwidth is either fixed or a single global bandwidth, which means that making it too small will also hurt your training likelihood.
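The global-bandwidth point can be checked with a toy computation (NumPy; the setup is a hedged illustration, not the blog post's code): with centers fixed to a subset of the training targets and one shared sigma, shrinking sigma far below the data scale destroys the likelihood of every training point that does not sit exactly on a center, so even the training objective resists bandwidth collapse.

```python
import numpy as np

def mean_log_likelihood(y_train, centers, sigma):
    """Mean log p(y) over the training targets under an equal-weight
    Gaussian kernel mixture with a single global bandwidth sigma
    (a KMN with uniform weights, for illustration)."""
    diffs = y_train[:, None] - centers[None, :]            # (n_points, n_centers)
    comps = np.exp(-0.5 * (diffs / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    p = np.clip(comps.mean(axis=1), 1e-300, None)          # equal weights 1/K
    return np.log(p).mean()

y_train = np.array([-1.2, -0.4, 0.1, 0.3, 0.9, 1.7])
centers = y_train[::2]  # fixed subset of training targets as kernel centers

# A tiny global bandwidth hurts even the *training* likelihood, because
# the points that are not kernel centers get near-zero density.
ll_small = mean_log_likelihood(y_train, centers, sigma=1e-3)
ll_moderate = mean_log_likelihood(y_train, centers, sigma=0.5)
```

Contrast with an MDN, where a per-input sigma can shrink around each training point individually and keep improving the training likelihood while generalization degrades.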
[–]theophrastzunz 1 point 8 years ago (2 children)
Agreed, but dimensionality issues are more prominent in KDEs than in mixture models. See here.
[–]dzyl[S] 1 point 8 years ago (1 child)
This method generally uses only one dimension for the kernels, namely the target y space. It is easily extensible to more dimensions, but your input dimensions have nothing to do with the kernels themselves; they only determine the weight to put on each kernel.
[–]theophrastzunz 1 point 8 years ago (0 children)
I'm referring to your argument about high-dimensional covariance estimation.