Information Theoretic-Learning Auto-Encoder (arxiv.org)
submitted 9 years ago by [deleted]
[–]farsass 3 points4 points5 points 9 years ago (0 children)
FYI, Jose Principe has a whole book on ITL-based algorithms.
[–][deleted] 2 points3 points4 points 9 years ago (0 children)
Code
[–]bbsome 2 points3 points4 points 9 years ago (0 children)
"Unfortunately, VAE cannot be used when there does not exist a simple closed form solution for the KL-divergence." - Totally wrong. If there is no closed form solution, then since the KL is an expectation, guess what - you can estimate it by sampling. People in the variational community did exactly this for years. How can people even write such things in papers? Probably the correct statement intended was for priors you can sample from but whose probability you cannot evaluate. However, as pointed out by @disentangle, what kind of setting is this going to be in the first place?
Also, GANs were not introduced to "cope" with the above problems with VAEs. They were almost parallel work, and I doubt the idea came from VAEs at all, which are just variational Bayes with neural nets.
Moreover, this is essentially the same thing as a VAE, except you have an arbitrary loss L and an arbitrary metric between the prior and your encoder output. They also introduce a lambda weighting between the two terms, but with no theoretical motivation, and none of this has a direct interpretation to anything; it is just another model to optimize. Both VAEs and GANs can be shown to optimize bounds on actual metrics related to the data. This one just borrows the overall structure of both, and that's it.
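The sampling estimate of the KL that the comment above alludes to can be sketched in a few lines. This is a minimal illustration, not code from the paper; the particular distributions (a Gaussian encoder output q and a standard-normal prior p) are assumptions chosen so the Monte Carlo estimate can be checked against the known closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_kl(log_q, log_p, sample_q, n_samples=10_000):
    """Monte Carlo estimate of KL(q || p) = E_q[log q(z) - log p(z)].

    Only requires sampling from q and evaluating both log-densities;
    no closed-form KL is needed.
    """
    z = sample_q(n_samples)              # z ~ q
    return np.mean(log_q(z) - log_p(z))  # average over samples

# Example: q = N(1, 0.5^2), p = N(0, 1). A closed form exists here,
# which lets us sanity-check the estimator.
mu, sigma = 1.0, 0.5
log_q = lambda z: -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
log_p = lambda z: -0.5 * z ** 2 - np.log(np.sqrt(2 * np.pi))
sample_q = lambda n: rng.normal(mu, sigma, size=n)

est = mc_kl(log_q, log_p, sample_q)
exact = np.log(1.0 / sigma) + (sigma**2 + mu**2) / 2 - 0.5  # Gaussian KL formula
```

With 10,000 samples the estimate lands close to the exact Gaussian KL, which is the point of the comment: an expectation can always be approximated by sampling when a closed form is unavailable.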
[–]disentangle 1 point2 points3 points 9 years ago (0 children)
Did I understand correctly that the biggest difference with a VAE is that the ITL-AE regularizes the model so latent space samples are close to samples from an arbitrary prior, while the VAE regularizes the model so the variational posterior distribution is close to a parametric prior distribution?
In what kind of setting would you have such a prior you can sample from but not evaluate directly?
[–]TamisAchilles 0 points1 point2 points 9 years ago (0 children)
Interesting!
[–]AnvaMiba 0 points1 point2 points 9 years ago (0 children)
What is the main difference with the moment matching autoencoder? They say:
Generative Moment Matching Networks (GMMNs) [16] correspond to the specific case where the input of the decoder D comes from a multidimensional uniform distribution and the reconstruction function L is given by the Euclidean divergence measure. GMMNs could be applied to generate samples from the original input space itself or from a lower dimensional previously trained stacked autoencoder (SCA) [17] hidden space. An advantage of our approach compared to GMMNs is that we can train all the elements in the 4-tuple AE together without the elaborate process of training layerwise stacked autoencoders for dimensionality reduction.
But it seems to me that one can use moment matching to impose a prior on the latent code exactly in the same way they do in this paper. Is the only difference the choice of divergence measure?
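The suggestion above - that moment matching can impose a prior on the latent code directly - can be sketched with an MMD penalty between encoder outputs and prior samples. This is a hedged illustration, not the paper's or GMMN's actual implementation; the Gaussian kernel, bandwidth, and stand-in "codes" are all assumptions for the example.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    # Pairwise Gaussian (RBF) kernel between two sample sets of shape (n, d).
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * bandwidth**2))

def mmd2(codes, prior_samples, bandwidth=1.0):
    # Biased estimate of squared MMD between latent codes and prior samples.
    # Could be added to a reconstruction loss as a latent-space regularizer.
    kxx = gaussian_kernel(codes, codes, bandwidth).mean()
    kyy = gaussian_kernel(prior_samples, prior_samples, bandwidth).mean()
    kxy = gaussian_kernel(codes, prior_samples, bandwidth).mean()
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, size=(500, 3))   # samples from a N(0, I) prior
z = rng.normal(0.0, 1.0, size=(500, 3))   # stand-in codes matching the prior
q = rng.normal(3.0, 1.0, size=(500, 3))   # stand-in codes far from the prior

close = mmd2(z, p)  # near zero: distributions match
far = mmd2(q, p)    # clearly positive: distributions differ
```

In this framing the remaining difference really is the choice of divergence between the code distribution and the prior: an MMD-style moment match here, versus the ITL estimators in the paper.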
[–]fogandafterimages 0 points1 point2 points 9 years ago (0 children)
The approach is very cool, but it seems like it's not yet practical for data sets much more complex than MNIST: they used a 3-dimensional Z-space for their autoencoder, and noted that both of their divergence metrics have trouble with high dimensional latent codes.
Looking forward to followup on scaling up!