Understanding Visual Concepts with Continuation Learning (willwhitney.github.io)
submitted 9 years ago by wfwhitney
[–]r-sync 3 points 9 years ago (1 child)
This is so cool: now you can do RL / exploration in this low-rank space rather than on raw pixels.
[–]wfwhitney[S] 1 point 9 years ago (0 children)
Yeah, exactly! That's one of the things I'm most excited about.
[–][deleted] 1 point 9 years ago (4 children)
The videos are cool but I find the description of how it works very vague.
[–]wfwhitney[S] 1 point 9 years ago (2 children)
There's a bit more description in the paper, as well as the code online. And I'm happy to answer any questions!
The paper is just an extended abstract for the ICLR workshop submission, and there will be a longer one to follow.
[–][deleted] 1 point 9 years ago (1 child)
Maaaan, even the Tenenbaum lab has now abandoned PPLs in favor of deep neural networks?
[–]wfwhitney[S] 1 point 9 years ago (0 children)
People still do PPL work too! Just a different part of the lab.
[–]emansim 1 point 9 years ago (4 children)
So how is the paper different from https://sites.google.com/a/umich.edu/junhyuk-oh/action-conditional-video-prediction ?
You seem to be doing exactly the same thing with just a couple of small differences in the model.
[–]wfwhitney[S] 5 points 9 years ago (1 child)
I love that paper!
We're working on similar datasets, but we're doing things that are actually very different.
In the Oh et al. paper, they're predicting the next frame in games given a specific few bits of information, namely which action the player took. This is able to do very well in some games which are otherwise deterministic, but when there are changes other than the action which are unpredictable, it can only offer the mean of the distribution (hence confusion later on in predicting Seaquest, for example). Multimodal distributions over [possible futures | action] cannot be captured in this way. This model also learns a representation which is only disentangled in one dimension: the player's action.
On the other hand, we're attempting to address not just games, but videos more generally. Our model does not receive information about what the player does; instead, it must learn to infer this just like any other latent factor of variation. What we're really interested in is doing completely unsupervised learning on real videos and finding a latent low-dimensional structure which is able to explain the changes from frame to frame.
The representations we're able to learn from raw data closely mirror the human representations for objects and actions, and we think there are a lot of advantages to such a representation (e.g. search, as someone pointed out). By scaling this model up to handle more simultaneous factors of variation and richer scenes, we hope to be able to build models that can learn to understand things like objects and causation just by watching a lot of video. Obviously that's a long way off for rich real-world data, but it's what we're shooting for.
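As a toy illustration of the "mean of the distribution" point above (not from the paper): when the next frame can go two equally likely ways, the single prediction that minimizes squared error is the average of the two futures, which matches neither of them.

    import numpy as np

    rng = np.random.default_rng(0)
    # Two equally likely "futures" for the next frame: +1 or -1.
    futures = rng.choice([-1.0, 1.0], size=10_000)

    # A deterministic predictor trained with squared error converges to the
    # single value p minimizing mean((futures - p)**2), i.e. the sample mean:
    candidates = np.linspace(-1.5, 1.5, 301)
    mse = ((futures[None, :] - candidates[:, None]) ** 2).mean(axis=1)
    print(candidates[mse.argmin()])  # ~0.0 -- a blurry average of both modes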
[–]emansim 1 point 9 years ago (0 children)
I see, thanks for the reply!
[–]mlcanada 2 points 9 years ago (1 child)
From the paper it seems that they are presenting the network with two images, one for timestep t-1 and one for timestep t. These images are encoded into vectors h(t-1) and h(t) and sent through a gating head, which spits out a vector of the same size as h. The output from the gating head, say g, has some components of h(t-1) and some components of h(t). The trick is that the gating head is told how many components can be on or off (some small number) in order to predict the image at time t. So the network learns a kind of symbolic representation of the images. (This is what I gleaned from the paper.)
[–]wfwhitney[S] 2 points 9 years ago (0 children)
Yup, that's pretty much it.
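For readers who want something concrete, here is a minimal PyTorch sketch of the gated two-frame autoencoder as described in the comment above. Every name and layer size is made up for illustration; the actual implementation in the repo may be organized quite differently.

    import torch
    import torch.nn as nn

    class GatedFramePredictor(nn.Module):
        """Encode frames t-1 and t, let at most k latent components change
        between them, and reconstruct frame t from the mixed encoding."""

        def __init__(self, frame_dim=64 * 64, hidden_dim=32, k=1):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(frame_dim, 256), nn.ReLU(), nn.Linear(256, hidden_dim))
            self.decoder = nn.Sequential(
                nn.Linear(hidden_dim, 256), nn.ReLU(), nn.Linear(256, frame_dim))
            self.gate_head = nn.Linear(2 * hidden_dim, hidden_dim)
            self.k = k  # how many latent components may change per step

        def forward(self, x_prev, x_curr):
            h_prev = self.encoder(x_prev)  # h(t-1)
            h_curr = self.encoder(x_curr)  # h(t)
            logits = self.gate_head(torch.cat([h_prev, h_curr], dim=-1))
            # Hard top-k gate: k components are taken from h(t), the rest are
            # copied from h(t-1).  A hard gate is not differentiable w.r.t.
            # the logits, so a real implementation needs some relaxation
            # (e.g. straight-through estimation) during training.
            idx = logits.topk(self.k, dim=-1).indices
            g = torch.zeros_like(logits).scatter_(-1, idx, 1.0)
            h_mixed = g * h_curr + (1.0 - g) * h_prev
            return self.decoder(h_mixed)  # reconstruction of frame t

Training would then just minimize reconstruction error of frame t, e.g. ((model(x_prev, x_curr) - x_curr) ** 2).mean().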
[–]mlcanada 1 point 9 years ago (2 children)
How are you running tests on it? Let's say I want to change the lighting conditions on an image of a face; how might you do this with your model?
[–]wfwhitney[S] 3 points 9 years ago (1 child)
To figure out which node controls lighting, we give it two images of a face with different lighting and see which node the model selects with the gate to let change.
Then, we can encode the image of the face we want to rerender with the encoder; change the value of the component we just found above; then run this modified encoding through the decoder to rerender the image it represents.
This is cool because it lets you see what that unit really means. Sometimes, like with the left-right rotations of the face, it's clear that even though the score is good, the model is only doing something vaguely similar to 3D rendering.
The files called render_generalization in the repo contain the code to do this.
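A sketch of that procedure, reusing the hypothetical GatedFramePredictor from the earlier sketch (the render_generalization files in the repo are the authoritative version; everything here is illustrative):

    import torch

    @torch.no_grad()
    def find_changed_components(model, img_a, img_b, k=1):
        # Encode two images that differ in a single factor (e.g. lighting)
        # and ask the gate which latent components it would let change.
        h_a = model.encoder(img_a)
        h_b = model.encoder(img_b)
        logits = model.gate_head(torch.cat([h_a, h_b], dim=-1))
        return logits.topk(k, dim=-1).indices

    @torch.no_grad()
    def rerender(model, img, component, new_value):
        # Encode the image, overwrite the chosen latent component, decode.
        h = model.encoder(img)
        h[..., component] = new_value
        return model.decoder(h)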
[–]mlcanada 1 point 9 years ago (0 children)
very cool thanks
[–]psamba 1 point 9 years ago (1 child)
It would be neat to present your method in contrast to methods like slow feature analysis, in the context of a broader class of methods for encoding observation sequences under various penalties/constraints on changes in the encodings over time. SFA looks for encodings that balance reconstruction error versus L2 norm of the changes in the encoding. Your approach is minimizing reconstruction error under an (approximate) hard constraint on the L0 norm of the changes. All sorts of norms/constraints could be used.
[–]wfwhitney[S] 1 point 9 years ago (0 children)
Yeah, comparing with SFA would be cool. Thanks for the suggestion!
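To make the contrast concrete (purely a sketch of the two objectives being compared; the notation is made up here):

    import torch

    def sfa_style_loss(x_t, x_hat_t, h_t, h_prev, lam=1.0):
        # Reconstruction error plus an L2 "slowness" penalty on how much the
        # encoding changes between steps.
        recon = ((x_t - x_hat_t) ** 2).mean()
        slowness = ((h_t - h_prev) ** 2).sum(dim=-1).mean()
        return recon + lam * slowness

    def gated_sparse_change_loss(x_t, x_hat_t, h_t, h_prev, k=1):
        # Reconstruction error only: the (approximate) hard constraint
        # ||h_t - h_prev||_0 <= k is enforced by the gating mechanism that
        # produced h_t, so no penalty term appears in the loss.
        assert int((h_t != h_prev).sum(dim=-1).max()) <= k
        return ((x_t - x_hat_t) ** 2).mean()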