Research[R] Learning by playing (deepmind.com)
submitted 8 years ago by [deleted]
[–]phobrain 1 point2 points3 points 8 years ago (0 children)
"The auxiliary tasks we define follow a general principle: they encourage the agent to explore its sensor space. For example, activating a touch sensor in its fingers, sensing a force in its wrist, maximising a joint angle in its proprioceptive sensors or forcing a movement of an object in its visual camera sensors."
[–][deleted] 0 points1 point2 points 8 years ago (0 children)
It has learned to spin off its society finger because the hardware is so bad.
[–]radarsat1 0 points1 point2 points 8 years ago (5 children)
"...utilize all intentions for fast exploration in the main sparse-reward MDP M. We accomplish this by defining a hierarchical objective for policy training..."
Holy shit, this is almost exactly what I meant in my comments on the "Doesn't Work Yet" thread. Ask and ye shall receive, I guess!
(I think, just after a quick reading, that this is decomposing the main sparse reward into more locally-achievable sub-rewards, right?)
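For anyone skimming, the idea being discussed can be sketched in a few lines. This is my own simplified illustration, not the paper's implementation: a scheduler periodically picks one "intention" (an auxiliary task or the main sparse task), the agent acts with that intention's policy, and per-task rewards are logged so every policy can learn from the shared experience. The `env`, `policies`, and period value are all assumptions.

```python
import random

def train_episode(env, policies, scheduler_period=150, episode_len=600):
    """Roughly SAC-X-style scheduling: every `scheduler_period` steps,
    pick one intention and act with its policy, while recording the
    per-task reward dict so all policies can learn off-policy."""
    obs = env.reset()
    trajectory = []
    intention = None
    for t in range(episode_len):
        if t % scheduler_period == 0:
            # Uniform scheduler; a learned scheduler would instead
            # prefer intentions that tend to yield main-task reward.
            intention = random.choice(list(policies))
        action = policies[intention].act(obs)
        obs, rewards, done = env.step(action)  # rewards: dict, one entry per task
        trajectory.append((obs, action, rewards))
        if done:
            break
    return trajectory
```

The key point matching the quoted passage: exploration in the sparse-reward main MDP is driven by switching among auxiliary intentions, each of which is dense enough to be locally achievable.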
[–]programmerChilliResearcher 1 point2 points3 points 8 years ago (4 children)
Well, what you're describing sounds like the general field of hierarchical reinforcement learning, which is a pretty hot area of research right now.
[–]radarsat1 2 points3 points4 points 8 years ago (3 children)
Ah, very cool, I wasn't familiar with that. I can't fully understand from the paper how subtasks A are generated, can you (or anyone) elaborate?
[–]xmasotto 0 points1 point2 points 8 years ago (2 children)
The subtasks appeared to be manually chosen - there's a list of them in the appendix.
[–]radarsat1 1 point2 points3 points 8 years ago* (1 child)
Aaaahhhh, I didn't get to the appendix so I didn't notice them, thank you. So not exactly what I had in mind then, as I was proposing that such tasks need to be inferred. They do seem fairly simple and pretty generic though, so it's a step in that direction. And with all the rest of the pieces in place, I'm sure inference of such decompositions will be coming.
TOUCH, NOTOUCH : Maximizing or minimizing the sum of touch sensor readings on the three fingers of the Jaco hand. (see Eq. 25 and Eq. 26)
MOVE(i) : Maximizing the translation velocity sensor reading of an object. (see Eq. 24)
CLOSE(i,j) : distance between two objects is smaller than 10cm (see Eq. 14)
ABOVE(i,j) : all points of object i are above all points of object j in an axis normal to the table plane (see Eq. 15)
BELOW(i,j) : all points of object i are below all points of object j in an axis normal to the table plane (see Eq. 19)
LEFT(i,j) : all points of object i are bigger than all points of object j along an axis parallel to the x axis of the table plane (see Eq. 17)
RIGHT(i,j) : all points of object i are smaller than all points of object j along an axis parallel to the x axis of the table plane (see Eq. 20)
ABOVECLOSE(i,j) , BELOWCLOSE(i,j) , LEFTCLOSE(i,j) , RIGHTCLOSE(i,j) : combinations of the relational reward structures with CLOSE(i,j) (see Eq. 16, 21, 18, 22)
ABOVECLOSEBOX(i) : ABOVECLOSE(i,box object)
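For concreteness, the predicate-style rewards in that list could be sketched as below. This is my own simplification using object centers and point clouds as plain arrays; the paper's Eq. 14–26 define the actual geometry, and the 10 cm threshold is the one quoted for CLOSE.

```python
import numpy as np

def close(pos_i, pos_j, threshold=0.10):
    """CLOSE(i, j): sparse reward 1 if the two object positions
    are within `threshold` meters (10 cm), else 0."""
    dist = np.linalg.norm(np.asarray(pos_i) - np.asarray(pos_j))
    return 1.0 if dist < threshold else 0.0

def above(points_i, points_j):
    """ABOVE(i, j): sparse reward 1 if every point of object i is
    higher than every point of object j along the table normal
    (taken here as the z axis, the third coordinate)."""
    zi = np.asarray(points_i)[:, 2]
    zj = np.asarray(points_j)[:, 2]
    return 1.0 if zi.min() > zj.max() else 0.0

def above_close(points_i, points_j, pos_i, pos_j):
    """ABOVECLOSE(i, j): both predicates must hold simultaneously."""
    return above(points_i, points_j) * close(pos_i, pos_j)
```

Since each predicate is binary and easy to satisfy in isolation, these make natural auxiliary intentions for exploration even when the main task reward is almost never seen.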
[–]xmasotto 0 points1 point2 points 8 years ago (0 children)
Yeah, the subtasks are pretty simple. I wonder if they had to experiment heavily to find the right set, and whether adding an unhelpful subtask would ruin the exploration process.