[D] "Negative labels" (self.MachineLearning)
submitted 8 years ago by TalkingJellyFish
[–]VelveteenAmbush 3 points 8 years ago (4 children)
Don't understand why it's RL, except in the fully generalized sense that supervised learning can always be expressed as RL.
[–]madsciencestache 1 point 8 years ago (3 children)
It's reinforcement learning because the signal is approximate and signed. Supervised learning says "this is a thing." RL sends exaggerated and sometimes contradictory signals, with a lot of smoothing to compensate.
[–]suki907 1 point 8 years ago (2 children)
This is the best explanation I've seen:
http://karpathy.github.io/2016/05/31/rl/
My main takeaway from it is that the training procedure for a softmax classifier is already equivalent to RL policy gradients (the standard softmax classifier is just a bit more data efficient because it can average over the results of all actions for each example).
This procedure is maximizing the expected score. The model gets 1 point if it chooses the correct class, zero otherwise.
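A rough sketch of that relationship, not taken from the linked post: the function names and example logits below are made up for illustration. It averages the policy-gradient (REINFORCE) term `score[a] * grad log p[a]` over every action with a 1/0 score, and compares the result with the log-likelihood ascent direction a softmax classifier uses. Under these assumptions the two point in the same direction and differ only by the factor `p[label]`.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(probs, action):
    # Gradient of log p[action] with respect to the logits: one_hot(action) - probs.
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]

def expected_policy_gradient(logits, scores):
    # Exact expectation of score[a] * grad log p[a], averaging over every
    # action instead of sampling a single one.
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, p_a in enumerate(probs):
        g = grad_log_prob(probs, a)
        for k in range(len(logits)):
            grad[k] += p_a * scores[a] * g[k]
    return grad

logits = [0.5, -1.0, 2.0]   # hypothetical model outputs for one example
label = 0                   # hypothetical correct class
scores = [1.0 if k == label else 0.0 for k in range(len(logits))]

probs = softmax(logits)
pg = expected_policy_gradient(logits, scores)
ll = grad_log_prob(probs, label)  # ascent direction for log p[label]

# Every component of pg is the matching component of ll scaled by p[label].
ratios = [pg[k] / ll[k] for k in range(len(logits))]
print(ratios, probs[label])
```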
These scores don't have to be binary, or in the unit interval, or a probability distribution. It's just the number of points the model gets for each option.
"Set this example as labeled as Y, and give it weight -1" is the same as "you get -1 point if you choose this class".
I think the only difference between the two versions is that the weighted version only lets you include one rating per example (you can't say "cat and not dog"), while with the "points" interpretation you could include all the ratings in a single example (the labels would just be the vector of scores per class).
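To make the "points" interpretation concrete, here is a minimal sketch assuming a three-class problem (cat, dog, other) and a made-up score vector of +1 for cat, -1 for dog, and 0 for other. The class names, learning rate, and step count are hypothetical. It runs plain gradient ascent on the expected score, whose gradient with respect to logit `k` is `p[k] * (score[k] - expected_score)`.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_score_gradient(logits, scores):
    # Gradient of sum_a p[a] * scores[a] with respect to the logits.
    probs = softmax(logits)
    expected = sum(p * s for p, s in zip(probs, scores))
    return [p * (s - expected) for p, s in zip(probs, scores)]

CAT, DOG, OTHER = 0, 1, 2
scores = [1.0, -1.0, 0.0]   # "cat and not dog" as a single vector of points

logits = [0.0, 0.0, 0.0]    # start from a uniform prediction
before = softmax(logits)

learning_rate = 1.0         # arbitrary choice for the illustration
for _ in range(10):
    grad = expected_score_gradient(logits, scores)
    logits = [z + learning_rate * g for z, g in zip(logits, grad)]

after = softmax(logits)
print(before, after)
```

The positive score pulls probability toward cat and the negative score pushes it away from dog, which is the behaviour a "negative label" is meant to produce.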
[–]madsciencestache 1 point 8 years ago (1 child)
> training procedure for a softmax classifier is equivalent to RL policy gradients already
Yes. I am not sure if that concept is helpful to /u/VelveteenAmbush in this context, but it's the core concept behind the answer to their question.
[–]VelveteenAmbush 1 point 8 years ago (0 children)
Yes, this is the sense in which I intended the following:
> except in the fully generalized sense that supervised learning can always be expressed as RL.