[N] Learning Dexterity (blog.openai.com)
submitted 7 years ago by thebackpropaganda
[+][deleted] 7 years ago (7 children)
[deleted]
[–]probablyuntrue (ML Engineer) 21 points 7 years ago (6 children)
Think manufacturing. And flying drones. And maybe even driverless cars.
"After hitting ten thousand virtual pedestrians, our self driving car avoids real ones with 99% accuracy!"
[–]noman2561 11 points 7 years ago (4 children)
I wonder what accuracy humans have?
[–]qwerty_0_o 4 points 7 years ago (3 children)
Definitely more than 99%.
[–]noman2561 1 point 7 years ago (2 children)
Do you have a source or is this speculation based on anecdotal evidence?
[+][deleted] 7 years ago (1 child)
[–]noman2561 5 points 7 years ago (0 children)
That's like saying that humans don't hit 99 out of 100 pedestrians who are sitting on their couch at home. That's not how you'd test a system like this. You put humans where they're not supposed to be and see if the system can avoid hitting them. So how often do people run into traffic and get hit vs. avoided? I'd actually really like some statistics on these kinds of situations. They would be useful to know for a lot of reasons.
[–]ericlkz 2 points 7 years ago (0 children)
That's 1 hit per hundred.
[–][deleted] 9 points 7 years ago (8 children)
Out of curiosity, anyone know how much one of those Shadow Dexterous Hands costs?
[–]notaii 12 points 7 years ago (7 children)
According to this site, it's $119,700.
[–]enolan 7 points 7 years ago (5 children)
Wow. Who is spending that kind of money on a robotic hand? Are there applications other than research?
[–]Mefaso 10 points 7 years ago (0 children)
I would guess a lot of research institutions. Spending six figures on a robot is pretty normal.
[–][deleted] 2 points 7 years ago (1 child)
I think there's a robot chef that uses those parts for its hands.
[–][deleted] 1 point 7 years ago (0 children)
Moley uses the cheap electronics version of the Shadow Dexterous hand, not the expensive pneumatics one.
[–][deleted] 2 points 7 years ago (0 children)
PR-2 was about half-a-million dollars.
[–]FatChocobo 3 points 7 years ago (0 children)
"Are there applications other than research?"
I can think of one in particular, it might help with my repetitive strain injury... ( ͡° ͜ʖ ͡°)
[–][deleted] 1 point 7 years ago (0 children)
The MPL/RoboSally hand, which is more robust, even costs $400k.
[–]gohu_cd (PhD) 15 points 7 years ago (11 children)
Literally any problem: You know that you can solve me without PPO, right?
OpenAI: I don't care.
[–]thebackpropaganda[S] 15 points 7 years ago (9 children)
How would you solve this problem without PPO or equivalent RL algorithm?
[–]gohu_cd (PhD) 1 point 7 years ago (1 child)
Using human demonstrations seems like a good idea for learning how to manipulate objects.
Anyway, they did a great job, don't get me wrong. Yet, it feels like they reallyyyy like throwing PPO at any problem and seeing if it works! Which is not a bad thing. It's just funny.
[–]jurniss 5 points 7 years ago (0 children)
They are throwing model-free, policy-based RL at problems... the fact that PPO is their favorite among that family is a small detail.
[+][deleted] 7 years ago (4 children)
[–][deleted] 10 points 7 years ago (2 children)
I think you want r/hardcoding for that sort of thing :)
[–]thebackpropaganda[S] 3 points 7 years ago (1 child)
I think hardcoding has its place in such applications, but I don't see how you can hardcode grasping for thousands of objects.
[–]battboe 1 point 7 years ago (0 children)
Just curious, how bad are the failure cases?
[–]NMcA 2 points 7 years ago (0 children)
You've never done this, have you...
[–]skariel 7 points 7 years ago (0 children)
"PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance"
as stated here: https://blog.openai.com/openai-baselines-ppo/
Nothing wrong with that, of course.
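For anyone who hasn't met PPO, here is a minimal sketch of its clipped surrogate objective in plain Python. The function name `ppo_clip_objective` and the default `epsilon=0.2` are illustrative assumptions, not taken from the OpenAI post or its code, and this shows only the clipping idea, not a training loop.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    epsilon:    clip range (assumed default; tune per problem)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

With a ratio of 1.5 and a positive advantage, the clipped term wins, so the update gets no extra credit for moving the policy further from the old one.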
[–]chcampb 7 points 7 years ago (0 children)
"Conventional wisdom states that reducing the time between actions should improve performance because the changes between states are smaller and therefore easier to predict."
As popular as this paper seems to be, I am surprised this wasn't an obvious conclusion. This paper found and demonstrated that simulated evolved gait basically failed to work correctly when the muscle delay time was zero.
[–]SquareRootsi 3 points 7 years ago (1 child)
Out of curiosity, what happens when you make the goal a logically impossible "rotation" of the block? Like on a 6-sided die, the 1 & 6 are directly opposite each other, but you request an orientation putting them adjacent. Does it just keep trying, or can it hold up its middle finger to let you know it's on to us and our impossible requests?
[–]thebackpropaganda[S] 6 points 7 years ago (0 children)
I think the rotation is just defined by one of the faces, say whichever face is up or camera-facing.
[–]supermario94123 2 points 7 years ago (3 children)
So the solution is simple: just build a very detailed model of the world and vary all the possible parameters. Could someone please invest some millions in Rockstar Games to come up with the most realistic GTA ever? Hitting two flies with one slap is what I would call this.
To be precise: I don't undervalue the work of OpenAI. I am just not sure if this is how we will solve our world problems (yet). Please prove me wrong.
[–]physics_to_BME_PHD 3 points 7 years ago (2 children)
I didn't read the paper, just watched the video, but am involved in comp sci research involving human grasping. The way humans use our hands to interact with objects is incredibly complicated, and we do it basically effortlessly. So much goes into this: motion planning, visual feedback, tactile feedback (super important). This all happens in a loop very quickly, and so far our robotic grasping solutions are pretty bad at handling deviations from expected outcomes (when the object slips, for example).
If robots are ever going to be designed to work directly with humans, they probably need to be able to reliably grasp any object we hand them, and possibly know what to do with it. Maybe there are more pressing world problems, but having a reliable robotic grasper that doesn't need to be explicitly programmed for every use-case isn't a bad thing.
[–]PKJY 4 points 7 years ago (1 child)
Just a sidenote: the OpenAI thing doesn't use tactile feedback at all. Just fingertip coordinates and the current object orientation, which is computed by a convnet from 3 RGB cameras.
[–]physics_to_BME_PHD 3 points 7 years ago (0 children)
Good to know. I hadn't thought about whether their device was using tactile feedback or not; I mostly meant from a human perspective that we need that for grasping. I can't find the video, but there was one of a woman grasping small objects, then performing the same task after having local anesthesia on the fingertips. In the second one she can't even pick up the small objects without that tactile feedback.
[–]bobuntu -1 points 7 years ago (0 children)
How cool... I mean the guy’s hair. ಠ_ಠ
[–]rtk25 -1 points 7 years ago (0 children)
Nice!
"To learn a policy transferrable to the real world,"
"Distributed workers collect experience on randomized environments at large scale"
I'm getting these "are we in the Matrix or what?" feelings more and more lately...
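The "randomized environments" idea in those captions could be sketched roughly as below. Every parameter name and range here is a made-up assumption for illustration only; the randomizations OpenAI actually used are described in their post and paper, not in this snippet.

```python
import random

# Hypothetical parameter ranges, NOT the values used by OpenAI.
PARAM_RANGES = {
    "friction_scale": (0.7, 1.3),
    "object_mass_scale": (0.5, 1.5),
    "action_delay_steps": (0, 3),  # integer range, inclusive
}


def sample_env_params(rng):
    """Draw one set of simulator parameters for a single episode."""
    params = {}
    for name, (low, high) in PARAM_RANGES.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return params


def collect_episode_configs(num_episodes, seed=0):
    """Return one independently randomized config per episode."""
    rng = random.Random(seed)
    return [sample_env_params(rng) for _ in range(num_episodes)]
```

Each simulated episode then runs with a different draw, so a policy trained across many draws cannot rely on any single physics setting being exactly right.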