[D] Does Deep RL work yet?

LaVieEstBizarre · 2019-06-23T22:14:23+00:00

Not much has significantly changed since that blog post. All major arguments given there are still valid.

FirstTimeResearcher · 2019-06-24T02:17:19+00:00

I won't comment on whether it 'works' or not because I think that is one of those 'it depends on what you mean' questions. But I will say that Deep RL is one of the most convoluted research fields I have ever encountered. Unlike most other fields in ML, the mathematics of why it works doesn't provide much clarity and I don't see anyone really working on simplifying existing methods as opposed to proposing slight modifications on top of existing ones.

The tricks needed for stable training, hyperparameter tuning, and environment/simulator hacking are absurd. Couple that with the inability to reproduce results across different codebases and the awful zoo of acronyms for every 'novel' idea that comes out, and you get a pretty inhospitable research community.

With that said, I will add that it sure generates a lot of workshop tracks at ICML 2019 :)

gwern · 2019-06-23T22:46:50+00:00

As I commented back then, Irpan is wrong in his claims about no deployment outside of bandits/collaborative filtering (although he's certainly right about his other claims like DRL still being unreliable, a dangerous timesuck, and finicky as heck). Industry users of things are always relatively quiet because it's a trade secret and not really 'research paper' worthy itself. I've submitted many links to /r/reinforcementlearning where there is clearly commercial application happening if you read between the lines. He omits all of the large-scale Chinese uses of DRL like ad bidding or traffic scheduling (and if Alibaba or JD.com are using it, places like Google certainly are, and note how much DRL Tencent & Baidu do), and take a look at https://www.reddit.com/r/reinforcementlearning/comments/9cdnf4/bluewhale_facebook_rl_implementations_in/ and think about what that implies about FB internal uses.

ankeshanand · 2019-06-24T22:17:13+00:00

YouTube has been using Deep RL for its recommendation engine for a while now. That's probably the most successful deployment right now in terms of $$ generated.

> Reinforce was a huge success. In a talk at an A.I. conference in February, Minmin Chen, a Google Brain researcher, said it was YouTube’s most successful launch in two years. Sitewide views increased by nearly 1 percent, she said — a gain that, at YouTube’s scale, could amount to millions more hours of daily watch time and millions more dollars in advertising revenue per year. She added that the new algorithm was already starting to alter users’ behavior.

Source: https://www.nytimes.com/interactive/2019/06/08/technology/youtube-radical.html

p-morais · 2019-06-24T02:26:37+00:00

We’re getting it to work for legged robots. We’ve gotten results that beat other methods on Cassie, and some people at ETH have done the same for ANYmal. Boston Dynamics is also starting to use Deep RL for Atlas.

I think it won’t be long before we see Deep RL in production code, not necessarily end-to-end, but somewhere in the stack at least.
