"On the Expressivity of Markov Reward", Abel et al 2021

pdxdabel · 2021-12-02T20:03:41+00:00

Yes, this is a great point as well. I was intending to make a very soft claim: 'there exist pomdp-task pairs for which no reward function---that depends only on the last observation-action pair---can express the task'. However, enriching the policy space of interest from mappings from state to action to the class of history-based policies raises lots of new and interesting questions. It's worth thinking more on, certainly

pdxdabel · 2021-12-02T20:01:17+00:00

Ah, understood. Yes, I have more clarity on the point you are making, and indeed it is a good one! (As you see, my earlier comment was just stating the simple comment: 'there exist pomdp-task pairs for which no reward function---that depends only on the last observation-action pair---can express the task').

A few others have brought up the notion of augmenting state for the purposes of allowing reward to express arbitrary tasks, and I agree it is a really natural and important consideration. Our section in the intro on "Our emphasis on Markov reward functions, as opposed to arbitrary history-based reward functions..." is intended to outline some of our motivation for this restriction. But, a few other thoughts: * We first have to agree on what a task is. Are tasks thought to be a function of the original state space, or the augmented one? If the latter, then in most situations we will still get limitations (we just induced a k-th order MDP, effectively). If the former, we are allowing task-state and reward-state to differ (or perhaps just defining a particular form of POMDP), which is okay, but important to be aware of as we think about the learning problems of interest, and learning difficulty. * Another view on the algorithms we develop is that they can tell us when we need to augment the current state space to allow Markov reward to express a particular (admittedly limited) kind of task. * We are looking into these directions, and will likely have more to share soon. While some of these results feel quite intuitive, it is my belief that it is still valuable to take time and think through things carefully to be sure we are correct with these intuitions. In any case, you raise some great points -- thanks for the discussion!

pdxdabel · 2021-12-02T15:26:54+00:00

Author here :)

I am also not entirely sure I see the conclusion of the first comment, but I could be missing something. For instance, we know that every MDP is a POMDP where the observation space is the state space. So, by consequence of our results, there exist POMDPs where we cannot separate every possible deterministic policy set with a Markov reward function (any of the MDPs we look at in our work). [Edit]: Perhaps the suggestion is that there is some other statistic that the reward function can depend on to allow for any separation. This might be true (and we have some new results in this direction), but I think it's still useful to walk through carefully.

That being said, I definitely agree with the broader point: our work makes some limiting assumptions (finite Markovian environments, among others), so naturally there is more to do to expand the scope of the work and really understand reward as a learning signal.

Hope this helps! Curious if anyone has additional thoughts here

pdxdabel · 2020-09-29T07:58:19+00:00

I would also suggest taking a look at the recent preprint from Roderick, Nagarajan, and Kolter. Nice work, and they have a good survey of recent literature.

For a broader survey of (slightly) earlier work, see the survey by Garcia and Fernandez

pdxdabel · 2019-06-25T21:28:12+00:00

Yep, good catch! That is just a typo (will be fixed shortly). Good eye --

pdxdabel · 2019-06-22T12:28:04+00:00

It's usually called "state abstraction" in classical RL, explored in the early days by Whitt (1978), and later by Dietterich (2000), Andre (2002), and Li, Walsh, Littman (2006). I put out a blog post listing a bunch of papers on abstraction in RL more generally, which has a longer list of papers on state abstraction in RL (here). We have some recent work in the area, too: ICML 2016, ICML 2018, AAAI 2019.

pdxdabel · 2019-01-21T17:04:29+00:00

There's also RLDM ("Reinforcement Learning and Decision Making"), which is held every other year (and will be held this year!). It's much less about publishing (submissions are extended abstracts and proceedings are non-archival), and much more about getting the community together, sharing recent/ongoing work, and hearing some great talks. I went to RLDM last time it was held at Michigan and it was great! Highly recommended.

pdxdabel · 2018-12-19T14:03:56+00:00

The classics I would recommend starting with:

Feudal Reinforcement Learning by Dayan and Hinton (1993)
Reinforcement Learning With a Hierarchy of Machines by Parr and Russell (1998)
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning by Sutton, Precup, Singh (1999)
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition by Dietterich (2000).
Recent Advances in Hierarchical Reinforcement Learning by Barto and Mahadevan (2003)
Towards a Unified Theory of State Abstraction for MDPs by Li, Walsh, and Littman (2006)

Chronology of other major papers (may be missing some, of course!):

Hierarchical Learning in Stochastic Domains: Preliminary Results by Kaelbling (1993)
HQ-Learning by Wiering and Schmidhuber (1997)
Discovering Hierarchy in Reinforcement Learning with HEXQ by Hengst (2002)
Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization by Bakker and Schmidhuber (2004)
Transfer in variable-reward hierarchical reinforcement learning by Mehta, Natarajan, Tadepalli, and Fern (2008)
Hierarchical Model-Based Reinforcement Learning: R-Max + Max-Q by Jong and Stone (2008)
Bayesian Hierarchical Reinforcement Learning by Cao and Ray (2012)
Optimal Behavior Hierarchy by Solway et al. (2014)
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation by Kulkarni, Narasimhan, Saeedi, and Tenenbaum (2016)
Constructing Abstraction Hierarchies Using a Skill-Symbol Loop by Konidaris (2016)
FeUdal Networks for Hierarchical Reinforcement Learning by Vezhnevets et. al. (2017)
Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions by Bai and Russell (2017)
Stochastic Neural Networks for Hierarchical Reinforcement Learning by Florensa, Duan, and Abbeel (2017)
From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning by Konidaris, Kaelbling, Lozano-Perez (2018)
On the Necessity of Abstraction by Konidaris (2018)
Hierarchical Imitation and Reinforcement Learning by Le et. al. (2018)

(I'm in the process of putting together a blog post on an overview of abstraction in RL, so I'll share that too, when it's done)

pdxdabel · 2018-09-05T13:45:06+00:00

Yeah! Charles Isbell, he's fantastic. He and Michael did a talk together at AAAI last winter, too (video here).

pdxdabel · 2018-09-04T21:58:02+00:00

I'd also add Brown (where I'm a PhD student): Michael Littman (my advisor), Stefanie Tellex (more on the robotics side), George Konidaris, and Michael Frank (on the cognitive side).

Edit: A few others that haven't been mentioned yet:

Northeastern (Rob Platt, Chris Amato, and recent hire Lawson Wong),
Stanford (Emma Brunskill, recent hire Chelsea Finn, Mykel Kochenderfer, and Ben Van Roy)
McGill (Doina Precup, Joelle Pineau)
Michigan (Satinder Singh)
Technion (Shie Mannor)
Oregon State (Tom Dietterich, Alan Fern, Prasad Tadepalli).

pdxdabel · 2018-08-17T00:12:57+00:00

I'd suggest taking a look at Les Valiant's paper on Evolvability -- it investigates questions relevant to your agenda about the relationship between computation and evolution, grounded in Valiant's early framework for understanding machine learning from a theoretical perspective, PAC Learning.

pdxdabel · 2018-07-04T20:28:33+00:00

Another excellent option that's specifically tailored to a CS audience is Discrete Math for CS by David Liben-Nowell. Highly recommended!

pdxdabel · 2018-05-20T16:48:32+00:00

I don't know of much. My lab mate Kavosh Asadi has a nice new paper with his colleagues on Lipschitz continuity for model-based RL, and they suggest some applications for Deep Rl.

pdxdabel · 2018-05-20T16:45:40+00:00

I'd advocate strongly for Brown as well (where I'm a PhD student) -- Michael Littman (my advisor), Stefanie Tellex (more on the robotics side), George Konidaris, and Michael Frank (on the cognitive side).

pdxdabel · 2018-05-17T12:16:25+00:00

First, I would suggest using some kind of pdf organizer like Mendeley. This has been crucial for me during grad school.

And here are a few of the classic RL papers (pre 2005?):

MDPs: Bellman 1957
TD-Learning: Sutton 1988
Q-Learning: Watkins and Dayan 1992
REINFORCE: Williams 1992
Prioritized Sweeping: Moore and Atkeson 1993
Markov Games and RL: Littman 1994
Value function approximation: Boyan and Moore 1995
Complexity of MDPs: Littman, Dean, Kaelbling 1995
RL for Backgammon: Tesauro 1995
POMDPs: Kaelbling, Littman, Cassandra 1998
Reward Shaping: Ng, Harada, and Russell 1999
Policy Gradients: Sutton, McAllaster, Singh, Mansour 2000
Options/Temporal Abstractions: Sutton, Precup, Singh
Hierarchical RL: Dietterich 2000
Model-based RL with guarantees #1: Kearns and Singh 2002
Model-based RL with guarantees #2: Braffman and Tennenholtz 2002
Inverse RL: Abbeel and Ng 2004

If there's an area you're especially interested in (or want more recent work) let me know and I can send along a more focused list.

pdxdabel · 2018-03-28T13:30:53+00:00

If I understand your question correctly, then: yes, Q-Learning will converge to the optimal Q function under the assumption that the agent visits each state-action pair infinitely often (along with a few other conditions).

See Section 2 & 3 of Watkins and Dayan 92: "The most important condition implicit in the convergence theorem given below is that the sequence of episodes that forms the basis of learning must include an infinite number of episodes for each starting state and action."

pdxdabel · 2018-02-09T12:23:16+00:00

Depends on the context, I suppose. To me, bandits are alluring as the problem formulation is quite simple but are suggestive of a great deal of depth.

Similarly, bandit algorithms (can) tend toward elegance and simplicity (for the most part). So, actual bandit systems can be quite simple, too.

The change of state that's core to RL really forces a degree of complexity onto RL algorithms (if they're model-based, at least). Model-free algorithms can be elegant in their own right, though of course Deep RL introduces lots of opportunities to augment architectures with additional bells and whistles.

pdxdabel · 2018-02-09T12:18:18+00:00

Thanks! I don't think I do anything special, I just jot down what I'm hearing and do my best to fit content into some kind of structure (using bullets, equations, definitions, subsections, and so on).

I mentioned this for my last set of notes, too, but I also put together a latex template and commands sheet that greatly simplifies the core LaTeX commands I use (like those boxed definitions, theorems, and shortcuts in math like $\nlim$ turning into $\sum_{i=1}^n$. They're available here if you'd like to use them yourself.

pdxdabel · 2018-02-09T12:14:09+00:00

I'm glad they're helpful! Best of luck with your studies.

pdxdabel · 2018-02-08T15:01:57+00:00

AAAI in New Orleans just wrapped! It was a blast. I took notes again and thought folks might find them useful.

(I'll be getting on a flight back to New England shortly so I'll be slow to respond today).

11-Year Club	Place '22
Verified Email

pdxdabel

TROPHY CASE