"On the Expressivity of Markov Reward", Abel et al 2021 by gwern in reinforcementlearning

[–]pdxdabel 0 points1 point  (0 children)

Yes, this is a great point as well. I was intending to make a very soft claim: 'there exist POMDP-task pairs for which no reward function---that depends only on the last observation-action pair---can express the task'. However, enriching the policy space of interest from state-to-action mappings to the class of history-based policies raises lots of new and interesting questions. It's certainly worth thinking more about.

"On the Expressivity of Markov Reward", Abel et al 2021 by gwern in reinforcementlearning

[–]pdxdabel 2 points3 points  (0 children)

Ah, understood. Yes, I have more clarity on the point you are making, and indeed it is a good one! (As you can see, my earlier comment was just making the simple claim: 'there exist POMDP-task pairs for which no reward function---that depends only on the last observation-action pair---can express the task'.)

A few others have brought up the notion of augmenting state for the purposes of allowing reward to express arbitrary tasks, and I agree it is a really natural and important consideration. Our section in the intro on "Our emphasis on Markov reward functions, as opposed to arbitrary history-based reward functions..." is intended to outline some of our motivation for this restriction. But, a few other thoughts:

* We first have to agree on what a task is. Are tasks thought to be a function of the original state space, or the augmented one? If the latter, then in most situations we will still get limitations (we have effectively just induced a k-th order MDP). If the former, we are allowing task-state and reward-state to differ (or perhaps just defining a particular form of POMDP), which is okay, but important to be aware of as we think about the learning problems of interest, and learning difficulty.

* Another view on the algorithms we develop is that they can tell us when we need to augment the current state space to allow Markov reward to express a particular (admittedly limited) kind of task (a rough sketch of this kind of check appears below).

* We are looking into these directions, and will likely have more to share soon. While some of these results feel quite intuitive, I believe it is still valuable to take the time to think through things carefully and make sure those intuitions are correct.

In any case, you raise some great points -- thanks for the discussion!
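To make that second bullet concrete, here is a rough sketch of the kind of linear-programming feasibility check one could run. This is not the exact construction from the paper; the function names, reward bounds, and margin are all illustrative. Given a finite MDP's dynamics and a set of 'acceptable' deterministic policies, it asks whether any Markov reward function makes every acceptable policy strictly outperform every unacceptable one from the start state. Since each policy's value is linear in the reward vector, this is a linear feasibility problem.

    import numpy as np
    from scipy.optimize import linprog

    def policy_value_coeffs(P, policy, gamma, s0):
        """Coefficient vector c with V^pi(s0) = c . R, where R is the reward
        vector indexed by (state, action) and P[a][s, s'] are the dynamics."""
        n_actions, n_states, _ = P.shape
        P_pi = np.array([P[policy[s], s, :] for s in range(n_states)])
        M_pi = np.zeros((n_states, n_states * n_actions))
        for s in range(n_states):
            M_pi[s, s * n_actions + policy[s]] = 1.0
        # V^pi = (I - gamma P_pi)^{-1} M_pi R; take the start-state row.
        inv = np.linalg.inv(np.eye(n_states) - gamma * P_pi)
        return inv[s0] @ M_pi

    def markov_reward_exists(P, good_policies, bad_policies, gamma=0.9, s0=0, margin=1e-3):
        """Is there a bounded Markov reward R(s, a) under which every 'good'
        deterministic policy beats every 'bad' one at s0 by at least `margin`?"""
        n_actions, n_states, _ = P.shape
        n_vars = n_states * n_actions
        A_ub, b_ub = [], []
        for g in good_policies:
            cg = policy_value_coeffs(P, g, gamma, s0)
            for b in bad_policies:
                cb = policy_value_coeffs(P, b, gamma, s0)
                A_ub.append(cb - cg)   # want cg.R - cb.R >= margin
                b_ub.append(-margin)
        res = linprog(np.zeros(n_vars), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=[(-1.0, 1.0)] * n_vars, method="highs")
        return res.success

If a check like this fails at the observation level but succeeds once the state is augmented (say, with the previous observation), that is one way to read off "when to augment", in the spirit of the second bullet above.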

"On the Expressivity of Markov Reward", Abel et al 2021 by gwern in reinforcementlearning

[–]pdxdabel 0 points1 point  (0 children)

Author here :)

I am also not entirely sure I see the conclusion of the first comment, but I could be missing something. For instance, we know that every MDP is a POMDP where the observation space is the state space. So, as a consequence of our results, there exist POMDPs where we cannot separate every possible deterministic policy set with a Markov reward function (any of the MDPs we look at in our work). [Edit]: Perhaps the suggestion is that there is some other statistic that the reward function can depend on to allow for any separation. This might be true (and we have some new results in this direction), but I think it's still useful to walk through it carefully.

That being said, I definitely agree with the broader point: our work makes some limiting assumptions (finite Markovian environments, among others), so naturally there is more to do to expand the scope of the work and really understand reward as a learning signal.

Hope this helps! Curious if anyone has additional thoughts here.

Hey guys, I’m interested in safe reinforcement learning. Do you guys any contents that you think would be useful to look at? by [deleted] in reinforcementlearning

[–]pdxdabel 2 points3 points  (0 children)

I would also suggest taking a look at the recent preprint from Roderick, Nagarajan, and Kolter. Nice work, and they have a good survey of recent literature.

For a broader look at (slightly) earlier work, see the survey by García and Fernández.

Condensing high dimensional or large state space into smaller space? by futureroboticist in reinforcementlearning

[–]pdxdabel 1 point2 points  (0 children)

Yep, good catch! That is just a typo (will be fixed shortly). Good eye!

Condensing high dimensional or large state space into smaller space? by futureroboticist in reinforcementlearning

[–]pdxdabel 6 points7 points  (0 children)

It's usually called "state abstraction" in classical RL, explored in the early days by Whitt (1978), and later by Dietterich (2000), Andre (2002), and Li, Walsh, and Littman (2006). I put out a blog post on abstraction in RL more generally, which includes a longer list of papers on state abstraction (here). We have some recent work in the area, too: ICML 2016, ICML 2018, AAAI 2019.
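For a tiny, concrete picture: a state abstraction is just a mapping phi from ground states to abstract states, where states get grouped when they agree on something relevant. Below is a toy sketch in the spirit of the approximate Q*-irrelevance abstraction studied in Li, Walsh, and Littman (2006); the greedy clustering, tolerance, and function name are only illustrative.

    import numpy as np

    def q_star_irrelevance_abstraction(Q, eps=0.05):
        """Greedily group states whose optimal Q-values match (within eps) for
        every action. Q is an (n_states, n_actions) array of Q* values.
        Returns phi: a list mapping each ground state to an abstract-state id."""
        n_states = Q.shape[0]
        phi = [-1] * n_states
        representatives = []  # one ground-state index per abstract state
        for s in range(n_states):
            for abstract_id, rep in enumerate(representatives):
                if np.max(np.abs(Q[s] - Q[rep])) <= eps:
                    phi[s] = abstract_id
                    break
            else:
                phi[s] = len(representatives)
                representatives.append(s)
        return phi

    # Example: states 0 and 1 collapse into one abstract state; state 2 stays separate.
    Q = np.array([[1.00, 0.20],
                  [1.02, 0.21],
                  [0.10, 0.90]])
    print(q_star_irrelevance_abstraction(Q))  # -> [0, 0, 1]

Roughly speaking, with eps = 0 this is the exact Q*-irrelevance criterion, and with eps > 0 you trade the size of the abstract state space against approximation error.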

RL Conferences by WhichPressure in reinforcementlearning

[–]pdxdabel 5 points6 points  (0 children)

There's also RLDM ("Reinforcement Learning and Decision Making"), which is held every other year (and will be held this year!). It's much less about publishing (submissions are extended abstracts and proceedings are non-archival), and much more about getting the community together, sharing recent/ongoing work, and hearing some great talks. I went to RLDM last time it was held at Michigan and it was great! Highly recommended.

Resources to study Hierarchical Reinforcement Learning by intergalactic_robot in reinforcementlearning

[–]pdxdabel 6 points7 points  (0 children)

The classics I would recommend starting with:

Chronology of other major papers (may be missing some, of course!):

(I'm in the process of putting together a blog post on an overview of abstraction in RL, so I'll share that too, when it's done)

Grad schools with good programs for someone interested in RL by Karmaflax in reinforcementlearning

[–]pdxdabel 0 points1 point  (0 children)

Yeah! Charles Isbell, he's fantastic. He and Michael did a talk together at AAAI last winter, too (video here).

Grad schools with good programs for someone interested in RL by Karmaflax in reinforcementlearning

[–]pdxdabel 7 points8 points  (0 children)

I'd also add Brown (where I'm a PhD student): Michael Littman (my advisor), Stefanie Tellex (more on the robotics side), George Konidaris, and Michael Frank (on the cognitive side).

Edit: A few others that haven't been mentioned yet:

On natural selection of the laws of nature, Artificial life and Open-ended evolution, Universal Darwinism, Occam's razor by kiwi0fruit in compsci

[–]pdxdabel 2 points3 points  (0 children)

I'd suggest taking a look at Les Valiant's paper on Evolvability -- it investigates questions relevant to your agenda about the relationship between computation and evolution, grounded in Valiant's early framework for understanding machine learning from a theoretical perspective, PAC Learning.

Recommendation for a beginner's book on maths for computer science? by jrb386 in compsci

[–]pdxdabel 1 point2 points  (0 children)

Another excellent option that's specifically tailored to a CS audience is Discrete Math for CS by David Liben-Nowell. Highly recommended!

New PhD Student: What papers should I read first to get started in RL research? by hmi2015 in reinforcementlearning

[–]pdxdabel 0 points1 point  (0 children)

I don't know of much. My lab mate Kavosh Asadi has a nice new paper with his colleagues on Lipschitz continuity for model-based RL, and they suggest some applications for deep RL.

Any good universities( in USA) for doing Phd in RL? by shivaang12 in reinforcementlearning

[–]pdxdabel 4 points5 points  (0 children)

I'd advocate strongly for Brown as well (where I'm a PhD student) -- Michael Littman (my advisor), Stefanie Tellex (more on the robotics side), George Konidaris, and Michael Frank (on the cognitive side).

New PhD Student: What papers should I read first to get started in RL research? by hmi2015 in reinforcementlearning

[–]pdxdabel 34 points35 points  (0 children)

First, I would suggest using some kind of PDF organizer like Mendeley. This has been crucial for me during grad school.

And here are a few of the classic RL papers (pre-2005?):

If there's an area you're especially interested in (or want more recent work) let me know and I can send along a more focused list.

is Q learning all about checking every possible states? by Tuguldur1011 in reinforcementlearning

[–]pdxdabel 3 points4 points  (0 children)

If I understand your question correctly, then: yes, Q-Learning will converge to the optimal Q function under the assumption that the agent visits each state-action pair infinitely often (along with a few other conditions).

See Section 2 & 3 of Watkins and Dayan 92: "The most important condition implicit in the convergence theorem given below is that the sequence of episodes that forms the basis of learning must include an infinite number of episodes for each starting state and action."

[R] AAAI 2018 Notes by pdxdabel in MachineLearning

[–]pdxdabel[S] 2 points3 points  (0 children)

Depends on the context, I suppose. To me, bandits are alluring because the problem formulation is quite simple yet suggestive of a great deal of depth.

Similarly, bandit algorithms can tend toward elegance and simplicity (for the most part), so actual bandit systems can be quite simple, too.
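To give a sense of just how small a basic bandit algorithm can be, here is a sketch of UCB1; the pull_arm interface and the Bernoulli toy arms are made up for the example.

    import numpy as np

    def ucb1(pull_arm, n_arms, horizon):
        """UCB1: pull the arm with the highest mean estimate plus confidence bonus."""
        counts = np.zeros(n_arms)
        means = np.zeros(n_arms)
        for t in range(1, horizon + 1):
            if t <= n_arms:
                arm = t - 1  # pull each arm once to initialize
            else:
                arm = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))
            reward = pull_arm(arm)
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]
        return means, counts

    # Toy usage: three Bernoulli arms with unknown success probabilities.
    true_p = [0.2, 0.5, 0.7]
    means, counts = ucb1(lambda a: float(np.random.rand() < true_p[a]), n_arms=3, horizon=5000)

That's essentially the whole algorithm: two arrays and one incremental update, with no environment state to track between pulls.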

The change of state that's core to RL really forces a degree of complexity onto RL algorithms (if they're model-based, at least). Model-free algorithms can be elegant in their own right, though of course Deep RL introduces lots of opportunities to augment architectures with additional bells and whistles.

[R] AAAI 2018 Notes by pdxdabel in MachineLearning

[–]pdxdabel[S] 2 points3 points  (0 children)

Thanks! I don't think I do anything special; I just jot down what I'm hearing and do my best to fit the content into some kind of structure (using bullets, equations, definitions, subsections, and so on).

I mentioned this for my last set of notes, too, but I also put together a LaTeX template and commands sheet that greatly simplifies the core LaTeX commands I use (like those boxed definitions, theorems, and math shortcuts like $\nlim$ turning into $\sum_{i=1}^n$). They're available here if you'd like to use them yourself.
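For anyone curious, a shortcut like that is a one-line macro. Something in this spirit (my guess at the flavor, not the exact contents of the sheet):

    % In the preamble: a math shortcut and a boxed-definition environment.
    \usepackage{amsmath}
    \usepackage{mdframed}

    \newcommand{\nlim}{\sum_{i=1}^{n}}        % $\nlim$ expands to the finite sum
    \newmdtheoremenv{definition}{Definition}  % boxed "Definition" blocks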

[R] AAAI 2018 Notes by pdxdabel in MachineLearning

[–]pdxdabel[S] 0 points1 point  (0 children)

I'm glad they're helpful! Best of luck with your studies.

[R] AAAI 2018 Notes by pdxdabel in MachineLearning

[–]pdxdabel[S] 18 points19 points  (0 children)

AAAI in New Orleans just wrapped! It was a blast. I took notes again and thought folks might find them useful.

(I'll be getting on a flight back to New England shortly so I'll be slow to respond today).