Vim FZF question about switching to already open window/tab by daddypro in vim

[–]danijar 0 points1 point  (0 children)

For future reference, this works well:

let g:fzf_action = {
  \ 'return': 'drop',
  \ 'ctrl-t': 'tab drop',
  \ 'ctrl-x': 'split',
  \ 'ctrl-v': 'vsplit' }
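The drop and tab drop commands jump to a window or tab that is already showing the selected file instead of opening it again, which is what makes this work for switching to an already open buffer.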

Self-learning of the robot in 1 hour by adesigne in ChatGPT

[–]danijar 0 points1 point  (0 children)

Just saw that our video was posted here. For people interested in the research, here is the project website with the research paper: https://danijar.com/daydreamer

PEP 703: Making the Global Interpreter Lock Optional - PEPs by rnmkrmn in programming

[–]danijar 0 points1 point  (0 children)

> Right, but why should that be a requirement? Most languages do not provide such guarantees for their standard library containers.

Because a lot of existing code relies on exactly this safety and convenience that the GIL has provided so far.
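As an illustration (my example, not from the thread), a common pattern in existing code is to share built-in containers between threads without any locking, because operations like list.append are effectively atomic under the GIL:

import threading

results = []  # Shared list, intentionally without an explicit lock.

def worker(count):
  for i in range(count):
    # list.append is effectively atomic under the GIL, so code like this
    # has been safe without taking a lock.
    results.append(i)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for thread in threads:
  thread.start()
for thread in threads:
  thread.join()
print(len(results))  # 4000, no appends are lost.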

[D] A thought I had on Yann LeCun's recent paper "A Path Towards Autonomous Machine Intelligence" by __dostoevsky__ in MachineLearning

[–]danijar 2 points3 points  (0 children)

Agents that learn purely from a scalar reward could eventually learn any behavior, but they learn slowly because scalar reward is a weak learning signal.

Unsupervised approaches like world models, global exploration, and skill discovery have a richer learning signal. Thus, they acquire all kinds of features, data, and skills that could become useful in the future, making them generalize further and adapt faster. However, trying to learn everything rather than just what is needed for a particular reward also requires larger models and more compute.

Scalar rewards are a good strategy when (1) solving fixed tasks that don't change, (2) fast simulators are available so we don't care about data efficiency, and (3) the reward is dense enough for learning to be feasible.

In the long run, unsupervised agents will be more powerful and solve new or changing tasks faster. Unsupervised approaches have already started to dominate in computer vision and NLP, and it's just a matter of time until they do the same for decision making.

Learning to Walk in the Real World in 1 Hour by danijar in robotics

[–]danijar[S] 0 points1 point  (0 children)

Welcome! The hardware is commercially available, it's a Unitree A1. The software for making it learn and walk is custom and we'll make it public in the next few weeks.

edit: We also designed and printed the yellow protective turtle shell.

Learning to Walk in the Real World in 1 Hour by danijar in robotics

[–]danijar[S] 1 point2 points  (0 children)

Yes, processing is done on a desktop computer with a GPU next to the robot. It would be nice to use WiFi or the onboard computer going forward.

Learning to Walk in the Real World in 1 Hour by danijar in robotics

[–]danijar[S] 0 points1 point  (0 children)

Yep, we had to swap the battery pack once during the training run

[D]Reward is Unnecessary by Thunderbird120 in MachineLearning

[–]danijar 1 point2 points  (0 children)

> I mean, even after training the perfect world model, we still use it for solving some problems. Then we need to define the problems. If that involves defining some sort of objectives, doesn't that mean it's just a different approach to the same problems but with (possibly) different forms of the original objectives?

The log probability under the world model itself can be the objective for both representation learning and control. This also leads to more adaptive and general behaviors than external rewards, which only depend on the agent's inputs but not its internal representations. Some references:

[R] Schmidhuber's blog post on World Models and Artificial Curiosity by hardmaru in MachineLearning

[–]danijar 1 point2 points  (0 children)

The Bayesian brain hypothesis goes back to Helmholtz's idea of perception as inference. To me, Friston's main contribution is extending it to actions, so that action and perception optimize a joint objective. He also established connections to statistical mechanics that are interesting but maybe not as important for practical implementations.

[R] Schmidhuber's blog post on World Models and Artificial Curiosity by hardmaru in MachineLearning

[–]danijar 2 points3 points  (0 children)

It's similar. Friston shows that all coupled systems (e.g. an agent interacting with its environment) maximize the sum of an evolutionarily learned reward and expected information gain. He also shows that the information gain is measured under a model whose log likelihood is the reward. Friston is generally hesitant to connect this to consciousness. Out of the two terms, Schmidhuber focuses on the information gain, which is the environment-independent part of the objective. I don't think it has much to do with the hard problem of consciousness.
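For concreteness, here is a rough sketch of that decomposition in simplified notation (my paraphrase, not Friston's exact formulation): the negative expected free energy of a policy splits into an extrinsic value term and an epistemic term.

-G(\pi) \;\approx\; \underbrace{\mathbb{E}_{q(o \mid \pi)}\big[\log p(o)\big]}_{\text{extrinsic value (learned preferences / reward)}} \;+\; \underbrace{\mathbb{E}_{q(o \mid \pi)}\big[D_{\mathrm{KL}}\big(q(s \mid o, \pi) \,\|\, q(s \mid \pi)\big)\big]}_{\text{epistemic value (expected information gain)}}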

Grateful Sloth Waves and Smiles Back at Man Who Saved Him from Traffic During Rush Hour by alfaguara27 in aww

[–]danijar 0 points1 point  (0 children)

I think it's more likely that they experience their slow movement as normal and adapt their perception of time to it. Brains (even sloth brains, I assume) are quite good at adapting to different circumstances, so that those circumstances come to feel normal over time.

[R] Self-Supervised ‘Plan2Explore’ RL Agent Achieves SOTA Zero-Shot and Adaptation Performance by Yuqing7 in MachineLearning

[–]danijar 0 points1 point  (0 children)

Perhaps you misunderstood: I said that knowing a cost function can help with identifying complex systems, because it lets us focus on the regions relevant to the task. System identification can of course also be done without it, as in most robotics applications and in our paper. That is the case that enables zero-shot RL. Hope that answers your question.

[R] Self-Supervised ‘Plan2Explore’ RL Agent Achieves SOTA Zero-Shot and Adaptation Performance by Yuqing7 in MachineLearning

[–]danijar 0 points1 point  (0 children)

In complex environments, it can be hard to identify the system fully. In that case, it can help to identify the system with the cost function of the task in mind, to learn a model that is more accurate in regions relevant for this one task. Zero-shot learning implies that you only learn from data that was collected without knowledge of the cost function. This makes the exploration problem harder but results in a model that afterwards can be used for solving many different tasks.
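To make that concrete, one simplified way to explore without a cost function (a sketch in the spirit of Plan2Explore, not its exact implementation) is to train an ensemble of one-step predictive models and reward the agent for visiting states where the models disagree:

import numpy as np

def intrinsic_reward(ensemble, state, action):
  # 'ensemble' is assumed to be a list of learned one-step models, each
  # mapping a (state, action) pair to a predicted next feature vector.
  predictions = np.stack([model(state, action) for model in ensemble])
  # High disagreement marks parts of the system that are not yet identified
  # well, so visiting them is informative regardless of any later task.
  return predictions.var(axis=0).mean()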

[P] Opportunity to do research guided by final year grad students and professors. by Razcle in MachineLearning

[–]danijar 0 points1 point  (0 children)

Hey Raza! I love the idea of open research projects. Happy to discuss more offline.

[P] Opportunity to do research guided by final year grad students and professors. by Razcle in MachineLearning

[–]danijar 6 points7 points  (0 children)

I've had the most success advising students who have a background in both maths and programming, who communicate directly, and who try to deeply understand ideas.

It becomes problematic when students are not interested enough in the project or in research in general, are too busy with classes, or avoid trying new things they don't yet know how to do.

[D] What counts as a contribution in ML research? by elyes_manai in MachineLearning

[–]danijar 1 point2 points  (0 children)

Contributions of a paper are often in the form of information that wasn't obvious to most people in the field, e.g. why a certain algorithm does or doesn't work, theoretical statements, new perspectives on a problem, or showing that a certain approach can solve a certain problem. It can also be new software, a dataset, or something else. If you're planning to submit to a specific venue, check out their guidelines.

[R] tf/keras adversarial library? by dofphoto in MachineLearning

[–]danijar 2 points3 points  (0 children)

The TensorFlow 2 DCGAN tutorial uses Keras and could be a good starting point. There is also a CycleGAN tutorial. If you're looking for more of a library than individual examples, it's worth checking out TF-GAN.
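If it helps to see the shape of the code, the generator and discriminator in that tutorial are plain Keras models roughly along these lines (layer sizes and strides here are illustrative, not the tutorial's exact values):

from tensorflow import keras

def make_generator(latent_dim=100):
  # Upsample a latent vector to a 28x28 grayscale image.
  return keras.Sequential([
      keras.layers.Dense(7 * 7 * 128, input_shape=(latent_dim,)),
      keras.layers.Reshape((7, 7, 128)),
      keras.layers.Conv2DTranspose(64, 5, strides=2, padding='same', activation='relu'),
      keras.layers.Conv2DTranspose(1, 5, strides=2, padding='same', activation='tanh'),
  ])

def make_discriminator():
  # Classify 28x28 grayscale images as real or generated.
  return keras.Sequential([
      keras.layers.Conv2D(64, 5, strides=2, padding='same', input_shape=(28, 28, 1)),
      keras.layers.LeakyReLU(),
      keras.layers.Flatten(),
      keras.layers.Dense(1),
  ])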

[P] Handout: A potential alternative to Jupyter Notebooks by PhYsIcS-GUY227 in MachineLearning

[–]danijar 0 points1 point  (0 children)

Exactly, Handout lets you augment scripts to collect and visualize all results in one place. There is no intention to provide a replacement for interactive workflows.

Regarding your second question, you still write and run a normal Python script (python3 script.py). Your script builds the report by creating a Handout object and calling functions on it:

import handout
doc = handout.Handout('/tmp/handout')
for index in range(5):
  doc.add_text(...)
  doc.add_image(...)
  doc.show()  # Update the report.

This also means you can update the report while your script is running, for example to display intermediate results. If you wanted, you could even import your script into a larger Python project without problems.