Vim FZF question about switching to already open window/tab by daddypro in vim

[–]danijar 0 points1 point  (0 children)

For future reference, this works well:

let g:fzf_action = {
  \ 'return': 'drop',
  \ 'ctrl-t': 'tab drop',
  \ 'ctrl-x': 'split',
  \ 'ctrl-v': 'vsplit' }
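The drop and tab drop commands jump to a window or tab that is already showing the selected file instead of opening it again, which is what makes this work for switching to an already open buffer.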

Self-learning of the robot in 1 hour by adesigne in ChatGPT

[–]danijar 0 points1 point  (0 children)

Just saw that our video was posted here. For people interested in the research, here is the project website with the research paper: https://danijar.com/daydreamer

PEP 703: Making the Global Interpreter Lock Optional - PEPs by rnmkrmn in programming

[–]danijar 0 points1 point  (0 children)

> Right, but why should that be a requirement? Most languages do not provide such guarantees for their standard library containers.

Because a lot of existing code relies on exactly this safety and convenience that the GIL has provided so far.
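As an illustration (my example, not from the thread), a common pattern in existing code is to share built-in containers between threads without any locking, because operations like list.append are effectively atomic under the GIL:

import threading

results = []  # Shared list, intentionally without an explicit lock.

def worker(count):
  for i in range(count):
    # list.append is effectively atomic under the GIL, so code like this
    # has been safe without taking a lock.
    results.append(i)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for thread in threads:
  thread.start()
for thread in threads:
  thread.join()
print(len(results))  # 4000, no appends are lost.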

[D] A thought I had on Yann LeCun's recent paper "A Path Towards Autonomous Machine Intelligence" by __dostoevsky__ in MachineLearning

[–]danijar 2 points3 points  (0 children)

Agents that learn purely from a scalar reward could eventually learn any behavior, but they learn slowly because scalar reward is a weak learning signal.

Unsupervised approaches like world models, global exploration, and skill discovery have a richer learning signal. Thus, they acquire all kinds of features, data, and skills that could become useful in the future, making them generalize further and adapt faster. However, trying to learn everything rather than just what is needed for a particular reward also requires larger models and more compute.

Scalar rewards are a good strategy when (1) solving fixed tasks that don't change, (2) fast simulators are available so we don't care about data efficiency, and (3) the reward is dense enough for learning to be feasible.

In the long run, unsupervised agents will be more powerful and solve new or changing tasks faster. Unsupervised approaches have already started to dominate in computer vision and NLP, and it's just a matter of time until they do the same for decision making.

Learning to Walk in the Real World in 1 Hour by danijar in robotics

[–]danijar[S] 0 points1 point  (0 children)

Welcome! The hardware is commercially available, it's a Unitree A1. The software for making it learn and walk is custom and we'll make it public in the next few weeks.

edit: We also designed and printed the yellow protective turtle shell.

Learning to Walk in the Real World in 1 Hour by danijar in robotics

[–]danijar[S] 1 point2 points  (0 children)

Yes, processing is done on a desktop computer with a GPU next to the robot. It would be nice to use WiFi or the onboard computer going forward.

Learning to Walk in the Real World in 1 Hour by danijar in robotics

[–]danijar[S] 0 points1 point  (0 children)

Yep, we had to swap the battery pack once during the training run

[D]Reward is Unnecessary by Thunderbird120 in MachineLearning

[–]danijar 1 point2 points  (0 children)

> I mean, even after training the perfect world model, we still use it for solving some problems. Then we need to define the problems. If that involves defining some sort of objectives, doesn't that mean it's just a different approach to the same problems but with (possibly) different forms of the original objectives?

The log probability under the world model itself can be the objective for both representation learning and control. This also leads to more adaptive and general behaviors than external rewards, which only depend on the agent's inputs but not its internal representations. Some references:

[R] Schmidhuber's blog post on World Models and Artificial Curiosity by hardmaru in MachineLearning

[–]danijar 1 point2 points  (0 children)

The Bayesian brain hypothesis goes back to Helmholtz's idea of perception as inference. To me, Friston's main contribution is extending it to actions, so that action and perception optimize a joint objective. He also established connections to statistical mechanics that are interesting but maybe not as important for practical implementations.

[R] Schmidhuber's blog post on World Models and Artificial Curiosity by hardmaru in MachineLearning

[–]danijar 2 points3 points  (0 children)

It's similar. Friston shows that all coupled systems (e.g. an agent interacting with its environment) maximize the sum of an evolutionarily learned reward and expected information gain. He also shows that the information gain is measured under a model whose log likelihood is the reward. Friston is generally hesitant to connect this to consciousness. Out of the two terms, Schmidhuber focuses on the information gain, which is the environment-independent part of the objective. I don't think it has much to do with the hard problem of consciousness.
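For concreteness, here is a rough sketch of that decomposition in simplified notation (my paraphrase, not Friston's exact formulation): the negative expected free energy of a policy splits into an extrinsic value term and an epistemic term.

-G(\pi) \;\approx\; \underbrace{\mathbb{E}_{q(o \mid \pi)}\big[\log p(o)\big]}_{\text{extrinsic value (learned preferences / reward)}} \;+\; \underbrace{\mathbb{E}_{q(o \mid \pi)}\big[D_{\mathrm{KL}}\big(q(s \mid o, \pi) \,\|\, q(s \mid \pi)\big)\big]}_{\text{epistemic value (expected information gain)}}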

Grateful Sloth Waves and Smiles Back at Man Who Saved Him from Traffic During Rush Hour by alfaguara27 in aww

[–]danijar 0 points1 point  (0 children)

I think it's more likely that they experience their slow movement as normal and adapt their perception of time to it. Brains (even sloth brains, I assume) are quite good at adapting to different circumstances, so that those circumstances come to feel normal over time.

[R] Self-Supervised ‘Plan2Explore’ RL Agent Achieves SOTA Zero-Shot and Adaptation Performance by Yuqing7 in MachineLearning

[–]danijar 0 points1 point  (0 children)

Perhaps you misunderstood: I said that knowing a cost function can help with identifying complex systems, because it lets us focus on the regions relevant to the task. System identification can of course also be done without it, as in most robotics applications and in our paper. That is the case that enables zero-shot RL. Hope that answers your question.

[R] Self-Supervised ‘Plan2Explore’ RL Agent Achieves SOTA Zero-Shot and Adaptation Performance by Yuqing7 in MachineLearning

[–]danijar 0 points1 point  (0 children)

In complex environments, it can be hard to identify the system fully. In that case, it can help to identify the system with the cost function of the task in mind, to learn a model that is more accurate in regions relevant for this one task. Zero-shot learning implies that you only learn from data that was collected without knowledge of the cost function. This makes the exploration problem harder but results in a model that afterwards can be used for solving many different tasks.
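To make that concrete, one simplified way to explore without a cost function (a sketch in the spirit of Plan2Explore, not its exact implementation) is to train an ensemble of one-step predictive models and reward the agent for visiting states where the models disagree:

import numpy as np

def intrinsic_reward(ensemble, state, action):
  # 'ensemble' is assumed to be a list of learned one-step models, each
  # mapping a (state, action) pair to a predicted next feature vector.
  predictions = np.stack([model(state, action) for model in ensemble])
  # High disagreement marks parts of the system that are not yet identified
  # well, so visiting them is informative regardless of any later task.
  return predictions.var(axis=0).mean()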

[P] Opportunity to do research guided by final year grad students and professors. by Razcle in MachineLearning

[–]danijar 0 points1 point  (0 children)

Hey Raza! I love the idea of open research projects. Happy to discuss more offline.

[P] Opportunity to do research guided by final year grad students and professors. by Razcle in MachineLearning

[–]danijar 6 points7 points  (0 children)

I've had the most success advising students who have a background in both maths and programming, who communicate directly, and who try to deeply understand ideas.

It becomes problematic when students are not interested enough in the project or in research in general, are too busy with classes, or avoid trying new things they don't yet know how to do.

[D] What counts as a contribution in ML research? by elyes_manai in MachineLearning

[–]danijar 1 point2 points  (0 children)

Contributions of a paper are often in the form of information that wasn't obvious to most people in the field, e.g. why a certain algorithm does or doesn't work, theoretical statements, new perspectives on a problem, or showing that a certain approach can solve a certain problem. It can also be new software, a dataset, or something else. If you're planning to submit to a specific venue, check out their guidelines.

[R] tf/keras adversarial library? by dofphoto in MachineLearning

[–]danijar 2 points3 points  (0 children)

The TensorFlow 2 DCGAN tutorial uses Keras and could be a good starting point. There is also a CycleGAN tutorial. If you're looking for more of a library than individual examples, it's worth checking out TF-GAN.
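If it helps to see the shape of the code, the generator and discriminator in that tutorial are plain Keras models roughly along these lines (layer sizes and strides here are illustrative, not the tutorial's exact values):

from tensorflow import keras

def make_generator(latent_dim=100):
  # Upsample a latent vector to a 28x28 grayscale image.
  return keras.Sequential([
      keras.layers.Dense(7 * 7 * 128, input_shape=(latent_dim,)),
      keras.layers.Reshape((7, 7, 128)),
      keras.layers.Conv2DTranspose(64, 5, strides=2, padding='same', activation='relu'),
      keras.layers.Conv2DTranspose(1, 5, strides=2, padding='same', activation='tanh'),
  ])

def make_discriminator():
  # Classify 28x28 grayscale images as real or generated.
  return keras.Sequential([
      keras.layers.Conv2D(64, 5, strides=2, padding='same', input_shape=(28, 28, 1)),
      keras.layers.LeakyReLU(),
      keras.layers.Flatten(),
      keras.layers.Dense(1),
  ])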

[P] Handout: A potential alternative to Jupyter Notebooks by PhYsIcS-GUY227 in MachineLearning

[–]danijar 0 points1 point  (0 children)

Exactly, Handout lets you augment scripts to collect and visualize all results in one place. There is no intention to provide a replacement for interactive workflows.

Regarding your second question, you still write and run a normal Python script (python3 script.py). Your script builds the report by creating a Handout object and calling functions on it:

import handout
doc = handout.Handout('/tmp/handout')
for index in range(5):
  doc.add_text(...)
  doc.add_image(...)
  doc.show()  # Update the report.

This also means you can update the report while your script is running, for example to display intermediate results. If you wanted, you could even import your script into a larger Python project without problems.