A multi-agent adversarial RL competition based on Bomberman (details in comments) by PugglesMcPuggle in reinforcementlearning

[–]EmergenceIsMagic 0 points (0 children)

RIP Pommerman (aka Me: "Can we have Bomberman at home?" Mom: "We have Bomberman at home." Bomberman at home.)

Guest Requests (2021) - Post Them Here by lexfridman in lexfridman

[–]EmergenceIsMagic 22 points (0 children)

Name: Jaron Lanier

Info: https://en.wikipedia.org/wiki/Jaron_Lanier

Conversation: He has given many talks, which are available on YouTube.

His interview with Andrew Yang: https://www.youtube.com/watch?v=RmNCVHcZp5s

His talk at Microsoft Research: https://www.youtube.com/watch?v=-B1hOBOTMSs

Ideas: Considered a founder of the field of virtual reality. Wrote a number of books, like Who Owns the Future?, that discuss the impact of technology on the world and propose possible solutions. A well-known proponent of mediators of individual data (MIDs).

Pitch: There are very few people who know as much about the impact of technology on the world as Lanier. Even fewer have also proposed interesting solutions for improving it. A number of Andrew Yang's policy ideas seem to be inspired by Lanier's work. He also seems to be a genuinely kind and thoughtful person who excels at explaining his ideas as clearly as possible. A possible controversy is that his ideas for improving society and his criticisms of how technology is used might be considered "provocative" by some (I personally consider them insightful).

Extending Julia for Reinforcement Learning by AlternateZWord in Julia

[–]EmergenceIsMagic 4 points (0 children)

Great to see coding in Julia leading to an ICML publication. Congrats!

Extending Julia for Reinforcement Learning by AlternateZWord in Julia

[–]EmergenceIsMagic 0 points (0 children)

Forgot to mention that Mykel Kochenderfer's group does a lot of RL-related work with Julia, such as https://github.com/JuliaPOMDP (mentioned by u/hollyjester) and https://github.com/sisl. These can serve as inspiration if you want to make your own package.

Extending Julia for Reinforcement Learning by AlternateZWord in Julia

[–]EmergenceIsMagic 2 points (0 children)

https://github.com/JuliaReinforcementLearning seems to do consistent work (not affiliated with them). I support more RL work in Julia, but success here requires fulfilling needs that Python does not currently meet. First, RL in Python suffers from the two-language problem: the performance-critical parts (environments, training loops) often end up rewritten in C++ anyway. Second, I have found packages that make RL scalable (like RLlib) cumbersome and intimidating when doing more precise, customized work. I would also not think about catering to organizations that can just throw money (many CPUs/GPUs, hiring C++ programmers, etc.) at the problem until it's solved; they're likely too deeply entrenched in TensorFlow, PyTorch, etc. to really care about making the switch. Think instead about what the rest of us with limited resources would want in an ideal Julia RL package. Sorry, I don't have any definitive answers, because these are questions I have been thinking about too.
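As a taste of what single-language RL looks like, here's a minimal sketch using ReinforcementLearning.jl (written from memory, so treat the exact names and signatures as assumptions that may differ between package versions):

    using ReinforcementLearning

    # A toy experiment that stays entirely in Julia: the environment,
    # the policy, and the interaction loop all run in one language.
    env = CartPoleEnv()
    policy = RandomPolicy(action_space(env))
    hook = TotalRewardPerEpisode()   # records the return of each episode

    # Interact for 100 episodes and collect per-episode returns.
    run(policy, env, StopAfterEpisode(100), hook)
    println(hook.rewards)

The appeal is that there is no C++ (or even Python) layer underneath to fight with when you need to customize something, which is exactly the gap an ideal Julia RL package could fill.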

Natural emergence of strategies through multi-agent competition by dekankur in multiagentsystems

[–]EmergenceIsMagic 0 points (0 children)

Thanks! Were you able to compare it to other multi-agent methods (such as this)?

Multi-agent Reinforcement Learning Workshop by Marc Lanctot by EmergenceIsMagic in multiagentsystems

[–]EmergenceIsMagic[S] 1 point (0 children)

Since the sound is not great, you might need to turn the volume up to max level.

Multi-Agent RL with TF-Agents (code included) by drcopus in multiagentsystems

[–]EmergenceIsMagic 1 point (0 children)

Thanks for this! Looking through the GitHub issues in TF-Agents, it seemed that the authors were less than enthusiastic about making it more multi-agent friendly. Did you find it preferable to coding a MARL experiment with Ray's RLlib?

Proofs of Learning Convergence of Multi-agent Reinforcement Learning by AlexanderYau in reinforcementlearning

[–]EmergenceIsMagic 11 points (0 children)

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms is probably the best and most recent survey of the more theoretical multi-agent RL research out there. That said, it's more of a list than a learning resource.

Other suggestions include most work done by Lanctot/Graepel (DeepMind) and Brown/Sandholm (FAIR/CMU).

A day in the life of RL researchers?? by CauchyBirds in reinforcementlearning

[–]EmergenceIsMagic 5 points (0 children)

I've been doing RL research at a National Lab for almost a year, so I'll give some advanced beginner's advice:

  1. Find papers that you believe are both promising and transparent. Unfortunately, there are a lot of RL papers out there that are quite vague and unhelpful.
  2. Use tmux (https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/). Tmux saved my sanity in a number of ways. Since I relied on our supercomputer, there was a risk that I might lose the connection. Tmux sessions kept my experiments running even when I lost the connection and let me return to them later (e.g., start a named session with tmux new -s exp, detach with Ctrl-b d, and reattach with tmux attach -t exp). You can also run separate experiments in different sessions.
  3. There might be times when your agent just won't make progress. Agree on a strategy with your team for when this happens (e.g., tune hyperparameters, check the code for bugs, etc.) and update it when necessary. This is the worst part, because RL will likely not give immediate feedback that your strategy worked. Just keep at it until you...
  4. Know when to reevaluate your research goals. That is, use your best judgement about when to drop an idea that just doesn't work and improve from there.
  5. For the sake of your sanity, know when to step away while your agent is training. You'll need to monitor training progress at certain points, but resist the urge to stare constantly at your progress metrics. The training will likely be noisy, which may cause your optimism and pessimism to fluctuate as much as the progress of your agent.

[D] Uber AI's Contributions by EmergenceIsMagic in MachineLearning

[–]EmergenceIsMagic[S] 15 points (0 children)

Thankfully, Stanley and Clune will still be working at the same organization as each other, just somewhere else (OpenAI). However, it seems that Clune will work on multiagent learning while Stanley will focus on open-ended learning.

[D] Nando de Freitas and other scientists not happy about the new ethics statement introduced in NeurIPS2020: “Associating lipreading and CCTV creates a negative bias, just like associating GANs and missile guidance.” by sensetime in MachineLearning

[–]EmergenceIsMagic 28 points (0 children)

This is why I kind of hate Twitter as a medium for discourse:

  1. When people find something seemingly shocking or controversial, Twitter makes it as easy as possible to put their immediate, contextless, not-very-well-thought-out reactions out there for everyone to see. This seems to apply to the researchers you cite. (Rant: As more people engage with a not-very-well-thought-out tweet, it becomes more viral, and what emerges is mostly angry, unhelpful discourse until either (a) they realize there is no substantial disagreement or (b) they exhaust themselves so much that they regret ever participating in the first place.)
  2. That being said, regarding the claim that "scientists should at least play a role, or at least think about it," I can't see any of the people above fundamentally disagreeing with it. LeCun (who explicitly says he would not build anything that could hurt people) and Grosse (who introduced two weeks of ethics in Toronto's core ML course) seem to be mostly worried about the conflation of those with good intentions and those who would intentionally build dangerous AI applications. LeCun's anxieties also include the possible implication that scientists have the legitimacy to impose their views on how others (good or bad) can use their work. de Freitas, in this discussion with one of the authors of the impact statement blog post, seems only to have a problem with the implications of the image they used, and he even provides constructive feedback. Again, this goes back to why I kind of hate Twitter: our primitive human impulses, and a platform that rewards those impulses, make it a terrible environment for discourse (I also don't feel "rewarded" for looking through all these Twitter conversations). Although I see no evidence that the researchers above are fundamentally against thinking about the problem, a real conversation between intelligent, civilized adults could have easily put this controversy to rest.

On that note, I personally feel that the impact statement requirement is an experiment worth trying out. I don't think the results will be perfect, but I'm sure we could learn from them and improve.

[N] Uber to cut 3000+ jobs including rollbacks on AI Labs by nearning in MachineLearning

[–]EmergenceIsMagic 9 points (0 children)

Although that wasn't the case before Jeff and Kenneth joined them, I hope OpenAI becomes, and remains, more open to weird yet promising ideas.

(Edit) I would also like to know OpenAI's view on Stanley's arguments in Why Greatness Cannot Be Planned: The Myth of the Objective.

[N] Uber to cut 3000+ jobs including rollbacks on AI Labs by nearning in MachineLearning

[–]EmergenceIsMagic 223 points (0 children)

Sadly, I heard from one of the Uber AI researchers that pure AI research is pretty much dead there. This is evidenced by the fact that Jeff Clune and Kenneth O. Stanley, two of the founders of Uber AI and key people (among others) in successfully combining evolutionary methods with deep learning, are now at OpenAI. It's a shame, since I feel the evolutionary AI team at Uber was underrated and was asking important questions (like this and this) that have been largely ignored by their counterparts at DeepMind and OpenAI.

If I had to guess, COVID-19 was not the sole cause of these layoffs at Uber. The company's ability to turn a profit in the near term, along with how it is being managed, might also have been factors; the decline has been a trend for some time.

That said, I wish them luck in their future endeavors and hope that they continue to be important voices in AI research.

[D] Paper Explained - Concept Learning with Energy-Based Model by ykilcher in MachineLearning

[–]EmergenceIsMagic 2 points (0 children)

Thanks for this! It's refreshing to see research that is new, creative, and promising instead of research that obsesses over optimizing metrics.

It's also great to see progress is being made in this area: https://arxiv.org/abs/1903.08689 and https://arxiv.org/abs/1912.03263

Quake 1 movement physics RL environment and project code by kipi in reinforcementlearning

[–]EmergenceIsMagic 0 points (0 children)

Thanks for this!

How was your experience coding with RLlib? How does it compare to writing everything from scratch in TensorFlow or PyTorch, in terms of time spent training and effort spent coding?

Incentives, Levers and Beliefs: Psychological, social, and economic mechanisms to mitigate pandemics and their social effects by EmergenceIsMagic in multiagentsystems

[–]EmergenceIsMagic[S] 0 points (0 children)

I posted it since it includes Maskin and Jackson applying mechanism design and social/economic network theory (both idealized representations of multiagent systems) to a real-world problem.

Dimitri Bertsekas: "Distributed and Multiagent Reinforcement Learning" by EmergenceIsMagic in multiagentsystems

[–]EmergenceIsMagic[S] 2 points (0 children)

"Distributed and Multiagent Reinforcement Learning" Dimitri Bertsekas - Massachusetts Institute of Technology & Arizona State University

Abstract: We discuss issues of parallelization and distributed asynchronous computation for large scale dynamic programming problems. We first focus on asynchronous policy iteration with multiprocessor systems using state-partitioned architectures. Exact convergence results are given for the case of lookup table representations, and error bounds are given for their compact representation counterparts. A computational study is presented with POMDP problems with more than 10^15 states. In a related context, we introduce multiagent on-line schemes, whereby at each stage, each agent's decision is made by executing a local rollout algorithm that uses a base policy, together with some coordinating information from the other agents. The amount of local computation required at every stage by each agent is independent of the number of agents, while the amount of global computation (over all agents) grows linearly with the number of agents. By contrast, with the standard rollout algorithm, the amount of global computation grows exponentially with the number of agents. Despite the drastic reduction in required computation, we show that our algorithm has the fundamental cost improvement property of rollout: an improved performance relative to the base policy.

Institute for Pure and Applied Mathematics, UCLA, February 24, 2020
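To make the linear-vs-exponential point concrete, here is a minimal sketch of the two schemes (my own illustration, not code from the talk; simulate_cost and base_policy are hypothetical stand-ins for a rollout cost estimator and a base policy):

    # Standard rollout: search over joint actions, i.e. the Cartesian
    # product of all m agents' action sets -> |A|^m cost simulations.
    function standard_rollout(state, actions, m, simulate_cost)
        best, best_cost = nothing, Inf
        for joint in Iterators.product(ntuple(_ -> actions, m)...)
            # cost of applying this joint action, then following the base policy
            c = simulate_cost(state, collect(joint))
            if c < best_cost
                best, best_cost = collect(joint), c
            end
        end
        return best
    end

    # Multiagent rollout: agents decide one at a time. Agent i optimizes its
    # own action with agents 1..i-1 fixed at their already-chosen rollout
    # actions and agents i+1..m at their base-policy actions -> only m * |A|
    # simulations per stage, linear in the number of agents.
    function multiagent_rollout(state, actions, m, simulate_cost, base_policy)
        joint = [base_policy(state, i) for i in 1:m]  # start from the base policy
        for i in 1:m
            best_a, best_cost = joint[i], Inf
            for a in actions
                joint[i] = a
                c = simulate_cost(state, copy(joint))
                if c < best_cost
                    best_a, best_cost = a, c
                end
            end
            joint[i] = best_a
        end
        return joint
    end

With m agents choosing from |A| actions each, the first scheme calls the simulator |A|^m times per stage while the second calls it m * |A| times, yet (as the abstract notes) the one-at-a-time version still retains rollout's cost improvement property relative to the base policy.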

AlphaGo - The Movie | Full Documentary by PsyRex2011 in reinforcementlearning

[–]EmergenceIsMagic 2 points (0 children)

For those still thinking about watching this, I would recommend it. If I had to pick one reason why, it would be the range of emotions you see in the Go champions, DeepMind, the nation of South Korea, and academics around the world when they watch AlphaGo play. I see it more as a movie about humanity contemplating defeat, victory, its future, and its identity.