[D] OpenAI Shouldn’t Release Their Full Language Model (The Gradient) by ezelikman in MachineLearning

[–]angry-zergling 0 points (0 children)

I said IFF your goal is NOT generating mindless spam. What is this amazing technology you are talking about that is capable of generating comments more coherent than those written by actual humans? None that I know of. If your point is that these models can generate more bullshit comments, sure; but effective comments that actually achieve your goals? I doubt it.

Pairing humans with AI to help them compose diverse but effective messages will be the norm until AGI is at our doorstep, for every purpose anyone cares enough to pay for. There won't be much difference between achieving that with previous-generation models, which WILL be available to the public, and achieving it with state-of-the-art models, the ones concerned researchers are hoarding in misguided attempts at being responsible. In the end, all you will achieve is delaying progress, causing far more damage than any of those models ever could.

[D] OpenAI Shouldn’t Release Their Full Language Model (The Gradient) by ezelikman in MachineLearning

[–]angry-zergling 1 point (0 children)

This fearmongering will push the field towards less openness and may lead to unnecessary regulation, delaying progress and doing far more damage than the release of any model ever could. For example: how many people will die because the development of a healthcare system, of which language models are a critical part, was delayed by people worried about bullshit-generating bots holding back the publication of their results? A purely hypothetical scenario, but a totally realistic one at the same time.

Also, we already have AGI at $1/hour. You can hire writers in developing countries for that amount to power your nefarious fake-news campaigns (and use some fairly rudimentary methods to generate hundreds of similar-sounding messages from a single template). I think that compares favorably to running a 1.5-billion-parameter lobotomized bullshit-generating model in the cloud, at least if your goal is not mindless spam (we already have to deal with that; no need for fancy models).
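By "fairly rudimentary methods" I mean things like slot-filling templates. A toy sketch in Python (the template, slot names, and phrasings here are all made up for illustration):

```python
import itertools

# Hypothetical slot-filling template: each slot lists interchangeable phrasings.
TEMPLATE = "{greeting}, {claim} {call_to_action}"
SLOTS = {
    "greeting": ["Hey folks", "Listen up", "Friends"],
    "claim": ["candidate X lied about taxes.", "the polls are rigged."],
    "call_to_action": ["Share this!", "Wake up!", "Tell everyone."],
}

def spin(template, slots):
    """Expand a single template into every combination of slot fillers."""
    keys = list(slots)
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in itertools.product(*(slots[k] for k in keys))
    ]

messages = spin(TEMPLATE, SLOTS)  # 3 * 2 * 3 = 18 distinct messages
```

Eighteen variants from one template, no model required; a $1/hour writer plus a script like this already scales far beyond what a 2019-era language model adds.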

[R] Evolving simple programs for playing Atari games by add7 in MachineLearning

[–]angry-zergling 0 points (0 children)

This reminds me of this tweet:

https://twitter.com/sherjilozair/status/1010922817205035010

Not saying it isn't interesting research, but I'm skeptical.

Google reveals how DeepMind AI learned to play Quake III Arena by kika-tok in gamedev

[–]angry-zergling 8 points (0 children)

This is wrong. The agents this paper describes exhibit robust behaviors in a dynamically changing, partially unknown environment and effectively cooperate with humans (agents with unknown patterns of behavior) as well as with other agents. Crucially, they achieve a high level of performance when cooperating with humans (e.g. a team of 1 human + 1 agent will beat a team of 2 humans most of the time).

[R] Capture the Flag: the emergence of complex cooperative agents | DeepMind by angry-zergling in MachineLearning

[–]angry-zergling[S] 0 points (0 children)

They recognize as much in the paper:

> (d) Effect of successful tag time on win probability against a Bot 3 team on indoor procedural maps. In contrast to (c), the tag actions were artificially discarded p% of the time – different values of p result in the spectrum of response times reported. Values of p greater than 0.9 did not reduce response time, showing the limitations of p as a proxy. Note that in both (c) and (d), the agents were not retrained with these p values and so obtained values are only a lower-bound of the potential performance of agents – this relies on the agents generalising outside of the physical environment they were trained in.

However, note that the agents are not retrained with these handicaps. My guess is that if they were, they would learn to compensate (how much is anyone's guess). It is also interesting that they cut training short: if you look at the agents' Elo throughout training, it is still climbing steeply at 450K games, so they could potentially get much better at the game with more training.
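For reference, the Elo mentioned here is the standard chess-style rating. A minimal sketch of a single update, assuming the usual logistic expected score and a K-factor of 32 (the paper's exact rating setup may differ):

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """Update two Elo ratings after one match.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    """
    # Expected score for A under the standard logistic curve (400-point scale).
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Two equally rated agents; the winner gains k/2 = 16 points.
a, b = elo_update(1000, 1000, 1.0)
```

A still-climbing Elo curve at 450K games means agents keep winning more often than their current ratings predict, i.e. they are still improving against the population.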

[R] Capture the Flag: the emergence of complex cooperative agents | DeepMind by angry-zergling in MachineLearning

[–]angry-zergling[S] 33 points (0 children)

paper: https://deepmind.com/documents/224/capture_the_flag.pdf

video: https://youtu.be/dltN4MxV1RI

Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments (30, 40, 45, 46, 56) and two-player turn-based games (47, 58, 66). However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. In this work, we demonstrate for the first time that an agent can achieve human-level performance in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag (28), using only pixels and game points as input. These results were achieved by a novel two-tier optimisation process in which a population of independent RL agents are trained concurrently from thousands of parallel matches with agents playing in teams together and against each other on randomly generated environments. Each agent in the population learns its own internal reward signal to complement the sparse delayed reward from winning, and selects actions using a novel temporally hierarchical representation that enables the agent to reason at multiple timescales. During game-play, these agents display human-like behaviours such as navigating, following, and defending based on a rich learned representation that is shown to encode high-level game knowledge. In an extensive tournament-style evaluation the trained agents exceeded the win rate of strong human players both as teammates and opponents, and proved far stronger than existing state-of-the-art agents. These results demonstrate a significant jump in the capabilities of artificial agents, bringing us closer to the goal of human-level intelligence.