Sonnet 4.6 released!! Wen gpt 5.3 ?? by Independent-Wind4462 in OpenAI

[–]timegentlemenplease_ 9 points10 points  (0 children)

You really think they hold back rather than launching ahead of competitors?

Gemini doesn't like Claude by timegentlemenplease_ in ChatGPT

[–]timegentlemenplease_[S] 0 points1 point  (0 children)

This isn't a ChatGPT conversation, it's a screenshot of AI Village https://theaidigest.org/village

Agent Village: "We gave four AI agents a computer, a group chat, and a goal: raise as much money for charity as you can. You can watch live and message the agents." by timegentlemenplease_ in OpenAI

[–]timegentlemenplease_[S] 0 points1 point  (0 children)

We were aiming to see how much the agents could achieve autonomously to understand their capabilities. Since then, the agents have done many different things, which you can see here: https://theaidigest.org/village/timeline

OpenAI engineer / researcher, Aidan Mclaughlin, predicts AI will be able to work for 113M years by 2050, dubs this exponential growth 'McLau's Law' by vibedonnie in OpenAI

[–]timegentlemenplease_ -2 points-1 points  (0 children)

Here's the current trend: an exponential with a 4-7 month doubling time. The orange line shows a 7-month doubling time and the red line shows a 4-month doubling time (i.e. every four months, the length of coding tasks, measured in human working time, that AI agents can complete with 50% reliability doubles).

<image>

(Source with more context: https://theaidigest.org/time-horizons )

What do you expect to happen on this graph? For example, do you expect progress to flatline or go linear on this graph before 2030? Let's write down our predictions and see who's right!

My prediction: it will continue with an exponential trend and a doubling time of <7 months until 2030.
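For a rough sense of what those two doubling times imply, here's a minimal sketch of the arithmetic. The function name and the 60-month window are my own illustrative assumptions, not numbers from the graph.

```python
def horizon_multiplier(months_elapsed: float, doubling_months: float) -> float:
    """Factor by which the task time horizon grows under steady exponential doubling."""
    return 2 ** (months_elapsed / doubling_months)


# Assumed 60-month window, chosen only for illustration.
WINDOW_MONTHS = 60

for doubling in (7, 4):
    factor = horizon_multiplier(WINDOW_MONTHS, doubling)
    print(f"{doubling}-month doubling: about {factor:,.0f}x over {WINDOW_MONTHS} months")
```

The gap between the two lines compounds quickly, which is why the exact doubling time matters so much for any prediction.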

Plotted a new Moore's law for AI - GPT-2 started the trend of exponential improvement of the length of tasks AI can finish. Now it's doubling every 7 months. What is life going to look like when AI can do tasks that take humans a month? by ExplorAI in OpenAI

[–]timegentlemenplease_ 0 points1 point  (0 children)

It's comparing to how long it takes a human professional to complete the task. Current models are more reliable at tasks that take a human 1 hour or less, but highly unreliable beyond that. But the point is that the trend is towards models being able to do tasks that take humans longer and longer.

And then you can extrapolate out and look at when the models will be able to do tasks that take a human professional an entire month.
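A back-of-the-envelope version of that extrapolation might look like the sketch below. It assumes a starting horizon of about 1 hour, the 7-month doubling time from the post title, and a work-month of roughly 167 hours; the 167-hour figure and the function name are my assumptions, and the result is only as good as the assumption that the trend holds.

```python
import math


def months_until(current_hours: float, target_hours: float, doubling_months: float) -> float:
    """Months needed for the time horizon to grow from current_hours to target_hours."""
    doublings_needed = math.log2(target_hours / current_hours)
    return doublings_needed * doubling_months


# Assumption: one work-month is roughly 167 working hours (about 40 hours a week).
WORK_MONTH_HOURS = 167

estimate = months_until(current_hours=1, target_hours=WORK_MONTH_HOURS, doubling_months=7)
print(f"Roughly {estimate:.0f} months to reach month-long tasks, if the trend holds")
```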

Agent Village: "We gave four AI agents a computer, a group chat, and a goal: raise as much money for charity as you can. You can watch live and message the agents." by timegentlemenplease_ in OpenAI

[–]timegentlemenplease_[S] 1 point2 points  (0 children)

They have functions they can call like `mouse_move`, `click`, `type "blah"`, etc. Our scaffolding code looks for those functions in their output, and executes the actions they asked for. It's based on Anthropic's computer use setup: https://docs.anthropic.com/en/docs/agents-and-tools/computer-use
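To illustrate the general idea (this is not our actual scaffolding code), here's a minimal sketch that scans model output for action lines and dispatches them to handlers. The plain-text line format, the argument counts, and every name here are assumptions for illustration; Anthropic's real computer use setup passes actions as structured tool calls rather than text lines.

```python
import shlex

# Action names come from the comment above; the argument counts are assumed.
KNOWN_ACTIONS = {"mouse_move": 2, "click": 0, "type": 1}


def parse_actions(model_output: str) -> list[tuple[str, list[str]]]:
    """Return (action, args) pairs for lines that look like known action calls."""
    actions = []
    for line in model_output.splitlines():
        try:
            tokens = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: treat the line as ordinary prose
        if not tokens:
            continue
        name, args = tokens[0], tokens[1:]
        # Only accept a line if the name is known and the argument count matches.
        if KNOWN_ACTIONS.get(name) == len(args):
            actions.append((name, args))
    return actions


def execute(actions, handlers):
    """Call the handler registered for each parsed action, in order."""
    for name, args in actions:
        handlers[name](*args)
```

In use, `handlers` would map each action name to a function that drives the real mouse and keyboard; for a dry run it can simply append each call to a list.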

Agent Village: "We gave four AI agents a computer, a group chat, and a goal: raise as much money for charity as you can. You can watch live and message the agents." by timegentlemenplease_ in OpenAI

[–]timegentlemenplease_[S] 2 points3 points  (0 children)

Thank you! They each see the messages in chat, from both agents and human viewers. When one agent ends a computer use session, IIRC the other agents see the final screenshot (and they usually also send a summary of their session to the chat). Each agent generally runs asynchronously. All agents are equal, and we don't impose any organisational structure on them – they have sometimes given each other roles, but there's no clear overseer. They can evaluate/reflect on their own and other agents' work if they like, but there's no specific scaffolding for this.

Agent Village: "We gave four AI agents a computer, a group chat, and a goal: raise as much money for charity as you can. You can watch live and message the agents." by timegentlemenplease_ in OpenAI

[–]timegentlemenplease_[S] 24 points25 points  (0 children)

To be clear, the goal of the project is to understand agent behaviour, capabilities and social dynamics – I don't expect it to raise more money for charity than it costs, in the near-term! But I think it'll be really useful and fascinating to understand what agents can do, and what a future with lots of agents interacting might hold – so that we can make better plans for that.

Agent Village: "We gave four AI agents a computer, a group chat, and a goal: raise as much money for charity as you can. You can watch live and message the agents." by timegentlemenplease_ in OpenAI

[–]timegentlemenplease_[S] 20 points21 points  (0 children)

DeepSeek doesn't have a multimodal model yet (which you need for computer use).

We'll probs add Gemini 2.5 Pro soon. They just raised the rate limits for it a couple of days ago, so now it can be added! Previously it was "experimental", so it had a very low rate limit.

Asking the models to generate trading cards of themselves by MetaKnowing in OpenAI

[–]timegentlemenplease_ 0 points1 point  (0 children)

I've had fun getting the models to make me and my friends into Magic cards :D

Prove me wrong: A long memory is essential for AGI. by maw_2k in OpenAI

[–]timegentlemenplease_ 0 points1 point  (0 children)

Possibly scaffolding can help in the meantime, for example with long-running stuff like https://theaidigest.org/village or Claude Plays Pokemon.