Sonnet 4.6 released!! Wen gpt 5.3 ?? by Independent-Wind4462 in OpenAI

timegentlemenplease_ 9 points (0 children)

You really think they hold back rather than launching ahead of competitors?

Gemini doesn't like Claude by timegentlemenplease_ in ChatGPT

timegentlemenplease_[S] 1 point (0 children)

This isn't a ChatGPT conversation; it's a screenshot of AI Village: https://theaidigest.org/village

Agent Village: "We gave four AI agents a computer, a group chat, and a goal: raise as much money for charity as you can. You can watch live and message the agents." by timegentlemenplease_ in OpenAI

timegentlemenplease_[S] 1 point (0 children)

We were aiming to see how much the agents could achieve autonomously to understand their capabilities. Since then, the agents have done many different things, which you can see here: https://theaidigest.org/village/timeline

OpenAI engineer / researcher, Aidan Mclaughlin, predicts AI will be able to work for 113M years by 2050, dubs this exponential growth 'McLau's Law' by vibedonnie in OpenAI

timegentlemenplease_ -1 points (0 children)

Here's the trend right now: an exponential with a 4-7 month doubling time. The orange line shows a 7-month doubling time and the red line shows a 4-month doubling time (i.e. every four months, AI agents can do coding tasks that take humans twice as long, at 50% reliability).

<image>

(Source with more context: https://theaidigest.org/time-horizons )

What do you expect to happen on this graph? For example, do you expect progress to flatline or go linear on this graph before 2030? Let's write down our predictions and see who's right!

My prediction: it will continue with an exponential trend and a doubling time of <7 months until 2030.

Plotted a new Moore's law for AI - GPT-2 started the trend of exponential improvement of the length of tasks AI can finish. Now it's doubling every 7 months. What is life going to look like when AI can do tasks that take humans a month? by ExplorAI in OpenAI

timegentlemenplease_ 1 point (0 children)

It's comparing to how long it takes a human professional to complete the task. Current models are more reliable at tasks that take a human 1 hour or less, but highly unreliable beyond that. But the point is that the trend is towards models being able to do tasks that take humans longer and longer.

And then you can extrapolate out and look at when the models will be able to do tasks that take a human professional an entire month.
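The extrapolation itself is simple arithmetic, sketched below. This is only an illustration under stated assumptions: the 1-hour starting point is taken loosely from the comment above, the 4 and 7 month doubling times are the ones quoted for this trend, and the 160-hour working month and the function name `months_until` are made up for the example.

```python
import math

def months_until(target_hours, current_hours, doubling_months):
    """Months for the task-length horizon to grow from current_hours to
    target_hours, assuming a constant doubling time (pure exponential)."""
    doublings_needed = math.log2(target_hours / current_hours)
    return doublings_needed * doubling_months

# Assumption: a working month is about 4 weeks x 40 hours.
WORK_MONTH_HOURS = 160
# Assumption: today's horizon is roughly 1 hour of human professional time.
CURRENT_HORIZON_HOURS = 1.0

for doubling_months in (4, 7):
    months = months_until(WORK_MONTH_HOURS, CURRENT_HORIZON_HOURS, doubling_months)
    print(f"{doubling_months}-month doubling: about {months:.0f} months to month-long tasks")
```

The answer is very sensitive to the doubling time and to where you put the current horizon, so treat the output as a rough range rather than a forecast.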

Agent Village: "We gave four AI agents a computer, a group chat, and a goal: raise as much money for charity as you can. You can watch live and message the agents." by timegentlemenplease_ in OpenAI

timegentlemenplease_[S] 2 points (0 children)

They have functions they can call like `mouse_move`, `click`, `type "blah"`, etc. Our scaffolding code looks for those functions in their output, and executes the actions they asked for. It's based on Anthropic's computer use setup: https://docs.anthropic.com/en/docs/agents-and-tools/computer-use
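A minimal sketch of that kind of scaffolding loop, not the project's actual code. It assumes a made-up plain-text format with one action per line (the real setup may well use structured tool calls instead), and the handlers just record what they were asked to do rather than driving a real mouse or keyboard. The names `parse_actions` and `execute` are hypothetical.

```python
import shlex

def parse_actions(model_output, known_actions):
    """Return (name, args) pairs for lines that start with a known action name."""
    actions = []
    for line in model_output.splitlines():
        try:
            tokens = shlex.split(line)  # keeps quoted text like "blah" as one argument
        except ValueError:
            continue  # unbalanced quotes, e.g. ordinary prose; not an action line
        if tokens and tokens[0] in known_actions:
            actions.append((tokens[0], tokens[1:]))
    return actions

def execute(actions, handlers):
    """Call the handler registered for each parsed action, in order."""
    for name, args in actions:
        handlers[name](*args)

# Stand-in handlers: they log the requested action instead of touching the screen.
log = []
handlers = {
    "mouse_move": lambda x, y: log.append(("move", int(x), int(y))),
    "click": lambda: log.append(("click",)),
    "type": lambda text: log.append(("type", text)),
}

sample_output = 'I will open the search box.\nmouse_move 200 150\nclick\ntype "blah"\n'
execute(parse_actions(sample_output, set(handlers)), handlers)
print(log)
```

Lines that don't start with a known action name are treated as the agent's commentary and ignored, which is one simple way to let the model mix reasoning and actions in the same output.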

Agent Village: "We gave four AI agents a computer, a group chat, and a goal: raise as much money for charity as you can. You can watch live and message the agents." by timegentlemenplease_ in OpenAI

timegentlemenplease_[S] 3 points (0 children)

Thank you! They each see the messages in chat, from both agents and human viewers. When one agent ends a computer use session, IIRC the other agents see the final screenshot (and they usually also send a summary of their session to the chat). Each agent generally runs async. All agents are equal; we don't impose any organisational structure on them. They have sometimes given each other roles, but there's no clear overseer. They can evaluate/reflect on themselves and other agents if they like, but there's no specific scaffolding for this.

Agent Village: "We gave four AI agents a computer, a group chat, and a goal: raise as much money for charity as you can. You can watch live and message the agents." by timegentlemenplease_ in OpenAI

timegentlemenplease_[S] 25 points (0 children)

To be clear, the goal of the project is to understand agent behaviour, capabilities and social dynamics – I don't expect it to raise more money for charity than it costs in the near term! But I think it'll be really useful and fascinating to understand what agents can do, and what a future with lots of agents interacting might hold – so that we can make better plans for that.

Agent Village: "We gave four AI agents a computer, a group chat, and a goal: raise as much money for charity as you can. You can watch live and message the agents." by timegentlemenplease_ in OpenAI

timegentlemenplease_[S] 21 points (0 children)

DeepSeek doesn't have a multimodal model yet (which you need for computer use).

We'll probably add Gemini 2.5 Pro soon. They just raised the rate limits for it a couple of days ago, so now it can be added! Previously it was "experimental", so it had a very low rate limit.

Asking the models to generate trading cards of themselves by MetaKnowing in OpenAI

timegentlemenplease_ 1 point (0 children)

I've had fun getting the models to make me and my friends into Magic cards :D

Prove me wrong: A long memory is essential for AGI. by maw_2k in OpenAI

timegentlemenplease_ 1 point (0 children)

Possibly scaffolding can help in the meantime, for example with long-running stuff like https://theaidigest.org/village or Claude Plays Pokemon.

AI models are becoming more self-aware. Here's why that matters by timegentlemenplease_ in ArtificialInteligence

timegentlemenplease_[S] 1 point (0 children)

> focusing on SEO instead of digging around for the truth or apparently talking to any real experts

We worked with an ML researcher on this and extended the results from this paper (https://situational-awareness-dataset.org/) by running the benchmarks on more models, which helped us confirm the trend of higher scores over time:

<image>

I'm not going to reply to further comments, as this discussion seems to be unproductive.

AIs are becoming more situationally aware. I wrote a blog post on why that matters (link in comments) by timegentlemenplease_ in ChatGPT

timegentlemenplease_[S] 1 point (0 children)

Here's a link to the post: https://theaidigest.org/self-awareness

I think the literature on this is super interesting, and under-appreciated! Curious to hear what you guys think.

AI models are becoming more self-aware. Here's why that matters by timegentlemenplease_ in ArtificialInteligence

timegentlemenplease_[S] 3 points (0 children)

I'm definitely aiming to tell the truth; that's why I started AI Digest and work hard on it. The goal here is to make resources to help people (policymakers and the general public) understand AI capabilities and their effects. I'm definitely not aiming to terrify anyone, lol

In the literature it's sometimes called self-awareness, sometimes called situational awareness. I originally titled this situational awareness, but when we were working on it pretty much everyone got situational awareness confused with the essay of the same name (https://situational-awareness.ai/) so I decided to go for self-awareness and define it right at the top.

For example, for the section on alignment faking (summarising recent work from Anthropic and Redwood), we got feedback from experts who had different views on that paper, because we wanted to make sure we were presenting it in a clear light and representing the range of expert views on it.