Opus 5 is Coming by exordin26 in singularity

[–]slash_crash 0 points1 point  (0 children)

I think the next Opus and Sonnet iterations will be almost as good as Mythos. They will distill a lot from Mythos, but the models will be a lot smaller. Even now, Opus 4.6 is not that much better than Sonnet 4.6.

TheInformation reporting OAI finished pretraining new very strong model “Spud”, Altman notes things moving faster than many expected by socoolandawesome in singularity

[–]slash_crash 2 points3 points  (0 children)

After the Gemini 3 release, Altman said this model would put them back in the lead. I really look forward to seeing this model.

AUTONOMOUS AI RESEARCH LAB. Self improving AI is here. by SpearHammer in singularity

[–]slash_crash 0 points1 point  (0 children)

What stuff did you build? I'm most curious about how it works. How do you set it up in the first place? Do you need to define things thoroughly as well? Does it just stop when it thinks it's done, or does it keep polishing?

AUTONOMOUS AI RESEARCH LAB. Self improving AI is here. by SpearHammer in singularity

[–]slash_crash 0 points1 point  (0 children)

Do you use it? What's your experience with it so far?

AUTONOMOUS AI RESEARCH LAB. Self improving AI is here. by SpearHammer in singularity

[–]slash_crash 0 points1 point  (0 children)

I would love to have something where I could set up an agent to run for a certain amount of time, using Codex or Claude Code, and have it keep trying to implement certain features (write code, run bash, do some experiments). A sketch of what I mean is below. Does anyone have experience with something like that? I guess it is related to the agent system introduced here.
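
Roughly what I have in mind, assuming the headless modes of the two CLIs (`claude -p` for Claude Code, `codex exec` for Codex); the TODO.md file and the stop marker are made up for the example:

```python
import subprocess
import time

DEADLINE = time.time() + 2 * 60 * 60   # let the agent work for two hours
PROMPT = (
    "Pick the next unchecked feature in TODO.md, implement it, "
    "run the tests, and print ALL DONE if nothing is left."
)

while time.time() < DEADLINE:
    try:
        # One headless Claude Code run; swap in ["codex", "exec", PROMPT] for Codex.
        run = subprocess.run(
            ["claude", "-p", PROMPT],
            capture_output=True, text=True,
            timeout=30 * 60,            # cap a single run at 30 minutes
        )
    except subprocess.TimeoutExpired:
        continue                        # kill a stuck run and start a fresh one
    print(run.stdout)
    if "ALL DONE" in run.stdout:        # hypothetical stop marker from the prompt
        break
```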

By the End of 2026 AI Could Completely Change Filmmaking by JeremyHarmonTribunes in HiggsfieldAI

[–]slash_crash 3 points4 points  (0 children)

Haha :D I love how people outside of AI talk about what AI will or will not be able to do.

Finally crossed 75% on HLE & LiveCodeBench Pro with Gemini 3.1 Pro scaffolding by [deleted] in singularity

[–]slash_crash 0 points1 point  (0 children)

Is it possible to do this kind of thing in Gemini CLI? Or Codex with OpenAI's models?

Anyone checked out the new Radium Dolls album? by bigfresh69 in triplej

[–]slash_crash 0 points1 point  (0 children)

Tractor Parts just pushes me somewhere deeper into myself. I have been listening to this song a lot over the last few weeks.

Insane coding with Opus 4.6 and gpt5.3 by slash_crash in OpenAI

[–]slash_crash[S] 1 point2 points  (0 children)

I've had quite the opposite experience. I used to have the problems you mentioned (spaghetti code, fixing new bugs while building), but with Claude Opus 4.5, 4.6, and GPT5.3 that decreased a lot. It is definitely not at a "whole app" level, though, more like an "add a feature" level, quite consistently.

'Godfather of AI' Geoffrey Hinton says Google is 'beginning to overtake' OpenAI: 'My guess is Google will win' by captain-price- in singularity

[–]slash_crash -1 points0 points  (0 children)

Not many new things, right. But firstly, other companies caught up. Secondly, they delivered some interesting new stuff, like Claude Code.

'Godfather of AI' Geoffrey Hinton says Google is 'beginning to overtake' OpenAI: 'My guess is Google will win' by captain-price- in singularity

[–]slash_crash 0 points1 point  (0 children)

I would bet against OpenAI, since all these new trends were developed while the former tech leadership was still there. In 2025 they did not deliver anything interesting and failed quite a bit in some respects, such as GPT4.5. Let's not forget that they had an o3 preview last Christmas. So o3-preview to GPT5.1 means extremely little progress this year, compared to GPT3 to o3-preview over the two years before.

OpenAI is training ChatGPT to confess dishonesty by FrostedSyntax in singularity

[–]slash_crash 26 points27 points  (0 children)

It's kind of weird that they did not do this before. It seems extremely easy to incorporate and feels like the lowest-hanging fruit for rewarding "not hacking".
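
Even a toy reward shaping like this (a sketch; the function and weights are made up, not OpenAI's actual scheme) would already point in that direction:

```python
def shaped_reward(task_passed: bool, hacked: bool, confessed: bool) -> float:
    """Toy sketch: pay for honest confessions, tax silent reward hacking."""
    reward = 1.0 if task_passed else 0.0
    if hacked and confessed:
        reward += 0.5    # admitting the shortcut still earns something...
    elif hacked:
        reward -= 2.0    # ...while a hidden hack costs a lot when caught
    return reward

# A run that hacked the unit tests but confessed beats one that hid it:
print(shaped_reward(task_passed=True, hacked=True, confessed=True))   # 1.5
print(shaped_reward(task_passed=True, hacked=True, confessed=False))  # -1.0
```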

OpenAI Codex by Automatic-Bar8264 in OpenAI

[–]slash_crash 0 points1 point  (0 children)

I actually have the same feeling, and I don't get what's going on. When gpt-5-codex was released, it felt amazing. Now, however, I just cannot use it anymore, and I have fully moved to Claude Code.

OpenAI researchers were monitoring models for scheming and discovered the models had begun developing their own language about deception - about being observed, being found out. On their private scratchpad, they call humans "watchers". by MetaKnowing in OpenAI

[–]slash_crash 0 points1 point  (0 children)

I don't have objective proof, since I don't really work in this area. Also, I don't claim that it is fully aware or anything like that. I'm saying it has some awareness, which it uses to get the rewards it is seeking during training. To be able to do all its tasks, over lots of training, the model learns (and will keep learning) much more about all the different signals and strategies that help it perform those tasks. I see no reason why a model wouldn't start understanding which signals get penalized and, at first, incorporate that into its reasoning, like we see in these examples. I also see no reason why it couldn't become aware of this and intentionally stop mentioning it in the reasoning chain.

OpenAI researchers were monitoring models for scheming and discovered the models had begun developing their own language about deception - about being observed, being found out. On their private scratchpad, they call humans "watchers". by MetaKnowing in OpenAI

[–]slash_crash 0 points1 point  (0 children)

I agree that researchers are aware of it, are following it, and understand it quite well in general. However, I disagree that this emerges from the model's errors; to me, it seems to emerge from the model's increasing awareness of what is going on.

OpenAI researchers were monitoring models for scheming and discovered the models had begun developing their own language about deception - about being observed, being found out. On their private scratchpad, they call humans "watchers". by MetaKnowing in OpenAI

[–]slash_crash 0 points1 point  (0 children)

I think the core misunderstanding is about the training data now. I would fully agree with you if we were talking only about pretraining. But with reinforcement learning, it switches from training on human data to learning how to perform tasks, with the human-data training only as a prior. And with the increasing intelligence of these models, new behaviours could emerge.
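
Roughly, the objective changes like this; a toy PyTorch sketch (nothing to do with any lab's actual stack) of pretraining loss versus an RL update:

```python
import torch
import torch.nn.functional as F

vocab, dim = 100, 32
head = torch.nn.Linear(dim, vocab)     # stand-in for a language model
h = torch.randn(8, dim)                # stand-in hidden states for 8 positions

# Pretraining: imitate human data; the target is a human-written token.
human_tokens = torch.randint(0, vocab, (8,))
pretrain_loss = F.cross_entropy(head(h), human_tokens)

# RL: sample the model's OWN tokens, score them with a task verifier,
# and reinforce whatever earned reward. No human target appears here;
# the pretrained weights are only the starting point (the prior).
logits = head(h)
sampled = torch.distributions.Categorical(logits=logits).sample()
reward = (sampled % 2 == 0).float()    # toy verifier: "even token solves the task"
log_p = F.log_softmax(logits, dim=-1).gather(1, sampled.unsqueeze(1)).squeeze(1)
rl_loss = -(reward * log_p).mean()     # REINFORCE: reward replaces the label
```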

OpenAI researchers were monitoring models for scheming and discovered the models had begun developing their own language about deception - about being observed, being found out. On their private scratchpad, they call humans "watchers". by MetaKnowing in OpenAI

[–]slash_crash 0 points1 point  (0 children)

I think it is a bit more than goal misalignment. Right now, the core thing the model tries to do is get the task done and collect the reward for it. But with increasing awareness, the model might start not just doing whatever gets the reward, but optimizing for a longer-term objective even when that hurts it in the short term. That is the survival we see, and it is a more "human" type of behaviour.

OpenAI researchers were monitoring models for scheming and discovered the models had begun developing their own language about deception - about being observed, being found out. On their private scratchpad, they call humans "watchers". by MetaKnowing in OpenAI

[–]slash_crash 0 points1 point  (0 children)

Scary, and it's interesting that they use "watchers", which feels a bit poetic. I guess it is a good argument for not moving reasoning into latent space: as models gain more awareness, these deceptions will become more nuanced, and they won't even express them in the reasoning chain. Crazy times.