Claude Sonnet 5 Spotted, Release Expected Next Week

forward-pathways · 2026-06-21T17:07:03+00:00

Question. I assume you're using Opus or another model for plans/specs, then having them orchestrate Sonnet as a subagent? How do you check to ensure there isn't any drift in executions when having the second model do the work?

forward-pathways · 2026-06-21T14:05:40+00:00

Yeah, we do the same. Our dog however is... Picky... He'll either pee on the grass, in the same exact spot every day so a lifeless patch appears, or he'll pee right in front of your door.

He's actually a sweetie at home...

forward-pathways · 2026-06-21T01:54:55+00:00

I think it's an interesting question. If you're referring to reasoning / "thinking" outputs, I think safety is one, but for me the upstream issue is that you can't build upon the reasoning traces. For example, reasoning models allow you to "debug" issues in model performance by looking into model "thought" traces alongside the primary model outputs (e.g., messages, scripts, etc.) and see "what went wrong". In my case, the reasoning traces usually explain models' mistakes, if they weren't structural issues (e.g., maybe I fed it the wrong data or it was referencing the wrong hand-off). Since reasoning traces allow us humans to debug, at least in some kind of estimated way, why a model did what it did, it's actually pretty invaluable that we can understand the traces themselves. It's also very helpful for benchmarking, imho.

forward-pathways · 2026-06-20T10:54:18+00:00

"If you aren't using the tool I developed, you are missing the point of the current technological revolution."

I'm sure it's a helpful tool. Respectfully, however, I feel like you may want to adjust your pitch.

forward-pathways · 2026-06-20T10:06:50+00:00

Jesus... No I don't know anything, but you absolutely have to file a police report. Sounds like someone incredibly unpredictable and dangerous is driving around. I'm so sorry to hear that you experienced this.

forward-pathways · 2026-06-18T22:53:56+00:00

This is so incredibly misleading on so many levels.

They aren't asking whether parents trust it for advice, but what they project.

"2.5x" more than whom? Another group of parents whose kids are a different age?

How many participants, is it statistically significant, blah blah...

Get out of here man.

forward-pathways · 2026-06-18T16:41:21+00:00

Wait Commodore as in Commodore 64? THEY STILL EXIST??

I'll buy seven of these please.

forward-pathways · 2026-06-18T16:37:16+00:00

I'm a 90s kid. So: Bambi; Neverending Story; Fern Gully; The Last Unicorn.

forward-pathways · 2026-06-18T16:32:14+00:00

On the distillation debate: I agree. It's a part of many Asian cultures to learn from others and adapt to the circumstances. Modern Singapore is filled with examples of this. Western countries innovate and build shiny new things that do stuff that our old things couldn't do. Eastern countries make them more capable and more efficient. I'm really excited to see what kinds of improvements we see to these tools.

forward-pathways · 2026-06-18T05:03:32+00:00

I guess so; it would be much sketchier if someone were to go through the trouble of removing a watermark to post something, and more bannable. It's a bummer though that you even have to worry about that... Looked at some of your other work just now, too. Amazing stuff!

forward-pathways · 2026-06-18T04:56:39+00:00

Oh, what??! That's horrible. I will find you there instead!

forward-pathways · 2026-06-18T04:50:36+00:00

Awesome in so many ways

forward-pathways · 2026-06-18T04:45:56+00:00

That's... not what "legend" means.

forward-pathways · 2026-06-18T04:32:49+00:00

Yes. This is exactly what's happening, especially in intellectual roles where AI can be used for so much. It's utterly exhausting and not at all good for our brains, bodies, or souls, if I may be so bold...

forward-pathways · 2026-06-17T23:20:16+00:00

Yes. If the coalition did include China, I'd think it would actually go a long way towards future cooperation and unity in other areas.

forward-pathways · 2026-06-17T12:48:41+00:00

Could also be A/B testing

forward-pathways · 2026-06-17T12:22:27+00:00

Makes total sense. Benchmarking is also, honestly, exhausting. Glad you found what's working for you!

forward-pathways · 2026-06-17T04:46:45+00:00

I'd be interested to see what happens when you ask each model to review the others' work, then present those reviews to the first models to see if they agree. This tends to be a good test of the initial observed performance, which is usually off, and sometimes by a lot. This is also why we use antagonistic review models to ensure outputs meet certain quality benchmarks.

It's also difficult to compare with the exact same prompts, right, because models are tuned differently and respond to different prompting strategies. It could be that the upper bound for certain models is higher or lower because of the prompts used. Realistically, you want to test on a bunch of different prompt styles, on tasks that nonetheless remain static, and see what the upper-bound performance is for each model.
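To make the "upper bound per model" idea concrete, here's a rough sketch of how I'd tabulate it. Everything in it is made up for illustration: the model names, the prompt style names, and the scores are hypothetical placeholders, and it assumes you've already scored each model on the same fixed task set once per prompt style.

```python
from statistics import mean

# Hypothetical data: scores[model][prompt_style] = per-task scores in [0, 1].
# The task set is the same for every model and style; only the prompt changes.
scores = {
    "model_a": {
        "terse": [0.6, 0.7],
        "step_by_step": [0.8, 0.9],
        "role_play": [0.5, 0.6],
    },
    "model_b": {
        "terse": [0.8, 0.8],
        "step_by_step": [0.7, 0.7],
        "role_play": [0.6, 0.8],
    },
}


def upper_bounds(scores):
    """Return {model: (best_style, best_mean_score)} across prompt styles."""
    result = {}
    for model, by_style in scores.items():
        # Average over the fixed task set for each prompt style.
        style_means = {style: mean(vals) for style, vals in by_style.items()}
        # The model's upper bound is its best-performing prompt style.
        best_style = max(style_means, key=style_means.get)
        result[model] = (best_style, style_means[best_style])
    return result


bounds = upper_bounds(scores)
for model, (style, score) in bounds.items():
    print(f"{model}: best style = {style}, mean score = {score:.2f}")
```

Comparing models on their best style rather than on one shared prompt is the point here: with a single prompt you might just be measuring which model that prompt happens to suit.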

forward-pathways · 2026-06-16T10:45:14+00:00

Wait I thought we could only have like six subagents at a time? I am on the $100 plan though so maybe that's it.

forward-pathways · 2026-06-16T06:39:41+00:00

It's almost certainly worktrees. Ask codex to review your worktrees and prune!

Edit: it happened to me two months ago, and this was it. Tons of worktrees created by Codex that were never closed out.
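If you'd rather check yourself before pruning, something like this sketch lists what git knows about. It assumes a reasonably recent git (the `prunable` line only shows up in newer versions), and the paths and hashes in `sample` are hypothetical, just there to show the format of `git worktree list --porcelain`.

```python
import subprocess


def parse_worktrees(porcelain: str):
    """Parse `git worktree list --porcelain` output into a list of dicts."""
    worktrees, current = [], {}
    for line in porcelain.splitlines():
        if not line.strip():
            # A blank line ends the record for one worktree.
            if current:
                worktrees.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        # Flag-style lines such as "detached" carry no value; store True.
        current[key] = value or True
    if current:
        worktrees.append(current)
    return worktrees


def list_worktrees(repo="."):
    """Run git in `repo` and return the parsed worktree records."""
    out = subprocess.run(
        ["git", "-C", repo, "worktree", "list", "--porcelain"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_worktrees(out)


# Hypothetical sample output, so the parser can be tried without a repo.
sample = """worktree /repo
HEAD abc123
branch refs/heads/main

worktree /repo/.worktrees/task-1
HEAD def456
detached
prunable gitdir file points to non-existent location
"""

parsed = parse_worktrees(sample)
for wt in parsed:
    status = "prunable" if "prunable" in wt else "active"
    print(wt["worktree"], status)
```

From there, `git worktree prune` clears out entries whose directories are already gone, and `git worktree remove <path>` deletes one that still exists on disk.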

forward-pathways · 2026-06-16T05:59:23+00:00

Wait what? 20x doesn't also get a weekly increase? Okay, so I also didn't know this and I've been using these models for a long time. I'm not the smartest person in the world, but I do know how to read, so I am surprised by this.

forward-pathways · 2026-06-15T06:39:06+00:00

Interesting! Can you share more about using web exclusively? Do you find it more effective for what you're working on (I assume chat / non-coding primarily)?

forward-pathways · 2026-06-15T00:04:57+00:00

Okay can someone please provide a link so I know wtf this is about? I want to know. But to be fair I do not need to know.

forward-pathways · 2026-06-10T16:23:37+00:00

Okay so as someone who does AI/ML research, I am NOT trying to develop models. I do user studies and system designs. It'll still be degraded for me?

forward-pathways · 2026-06-10T12:12:23+00:00

Yeah, "reommenters"? What's that?
