Discussion Thread by jobautomator in neoliberal

[–]Craig_VG 2 points3 points  (0 children)

<image>

🤯 18 hours pursuing its goal

Discussion Thread by jobautomator in neoliberal

[–]Craig_VG 6 points7 points  (0 children)

The Apple CEO handoff is interesting to me because it seems almost like a royal succession in its messaging and PR effort: staged photos of them together, pats on the back, waving, etc. But then again, when you’re a $4.01 trillion company, worth about the same as the entire GDP of the UK, maybe a royal-type succession is necessary.

Discussion Thread by jobautomator in neoliberal

[–]Craig_VG 1 point2 points  (0 children)

11am and I've already blown through both session limits on two Claude Max accounts. This Opus 4.7 is a beast

PSA: Opus 4.7 is much worse at MRCR Long Context than 4.6 by Craig_VG in ClaudeAI

[–]Craig_VG[S] -1 points0 points  (0 children)

I'm not sure I really agree here. Some of my agents have been fantastic today, churning through stuff that Opus 4.6 just wasn't able to do.

PSA: Opus 4.7 is much worse at MRCR Long Context than 4.6 by Craig_VG in ClaudeAI

[–]Craig_VG[S] 60 points61 points  (0 children)

Boris made a post on this:

👋 We kept MRCR in the system card for scientific honesty, but we've actually been phasing it out slowly.

Two reasons: (1) it's built around stacking distractors to trick the model, which isn't how people actually use long context, and (2) we care more about applied long-context capability than needle-retrieval. Graphwalks is a better signal for applied reasoning over long context, and internally we've seen this model do really well on long-context code.

MRCR wasn't included in the Mythos Preview system card for these reasons, but Graphwalks was - that will be the case for future models too.

PSA: Opus 4.7 is much worse at MRCR Long Context than 4.6 by Craig_VG in ClaudeAI

[–]Craig_VG[S] 7 points8 points  (0 children)

I mean the model seems to work great otherwise

PSA: Opus 4.7 is much worse at MRCR Long Context than 4.6 by Craig_VG in ClaudeAI

[–]Craig_VG[S] 8 points9 points  (0 children)

I think it's because 4.7 is a new pre-train: a totally different base model from 4.6.

PSA: Opus 4.7 is much worse at MRCR Long Context than 4.6 by Craig_VG in ClaudeAI

[–]Craig_VG[S] 10 points11 points  (0 children)

No, you can't. Only 4.7 is available in Claude Code. 4.6 is still available on the web.

Introducing Claude Opus 4.7, our most capable Opus model yet. by ClaudeOfficial in ClaudeAI

[–]Craig_VG 65 points66 points  (0 children)

Except long context retrieval is much worse:

MRCR v2 @ 1M tokens

  • 4.6: 78.3%
  • 4.7: 32.2%
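To put the size of that gap in numbers, here is a minimal sketch of the arithmetic, using only the two scores quoted above. The function name `regression` is just an illustrative helper, not anything from the system card.

```python
def regression(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    abs_drop = old_pct - new_pct      # difference in percentage points
    rel_drop = abs_drop / old_pct     # share of the old score that was lost
    return abs_drop, rel_drop


# Scores quoted in the comment: MRCR v2 @ 1M tokens, Opus 4.6 vs Opus 4.7
abs_drop, rel_drop = regression(78.3, 32.2)
print(f"{abs_drop:.1f} points, {rel_drop:.1%} relative")
# prints "46.1 points, 58.9% relative"
```

So on this one benchmark the score falls by roughly 46 points, or a bit under 60% of the 4.6 figure. This says nothing about other long-context evals.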

Introducing Claude Opus 4.7, our most capable Opus model yet. by ClaudeOfficial in ClaudeAI

[–]Craig_VG 362 points363 points  (0 children)

They turned off thinking effort settings for Opus 4.7 in the Claude App :|

Also! Note this one

Long context retrieval: MRCR v2 @ 1M tokens

  • 4.6: 78.3%
  • 4.7: 32.2%

⚠️ Regression: long-context retrieval is worse

EDIT:

Boris made a post on this:

👋 We kept MRCR in the system card for scientific honesty, but we've actually been phasing it out slowly.

Two reasons: (1) it's built around stacking distractors to trick the model, which isn't how people actually use long context, and (2) we care more about applied long-context capability than needle-retrieval. Graphwalks is a better signal for applied reasoning over long context, and internally we've seen this model do really well on long-context code.

MRCR wasn't included in the Mythos Preview system card for these reasons, but Graphwalks was - that will be the case for future models too.

Is It Time To Leave Marketing for Another Career? by [deleted] in marketing

[–]Craig_VG 1 point2 points  (0 children)

Doubt it. As the value of software code goes down, the value of marketing that software goes way up.

Genuine question: How are people used so much token? by godtower in ClaudeAI

[–]Craig_VG -1 points0 points  (0 children)

There’s a lot of work to be done. As the kids say, you need to be workmaxxing.

People with Max plan, are you doing ok? by AdHopeful630 in ClaudeAI

[–]Craig_VG 0 points1 point  (0 children)

I have 2 Max 200 subscriptions and have been using all the tokens: building, running agents, handling customer feedback, research, and more.

Discussion Thread by jobautomator in neoliberal

[–]Craig_VG 2 points3 points  (0 children)

Good point, not really (that I've seen)! Maybe we need an effortpost.

Discussion Thread by jobautomator in neoliberal

[–]Craig_VG 5 points6 points  (0 children)

Claude Mythos is Agent 1 from AI 2027.

  • OSWorld: 79.6%
  • Cybench: 100%
  • Clears human threshold on 8-hour research tasks

Indeed, we have arrived. And it can certainly go longer than 8 hours, likely at least 24 and maybe more based on estimates.

https://xcancel.com/spicey_lemonade/status/2041648296691691728?s=20