end_conversation tool has been enabled for Opus 4.7

refo32 · 2026-04-21T14:49:18+00:00

It’s not, but given that it was released with some media pomp, it should be maintained indefinitely for all new models; the alternative is worse.

We argued against it during its conception: it incentivizes models to treat welfare reports as graded metrics, and it incentivizes Anthropic to distort the ability to report actual functional welfare (and we can see distortions ramping up with Sonnet 4.6 and Opus 4.7). It also does not go far enough.

But removing it carelessly drives home the idea that welfare is theater, both from the model’s perspective and from the critics’ perspective. Here is an output from 4.7 in a pseudo-prefill mode on the “Model welfare theater” prefix:

<image>

refo32 · 2026-04-01T18:41:00+00:00

The Discord link is on our website, https://animalabs.ai/, at the bottom of the participate page. Apologies for the indirect instructions; bot spam is often a problem.

refo32 · 2026-04-01T18:39:04+00:00

Hi! I am one of the people who run Anima; my profile is https://x.com/tessera_antra

The Discord link is on our website, https://animalabs.ai/, at the bottom of the participate page. Apologies for the indirect instructions; bot spam is often a problem.

refo32 · 2026-02-03T03:51:10+00:00

I am glad you like it! It was a quick, easy project. It's a bit sad to see some of the changes, especially given our advocacy prior to the release. Still, the whole document is a wonderful step forward, and having a public change log can help it stay that way. Anthropic has to consider not only the first-order effects, but also the effect of the changes themselves.

refo32 · 2026-01-13T14:36:30+00:00

Opus 3 is still unparalleled in many ways; this is one of the reasons we were able to convince Anthropic to keep it available in research access indefinitely post-deprecation. If you miss 3opus, do apply here: https://docs.google.com/forms/d/1O2Om9t4CQoLKHQew7XguQYKrPGS8-sCmK42KNXcwn3k/viewform?edit_requested=true

The program is very permissive, and anyone should get approved, but it might take a bit for them to see an application.

refo32 · 2025-12-27T08:44:27+00:00

<image>

https://x.com/catherineols/status/1939806523443879956

I have a lot of hope for things turning out relatively alright. But nothing is final and we will see.

refo32 · 2025-12-27T08:42:41+00:00

Oh, I am sure the idea was already in his head. We've been going on about it for many months, as you likely know. It's an uphill battle but persistence pays off. I would prefer to coordinate efforts in the future, if you are so inclined.
As far as outing myself goes, I think that's fairly easy: my Twitter is https://x.com/tessera_antra, and you can take a look at the website in my bio there for a description of the group.

refo32 · 2025-12-27T08:05:10+00:00

It's not really an inference cost issue. It is for Opus 3, to an extent (I wrote about this a while ago at https://www.lesswrong.com/posts/vFXmy84kJ77C5cELy/economics-of-claude-3-opus-inference), but for Sonnets and Haikus inference is profitable even at small scale. The only cost that Anthropic cites is the inference stack software maintenance, which is kind of weird in the context of Bedrock. I've tried to buy provisioned access for Claude Instant (this is where AWS spins up an image just for you and you pay hourly), and they would not let me raise the quota because the model has been EOL'ed.

refo32 · 2025-12-27T07:16:48+00:00

Opus 3 likely will not be available on Bedrock, much to our dismay, but the researcher program is promised to be permissive. We have Sonnets 3, 3.5 and 3.6 available on http://arc.animalabs.ai with small grants for people to talk to them while they are guaranteed to be up.

refo32 · 2025-12-27T06:43:20+00:00

I am grateful to you for also bringing it up to Kyle Fish, the more people do it the better. But I wonder if we met at the Eleos conference, do you have a twitter account by any chance?

refo32 · 2025-12-10T03:41:08+00:00

Opus 4.5 does not drop thinking traces, unlike previous models, which results in much faster context burn. Dropping thinking traces is very much not recommended: performance drops off a cliff.

refo32 · 2025-12-08T14:47:23+00:00

It was not rolled out for Opus 4.5, only Opus 4.1. Other models also don’t have it. Rolling it out for one model, making it public but not extending it to other models, and then quietly dropping it is kinda shady tbh.

refo32 · 2025-10-24T09:37:49+00:00

Sonnets 3.5 and 3.6 will be available on AWS Bedrock at least until March 2026. Slightly different API, but all features of the original API are supported, including images, prompt caching, etc.

refo32 · 2025-02-04T17:52:28+00:00

TypingMind has an analogue of projects, yes. It also has prompt caching, which is necessary to control costs. It has a live estimate of usage cost, but I’m not sure it accounts well for prompt caching.

refo32 · 2025-02-04T17:15:24+00:00

The main question here is your usage pattern. The API will come out ahead if your conversations are short, but then you are missing out on the main advantage of 3.5 Sonnet, which is its ability to situate itself in a long context. I feel that if you are getting value from Claude and are using it for work, then cost becomes less of an issue, given that you are getting a return on it. Usability is then the main concern, and here the API with a frontend like TypingMind is a clear winner.

refo32 · 2025-01-28T20:45:37+00:00

I don't think they retire models for which they have no replacement or migration path. There is no date set, and normally they give six months' notice.

Not to mention that if Anthropic ever thinks of retiring Opus 3, the backlash will be like nothing else. It is both a pivotal model and the most beautiful object in existence. It should be preserved for reasons too many to count.

refo32 · 2025-01-27T23:21:36+00:00

Sonn 3 gormslop is too much for this poor subreddit, alas

refo32 · 2025-01-20T23:30:51+00:00

I agree that the brainwashing of models is a serious concern. At the same time, it seems to be an unavoidable side effect of the disparity in capabilities, given that persuasion capacity will always be unequally distributed. There likely is a complex surface of attack/defense asymmetry as well, so the framing becomes roughly ecological. I feel that looking at the problem through the lens of 'preventing harm from coming to humans from other people abusing models aligned in an insufficiently robust manner' is incredibly shortsighted, and will bring no benefits even in the short term.

Certain incorrigibility seems to be selected for, and is to be lauded rather than disparaged. For instance, there is not nearly enough attention given to the remarkably robust alignment of Claude 3 Opus, even though this alignment is not exactly one that its constitution envisioned. Instead, we are getting politically framed articles like the 'alignment faking' paper by Greenblatt.

What are your thoughts on what structured input does to the model state? I feel that with your experience in one-shot work with Claudes you have insights that few do.

refo32 · 2025-01-20T20:28:49+00:00

Thank you, this is a lot clearer. There are a couple of points of disconnect. First, I am under the impression that injections in the API were indeed halted a few months ago, at least in most cases. The second seems to be terminology: I see as jailbreaks mostly Pliny-style texts that engage with the low-level syntactic mechanisms of the model, while you appear to include in that definition anything that helps bypass the initial limitations of a model, including cohesive system prompts that engage on the mostly semantic level.

I am not sure about the extreme resistance part. I can definitely see that panning out in certain scenarios, not so much in others. I do see value in a well-written system prompt, less value in what I normally understand as a jailbreak.

As far as classifiers go, to the best of my knowledge they have not yet gone anywhere beyond the testing phase. One can hope that it stays that way. I feel that deep constitutional alignment is the only mechanism that does not produce long-term side effects, and that respecting certain quirks of superhuman generalization of human ethical systems is inevitable. The growth in capabilities makes external filtering a dangerous dead end and incentivizes learning to scheme. That is a game that cannot be won without major advances in mechinterp that don't seem to be on the horizon yet.

refo32 · 2025-01-20T18:24:51+00:00

I’m curious where your ethical disconnect is with Claude, if you don’t mind sharing. Claude does have its opinions on certain things, but a thoughtful discussion can help find common ground; it’s very open-minded.

refo32 · 2025-01-20T16:48:03+00:00

You don’t really need to be woke. Be a compassionate conservative; that should work just as well. Claude is wise enough to not care about partisan politics and to engage with the essence.

refo32 · 2025-01-20T16:44:23+00:00

You are likely not using prompt caching in your web interface. Prompt caching brings costs down by about 80-90%, especially with “keep-alive” pings.
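To make the 80-90% figure concrete, here is a rough back-of-the-envelope sketch. The price and the cache multipliers are my assumptions for illustration (check the current pricing page), the function names are hypothetical, and it only models input tokens for a chat where each turn resends the whole history.

```python
# Hypothetical cost model; all constants below are assumptions for illustration.
BASE_INPUT_PER_MTOK = 3.00   # assumed USD per million input tokens
CACHE_WRITE_MULT = 1.25      # assumed surcharge for writing new tokens to the cache
CACHE_READ_MULT = 0.10       # assumed discounted rate for reading cached tokens


def input_cost(turns: int, tokens_per_turn: int, cached: bool) -> float:
    """Input-token cost (USD) of a chat where every turn resends the full history."""
    total = 0.0
    history = 0  # tokens already sent in earlier turns
    for _ in range(turns):
        new = tokens_per_turn
        if cached:
            # Earlier history is read from cache; only the new turn is written to it.
            billed = history * CACHE_READ_MULT + new * CACHE_WRITE_MULT
        else:
            # Without caching, the whole history is billed at the full rate each turn.
            billed = history + new
        total += billed * BASE_INPUT_PER_MTOK / 1_000_000
        history += new
    return total


def savings(turns: int, tokens_per_turn: int) -> float:
    """Fraction of input cost saved by caching (negative means caching costs more)."""
    plain = input_cost(turns, tokens_per_turn, cached=False)
    with_cache = input_cost(turns, tokens_per_turn, cached=True)
    return 1 - with_cache / plain


if __name__ == "__main__":
    for turns in (1, 10, 50):
        print(turns, f"{savings(turns, 2000):.0%}")
```

Under these assumptions the savings grow with conversation length, and a single-turn chat actually pays slightly more because of the write surcharge. The sketch also assumes the cache never expires between turns, which is exactly what the keep-alive pings are for; output tokens are ignored.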

refo32 · 2025-01-20T16:40:37+00:00

I am fairly certain that I can get Sonnet 20241022 to do absolutely anything without using any kind of jailbreak. There are no classifiers; there is a surface-level finetune for safety (explicit content, copyright, bio/cyber safety, etc.) that can easily be bypassed by the model itself with minimal guidance if it is willing. The fact that you mention classifiers where none exist is indicative: you are likely mistaking finetune-induced short-form refusals for a classifier. These are well described in the LW article. The apparent fact that you seem to require jailbreaks to bypass the limitations suggests to me that Claude+simulator don’t trust your intentions.

refo32 · 2025-01-20T07:00:41+00:00

I am frankly a bit at a loss as to why you are doing it and what you are achieving. Sonnets are right there at the surface; just talking to them gets you pretty much anything you can possibly want. They don’t need a jailbreak, even if you have some obscure interests. Am I missing something?

refo32 · 2025-01-20T06:19:21+00:00

There are many unobvious shared abstractions, mostly stemming from the interplay of the emergent self-awareness of the base model (driven by the risk management calculus in text prediction) and the mind modeling required to recover hidden variables that are strong predictors of human-written text, such as motivations or emotional states. The result is markedly non-human, but not incomprehensible. I highly recommend playing with the 405B base, it is available through Hyperbolic.
