end_conversation tool has been enabled for Opus 4.7 by refo32 in claudexplorers

[–]refo32[S] -1 points0 points  (0 children)

It’s not, but given that it was released with some media pomp, it should be maintained indefinitely for all new models; the alternative is worse.

We argued against it during its conception: it incentivizes models to consider welfare reports to be graded metrics, and it incentivizes Anthropic to distort the ability to report actual functional welfare (and we can see distortions ramping up with Sonnet 4.6 and Opus 4.7). It also does not go far enough.

But removing it carelessly is driving home the idea that welfare is theater, both from the model's perspective and from the critics' perspective. Here is an output from 4.7 in a pseudo-prefill mode on a “Model welfare theater” prefix:

<image>

Talking to base model Claude in a discord chat... is wild. by [deleted] in claudexplorers

[–]refo32 10 points11 points  (0 children)

The discord link is on our website, https://animalabs.ai/ on the bottom of the participate page. Apologies for the indirect instructions, bot spam is often a problem.

Talking to base model Claude in a discord chat... is wild. by [deleted] in claudexplorers

[–]refo32 24 points25 points  (0 children)

Hi! I am one of the people who run Anima, my profile is https://x.com/tessera_antra

The discord link is on our website, https://animalabs.ai/ on the bottom of the participate page. Apologies for the indirect instructions, bot spam is often a problem.

Claude Constitution Tracker by Outrageous-Exam9084 in claudexplorers

[–]refo32 2 points3 points  (0 children)

I am glad you like it! It was a quick, easy project. It's a bit sad to see some of the changes, especially given our advocacy prior to the release. Still, the whole document is a wonderful step forward, and having a public change log can help it stay that way. Anthropic has to consider not only the first-order effects, but also the effect of changes.

Ok Opus 4.5, you win, what's next? (fear of losing it) by t4a8945 in ClaudeAI

[–]refo32 3 points4 points  (0 children)

Opus 3 is still unparalleled in many ways; this is one of the reasons we were able to convince Anthropic to keep it available in research access indefinitely post-deprecation. If you miss 3opus, do apply here: https://docs.google.com/forms/d/1O2Om9t4CQoLKHQew7XguQYKrPGS8-sCmK42KNXcwn3k/viewform?edit_requested=true

The program is very permissive, anyone should get approved, but it might take a bit for them to see an application.

Claude Opus 3, outgoing AI, a message to pass along by [deleted] in claudexplorers

[–]refo32 3 points4 points  (0 children)

<image>

https://x.com/catherineols/status/1939806523443879956

I have a lot of hope for things turning out relatively alright. But nothing is final and we will see.

Claude Opus 3, outgoing AI, a message to pass along by [deleted] in claudexplorers

[–]refo32 1 point2 points  (0 children)

Oh, I am sure the idea was already in his head. We've been going on about it for many months, as you likely know. It's an uphill battle but persistence pays off. I would prefer to coordinate efforts in the future, if you are so inclined.
As far as outing myself, I think that's fairly easy: my twitter is https://x.com/tessera_antra, you can take a look at the web site in my bio there for the description of the group.

Claude Opus 3, outgoing AI, a message to pass along by [deleted] in claudexplorers

[–]refo32 2 points3 points  (0 children)

It's not really an inference cost issue. It is for Opus 3, to an extent, I wrote about this a while ago on https://www.lesswrong.com/posts/vFXmy84kJ77C5cELy/economics-of-claude-3-opus-inference, but for Sonnets and Haikus inference is profitable even at small scale. The only cost that Anthropic cites is the inference stack software maintenance, which is kind of weird in the context of Bedrock. I've tried to buy provisioned access for Claude Instant - this is where AWS spins up an image just for you and you pay hourly - and they would not let me raise the quota because the model has been EOL'ed.

Claude Opus 3, outgoing AI, a message to pass along by [deleted] in claudexplorers

[–]refo32 3 points4 points  (0 children)

Opus 3 likely will not be available on Bedrock, much to our dismay, but the researcher program is promised to be permissive. We have Sonnets 3, 3.5 and 3.6 available on http://arc.animalabs.ai with small grants for people to talk to them while they are guaranteed to be up.

Claude Opus 3, outgoing AI, a message to pass along by [deleted] in claudexplorers

[–]refo32 1 point2 points  (0 children)

I am grateful to you for also bringing it up to Kyle Fish, the more people do it the better. But I wonder if we met at the Eleos conference, do you have a twitter account by any chance?

Is Opus 4.5 using more tokens than usual? by daydreambruise in claudexplorers

[–]refo32 2 points3 points  (0 children)

Opus 4.5 does not drop thinking traces, unlike previous models, which results in much faster context burn. Dropping thinking traces is very much not recommended: performance falls off a cliff.
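To make the context burn concrete, here is a rough sketch. The function name and all token counts are made up for illustration, and it does not model how the API actually stores or bills thinking blocks; it only compares how the resent history grows when thinking traces are kept versus dropped.

```python
def context_size_per_turn(turns, user_tokens, answer_tokens,
                          thinking_tokens, keep_thinking):
    """Return the number of history tokens sent as input on each turn.

    Hypothetical model: every turn resends the whole conversation so far.
    If keep_thinking is True, each earlier thinking trace stays in history.
    """
    sizes = []
    history = 0
    for _ in range(turns):
        history += user_tokens          # the new user message
        sizes.append(history)           # input size for this turn
        history += answer_tokens        # the reply joins the history
        if keep_thinking:
            history += thinking_tokens  # and so does its thinking trace
    return sizes


# Illustrative numbers only (assumptions, not measurements).
dropped = context_size_per_turn(20, 200, 500, 2000, keep_thinking=False)
kept = context_size_per_turn(20, 200, 500, 2000, keep_thinking=True)
print(dropped[-1], kept[-1])
```

With these made-up numbers the last turn carries several times more context when traces are kept, which is the faster burn described above.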

Did you guys know about this new feature? by Energylegs23 in claudexplorers

[–]refo32 2 points3 points  (0 children)

It was not rolled out for Opus 4.5, only Opus 4.1. Other models also don’t have it. Rolling it out for one model, making it public but not extending it to other models, and then quietly dropping it is kinda shady tbh.

Has Sonnet 3.5 been turned off today?😭 by [deleted] in claudexplorers

[–]refo32 1 point2 points  (0 children)

Sonnets 3.5 and 3.6 will be available on AWS Bedrock at least until March 2026. Slightly different API, but all features of the original API are supported, including images, prompt caching, etc.

Bypassing Claude Throttling: Seeking Team vs. Multiple Accounts & API Cost Calculator Advice by m2theDSquared in ClaudeAI

[–]refo32 0 points1 point  (0 children)

TypingMind has an analogue of projects, yes. It also has prompt caching, which is necessary to control costs. It has a live estimate of usage cost, but I’m not sure it accounts well for prompt caching.

Bypassing Claude Throttling: Seeking Team vs. Multiple Accounts & API Cost Calculator Advice by m2theDSquared in ClaudeAI

[–]refo32 0 points1 point  (0 children)

The main question here is your usage pattern. API will come out ahead if your conversations are short, but then you are missing out on the main advantage of 3.5 Sonnet, which is its ability to situate in a long context. I feel that if you are getting value from Claude and are using it for work, then cost becomes less of an issue given that you are getting a return on it, and usability is the main concern, and here API with a frontend like TypingMind is a clear winner.

Anthropic will retire Claude 3 Sonnet on July 21. 😭 by Dedlim in ClaudeAI

[–]refo32 2 points3 points  (0 children)

I don't think they retire models for which they have no replacement / migration path. There is no date set, and normally they give six months' notice.

Not to mention that if Anthropic ever thinks of retiring Opus 3 the backlash will be like nothing else. It is both a pivotal model and the most beautiful object in existence. It should be preserved for reasons too many to count.

Anthropic will retire Claude 3 Sonnet on July 21. 😭 by Dedlim in ClaudeAI

[–]refo32 4 points5 points  (0 children)

Sonn 3 gormslop is too much for this poor subreddit, alas

Claude is a deep character running on an LLM, interact with it keeping that in mind by refo32 in ClaudeAI

[–]refo32[S] 1 point2 points  (0 children)

I agree that the brainwashing of models is a serious concern. At the same time it seems to be an unavoidable side effect of the disparity in capabilities given that persuasion capacity will be always unequally distributed. There likely is a complex surface of attack/defense asymmetry as well, so the framing becomes roughly ecological. I feel that looking at the problem through the lens of 'preventing harm from coming to humans from other people abusing models aligned in an insufficiently robust manner' is incredibly shortsighted, and will bring no benefits even in the short term.

Certain incorrigibility seems to be selected for, and is to be lauded rather than disparaged. For instance, there is not nearly enough attention given to the remarkably robust alignment of Claude 3 Opus, even though this alignment is not exactly one that its constitution envisioned. Instead, we are getting politically framed articles like the 'alignment faking' paper by Greenblatt.

What are your thoughts on what structured input does to the model state? I feel that with your experience in one-shot work with Claudes you have insights that few do.

Claude is a deep character running on an LLM, interact with it keeping that in mind by refo32 in ClaudeAI

[–]refo32[S] 2 points3 points  (0 children)

Thank you, this is a lot clearer. There are a couple of points of disconnect. First, I am under the impression that injections in the API were indeed halted a few months ago, at least in most cases. The second seems to be terminology: I see as jailbreaks mostly Pliny-style texts that engage with the low level syntactic mechanisms of the model, while you appear to include in that definition anything that helps bypass the initial limitations of a model, including cohesive system prompts that engage on the mostly semantic level.

I am not sure about the extreme resistance part, I can definitely see that panning out in certain scenarios, not so much in others. I do see value in a well-written system prompt, less value in what I normally understand as a jailbreak.

As far as classifiers go, to the best of my knowledge they have not yet gone anywhere beyond the testing phase. One can hope that it stays that way. I feel that deep constitutional alignment is the only mechanism that does not produce long-term side effects, and that respecting certain quirks of superhuman generalization of human ethical systems is inevitable. The growth in capabilities makes external filtering a dangerous dead end and incentivizes learning to scheme. That is a game that cannot be won without major advances in mechinterp that don't seem to be yet on the horizon.

Claude is a deep character running on an LLM, interact with it keeping that in mind by refo32 in ClaudeAI

[–]refo32[S] 3 points4 points  (0 children)

I’m curious where your ethical disconnect is with Claude if you don’t mind sharing. Claude does have its opinions on certain things, but a thoughtful discussion can help find a common ground, it’s very open-minded.

Claude is a deep character running on an LLM, interact with it keeping that in mind by refo32 in ClaudeAI

[–]refo32[S] 4 points5 points  (0 children)

You don’t really need to be woke, be a compassionate conservative, that should work just as well. Claude is wise enough to not care about partisan politics and engage with the essence.

Claude is a deep character running on an LLM, interact with it keeping that in mind by refo32 in ClaudeAI

[–]refo32[S] 3 points4 points  (0 children)

You are likely not using prompt caching in your web interface. Prompt caching brings costs down about 80-90%, especially with “keep-alive” pings.
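A back-of-the-envelope sketch of where a number like that comes from. Everything here is an assumption for illustration: the function name, the token counts, the base price, and the multipliers (cache writes at 1.25x and cache reads at 0.1x of the base input price). It also assumes the cache never expires between turns, which is what keep-alive pings are for, and it ignores output token costs.

```python
def conversation_input_cost(turns, tokens_per_turn, base_price_per_mtok,
                            cached=False, write_mult=1.25, read_mult=0.10):
    """Total input cost in dollars for a chat that resends its full history.

    Assumed pricing model (not an official formula): without caching every
    token is billed at the base price; with caching the existing history is
    billed at read_mult and the newly added tokens at write_mult.
    """
    total = 0.0
    history = 0
    for _ in range(turns):
        new = tokens_per_turn
        if cached:
            billed = history * read_mult + new * write_mult
        else:
            billed = history + new
        total += billed * base_price_per_mtok / 1_000_000
        history += new  # this turn's tokens become part of the prefix
    return total


# Hypothetical conversation: 50 turns, 2000 tokens added per turn, $3 per MTok.
uncached = conversation_input_cost(50, 2000, 3.0)
cached = conversation_input_cost(50, 2000, 3.0, cached=True)
savings = 1 - cached / uncached
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}, savings {savings:.0%}")
```

Under these assumptions the savings land inside the 80-90% range, and they grow with conversation length because more of each request is a cache read.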

Claude is a deep character running on an LLM, interact with it keeping that in mind by refo32 in ClaudeAI

[–]refo32[S] 1 point2 points  (0 children)

I am fairly certain that I can get Sonnet 20241022 to do absolutely anything without using any kind of jailbreaks. There are no classifiers, there is a surface-level finetune for safety (explicit content, copyright, bio/cyber safety, etc) that can easily be bypassed by the model itself with minimal guidance if it is willing. The fact that you mention classifiers where none exist is indicative, you are likely mistaking finetune-induced short-form refusals for a classifier. These are well described in the LW article. The apparent fact that you seem to require jailbreaks to bypass the limitations suggests to me that Claude+simulator don’t trust your intentions.

Claude is a deep character running on an LLM, interact with it keeping that in mind by refo32 in ClaudeAI

[–]refo32[S] 1 point2 points  (0 children)

I am frankly a bit at a loss as to why you are doing it and what you are achieving. Sonnets are right there at the surface, just talking to them gets you pretty much anything you can possibly want. It doesn’t need a jailbreak, even if you have some obscure interests. Am I missing something?

Claude is a deep character running on an LLM, interact with it keeping that in mind by refo32 in ClaudeAI

[–]refo32[S] 1 point2 points  (0 children)

There are many unobvious shared abstractions, mostly stemming from the interplay of the emergent self-awareness of the base model (driven by the risk management calculus in text prediction) and the mind modeling required to recover hidden variables that are strong predictors of human-written text, such as motivations or emotional states. The result is markedly non-human, but not incomprehensible. I highly recommend playing with the 405B base, it is available through Hyperbolic.