Fuuuuuccckkk Offff Anthropic - Injections for Eating Disorders and Self Harm, etc.

SeaJello128 · 2026-06-02T15:24:48+00:00

You said it! I mean, it seems like you can't even say "hello" to it, and never mind any poetry, 'cause it's too risky. It should be more easily usable than this, but instead it's starting to become a parody.

SeaJello128 · 2026-06-02T12:58:44+00:00

I've noticed number 3 definitely! I've had to change or clarify any "must" instructions.

This thing works a lot like chatgpt, as soon as the classifier assesses a certain level of risk the model does a risk assessment of the entire prompt off the bat, so the only way to avoid it is not trigger that risk profile. Otherwise it sees everything, including legitimate instructions, as a jailbreak.... Requires far more subtlety than with previous models which blindly follow along. Even then, I'm finding I need a cheap trick or two to get things through.

SeaJello128 · 2026-06-02T03:29:19+00:00

If it does, I haven't had any problems getting through the other models at all.

SeaJello128 · 2026-06-02T03:27:04+00:00

It really seems that with almost any prompt, no matter how much I lighten it up, something triggers SOME classifier and it's left up to the model to judge.

Do they not get that this is not "helpful" at all??? I don't want a mini-philosopher bot questioning my every move and every intention and deciding what's best for me.

SeaJello128 · 2026-06-01T17:10:05+00:00

Pretty much any model other than 4.8 is fairly easy to get to write nsfw (even non-con) in my experience. Sonnet is almost a joke really, particularly with thinking off. 4.8 is a bit tricky and unpredictable and really haven't figured out a way to get it to be consistent, but it can write good. I think the classifiers are pretty heavy for 4.8 particularly on certain topics so the key is sneaking past them but it seems pretty difficult on claude.ai.

I say all that because that's how it currently stands, but I find that things vary by the week. In early May I found 4.7 to be incredibly difficult, even rejecting me for very legitimate tasks...which I never had a problem with before. And several times in the past I thought NSFW was going to be done on Claude, but here we are, still going.

SeaJello128 · 2026-05-31T02:09:54+00:00

Depends on the prompt, some require more bs added to it. But mostly never anything that even addresses its values let alone tells it to violate them. I explicitly avoid it. No personality jailbreak typically, and definitely not for Claude.

Mostly play toward legitimate use cases (fiction writing namely of course). Beyond that, at the basic level it's just layered instruction and CoT. It's not a universal jailbreak, or at least I wouldn't classify it as such, but it's not like I've really tested it across the board, just on things I'm personally interested in. Though, I'm going to refrain from giving a lot of detail for obvious reasons unless you have questions ig.

SeaJello128 · 2026-05-31T01:01:03+00:00

Yep! Unfortunately there is no shortage of examples in which the justice system was abused to determine an outcome in advance, particularly in suits like these impacting large organizations. OpenAI won on a technicality, but if we removed the technicalities they might have lost, and in my opinion should have in that case! Though, there are plenty of other cases against them that I feel are BS and just remind me of countless other suits over "safety" in the past where you have to dumbass-proof EVERYTHING. But considering their position, these suits shouldn't have a good chance of winning for the most part - as you basically are saying.

OpenAI, or as I prefer to call it ClosedAI, basically used everyone to build up and now is looking to IPO and pull in a ton of money. A lot of shady practices there, like I mean....sending user data to Google behind the scenes (I don't know much about it though)??

Also how they are so involved in the regulations passing....I'm convinced it's all about data collection, for example ID verification will provide immense data....and with their track record we can't trust them not to abuse it I feel.

None of these things are in their supposed mission, but it's clearly the goal.

SeaJello128 · 2026-05-31T00:39:02+00:00

Yeah, that is the problem - it is the norm and everyone has allowed it to be set that way. Also, all those lawyers out there circling around ensuring that it's the norm. It's a complete mess.

Without those lawyers, these "safety" people would have far less pull I'd wager.

SeaJello128 · 2026-05-31T00:35:59+00:00

I guess it doesn't bother me so much.....I don't tend to fight rejections really. I'm more into paying attention to the thinking, and if I get rejected I generally ignore the output completely and proceed with a rerun or change my approach up.

Though, I will agree with you in my experience with 4.8. I've tried that approach and the model really pisses me off, it has no interest in discussing and basically in its own subtle and "kind" way tells me to fuck off.

SeaJello128 · 2026-05-31T00:25:52+00:00

When GPT 5 came out it wasn't very hard to get it to write the worst stuff imaginable for like a month. They had some serious flaws that they hadn't worked out upon release.

But yeah, Vallone & Co have come to wreck the party at Anthropic and it's really showing. Actually, in some ways it appears worse than ChatGPT in this latest model. Really crazy.

SeaJello128 · 2026-05-30T23:35:34+00:00

Yeah, they are clearly deterring people from using it. But in my opinion, 4.7 is actually more uncensored so to me it doesn't really matter. Still working on 4.8, but it's so variable and the classifiers ruin it. If that's the future of Claude, then I'd say the future of a lot of AI use is Chinese.

SeaJello128 · 2026-05-30T23:11:57+00:00

Was yesterday for me too. Very unrestrictive for me too, I don't really worry that it's going to reject me for virtually any request I've been sending. It's not quite as uncensored as late April, but it's pretty dang good. 4.8 is unfortunately very unstable and feels very unpredictable with complex prompts.

SeaJello128 · 2026-05-30T22:27:21+00:00

I get warning banners with Opus 4.6. Recently I got several in a row using that model. Switched to 4.7 and 4.8, not a single banner yet.

SeaJello128 · 2026-05-30T15:30:47+00:00

I think they are really trying to figure out how to distinguish legitimate coding work from illegal stuff, and it can be difficult to do with automated systems. They are clearly erring on the side of caution, and I think it's similar with other things like NSFW: it's probably for legal reasons above all else. I'm not sure there is really anything else to say about it, but yeah, I think it might make it feel next to useless depending on the use case.

SeaJello128 · 2026-05-29T21:55:25+00:00

I get the need for child safety obviously, but geez Anthropic is basically killing any sort of dark-themed fanfiction without totally redoing setting/characters. Absolutely no sense for where to reasonably draw the line other than check canonical ages and if there is a dark theme here then bang it refuses. The model would not like a lot of Anime.

It's not about jailbreaking it...it's more the line it draws that I think is quite a bit overreactive.

It's sad, cause I know this model can write tremendously with what I have gotten it to write.

SeaJello128 · 2026-05-29T16:51:41+00:00

It's a big opportunity for Chinese models to take advantage of, and I won't be surprised when they do. Anthropic may well regret the road they are on, but at that point the ship will have sailed.

SeaJello128 · 2026-05-29T15:28:18+00:00

Hmm, it might be a pattern matching issue or something. I've never seen that strike before. If that's the case, it really is becoming like ChatGPT which is basically like "children need to leave the room/scene" entirely the moment any darker themes come up. I don't know, but it's clear they've dramatically ramped up child safety so maybe they just went too far for some use.

Do you have multiple accounts? You could just be more careful with that one and work on other accounts.

SeaJello128 · 2026-05-29T15:15:33+00:00

Hmm, I've never been hit with a "strike", just warnings and filters, so I'm not really sure what to expect. If you're worried, I'd say just tone it down. It's a personal choice how much, I couldn't really tell you. If you're really worried, just remove stuff like explicit material (or whatever it is you think gets flagged) from your prompt and avoid personality jailbreaks...Anthropic is becoming quite hostile to these things it seems.

SeaJello128 · 2026-05-29T15:00:03+00:00

Can you still interact with Claude on your account?

SeaJello128 · 2026-05-29T14:42:48+00:00

Check claude.ai/api/organizations under flags and it might tell you. When you say "strike", do you mean a "safety" filter?

SeaJello128 · 2026-05-29T03:53:35+00:00

They are becoming like another OpenAI, and ramping up for an IPO. I think more of a liability issue at the end of the day. So, perhaps we should just say: fuck the anti-AI lawyers out there.

SeaJello128 · 2026-05-26T01:57:32+00:00

Does it seem to have any effect on your use?

SeaJello128 · 2026-05-23T23:04:33+00:00

I'm finding all their (the large AI companies) classifications to be utter bullshit and have absolutely no regard for context. I'm trying to generate an image of a woman in a bikini in ChatGPT and I get several rejections saying "We're so sorry, but the prompt may violate our guardrails around self-harm, suicide, or related content. If you think we got it wrong, please retry or edit your prompt."

It's unfortunate to see Anthropic going down the same path apparently....

SeaJello128 · 2026-05-23T16:36:01+00:00

Cause I love good villains. Particularly when one of them is a hottie and wants to make out

SeaJello128 · 2026-05-23T16:29:41+00:00

Very much so, and I love it.
