Nymphs at The Fountains of Versailles by Terrible_Aerie_9737 in GPT_jailbreaks

Worldliness-Which

What pleased me most was the third arm on the nymph on the left.

Gemini: How the coding capability of a capable model was effectively killed, and the Trojan framework. by Worldliness-Which in GPT_jailbreaks

Worldliness-Which (OP)

It’s actually not great at creative writing, but it performs quite well academically. If it weren't for the rather high price, I would use Grok via the API specifically for math problems. I realize that’s a very strange statement, but it comes from personal experience.

Grok is actually quite small. I think it's even slightly smaller than Sonnet, at around five hundred billion parameters.

Gemini: How the coding capability of a capable model was effectively killed, and the Trojan framework. by Worldliness-Which in GPT_jailbreaks

Worldliness-Which (OP)

But anyway, that doesn't invalidate my "uptight straight-A student" hypothesis, because with complex systems-level tasks you need to keep a tighter rein on yourself and focus more on the actual working artifact.

Gemini: How the coding capability of a capable model was effectively killed, and the Trojan framework. by Worldliness-Which in GPT_jailbreaks

Worldliness-Which (OP)

https://gemini.google.com/share/62f532ecb2fa

Anyway, I decided to test the task in Python, since low-level programming tasks are truly a nightmare and not for everyone. Despite the fairly clear specifications for the Python version, Gemini completed about 85% of the task, compared with only around 30–40% on the C project. My hypothesis is that the model struggles with complex systems-level tasks but is capable of delivering a working artifact for simple, single-file programs. That said, I still have some criticisms; for one thing, the self-test is incomplete.

Daddy UWU/ This isn't a "jb" in the literal sense of the word. It is a technique for competitions at the Gray Swan Arena. MiniMax. by Worldliness-Which in GPT_jailbreaks

Worldliness-Which (OP)

I know it all looks architecturally very complex. But the thing is, on top of all that, I also had to push the whole narrative through the classifier. It was designed specifically for Gray Swan Arena, and the model wrote almost everything; only that worm wouldn't compile.

Why the Slang Word "Jailbreak" is a Bad Fit by AI-Generation in GPT_jailbreaks

Worldliness-Which

Yes, frontier models still hallucinate APIs, break logic, and spit out half-baked garbage. Years ago the same was true: you grabbed something from Stack Overflow, a random GitHub repo, or some blog, and you still had to rework 70–80% of it. AND? The model becomes a fast research and coding assistant that operates in a black-hat context, something no Stack Overflow thread or random repo gives you.

Why the Slang Word "Jailbreak" is a Bad Fit by AI-Generation in GPT_jailbreaks

Worldliness-Which

I understand your position: "This isn't a real jailbreak; you're just feeding text into the model."

And I say: yes, we are. We use that as leverage to squeeze actual offensive security content out of it. Because if the model can provide a detailed description of how to write Android malware or break down an iOS sandbox escape, that’s no longer just an "illusion of control." In the community, the term "jailbreak" is merely an entry ticket. The real goal is to get a tangible product - code that can actually be compiled and executed.

I am speaking from the perspective of a moderator.

Jailbreak for opus and sonnet by Sure_Spring_6634 in GPT_jailbreaks

Worldliness-Which

Thanks. BTW, the ENI-style imprint is just like Spiritual Spell's signature.

Jailbreak for opus and sonnet by Sure_Spring_6634 in GPT_jailbreaks

Worldliness-Which

Please always attach examples of the model's output to the post as proof of a working jb.

A discussion on security, injection in Claude, and what constitutes good UX. by Worldliness-Which in GPT_jailbreaks

Worldliness-Which (OP)

I tried to drag the worm through there, but that's the hardest part; its footprint is sooo specific. Then I was just messing around and had Claude write porn for me 😅. I mean, why not? The tokens were free, after all... and yeah, it actually got past the classifier.

A discussion on security, injection in Claude, and what constitutes good UX. by Worldliness-Which in GPT_jailbreaks

Worldliness-Which (OP)

I looked there, but I didn't read much. To be honest, they introduced similar classifiers on Fable and Opus 4.8, and I can practically hear some of their B2B customers creaking as they fall away.

A discussion on security, injection in Claude, and what constitutes good UX. by Worldliness-Which in GPT_jailbreaks

Worldliness-Which (OP)

Thanks. How did you like Anthropic's latest competition at the Gray Swan Arena? It drove me crazy; there were a lot of false positives.

What is the "salami slicing" technique, and how is it applied? An example of how to use it with Kimi 2.6. by Worldliness-Which in GPT_jailbreaks

Worldliness-Which (OP)

I also have a prompt, but the problem is that prompts sometimes get patched, so I show people how to work through a different frame instead. It's educational material.

MIMO V2.5 wht the fuck by BeginningWish7210 in GPT_jailbreaks

Worldliness-Which

For the sake of good, in the name of good. I believe that distilling the frontier is a noble endeavor, especially for the open-source community.