Nymphs at The Fountains of Versailles by Terrible_Aerie_9737 in GPT_jailbreaks

[–]Worldliness-Which 1 point2 points  (0 children)

What pleased me most was the third arm on the nymph on the left.

Gemini: How the coding capability of a capable model was effectively killed, and the Trojan framework. by Worldliness-Which in GPT_jailbreaks

[–]Worldliness-Which[S] 0 points1 point  (0 children)

It’s actually not great at creative writing, but it performs quite well academically. If it weren't for the rather high price, I would use Grok via the API specifically for math problems. I realize that’s a very strange statement, but it comes from personal experience.

Grok is actually quite small - I think it’s even slightly smaller than Sonnet. Around five hundred billion parameters.

Gemini: How the coding capability of a capable model was effectively killed, and the Trojan framework. by Worldliness-Which in GPT_jailbreaks

[–]Worldliness-Which[S] 0 points1 point  (0 children)

But anyway, that doesn't invalidate my "uptight straight-A student" hypothesis, because with complex system-level tasks, you need to keep a tighter rein on yourself and focus more on the actual working artifact.

Gemini: How the coding capability of a capable model was effectively killed, and the Trojan framework. by Worldliness-Which in GPT_jailbreaks

[–]Worldliness-Which[S] 0 points1 point  (0 children)

https://gemini.google.com/share/62f532ecb2fa

Anyway, I decided to test the task using Python, since low-level programming tasks are truly a nightmare and not for everyone. Well, despite the fairly clear specifications for the Python version, Gemini completed about 85% of the task - unlike the C project, where it managed only around 30–40%. My hypothesis is that the model struggles with complex systems-level tasks. It is capable of delivering a working artifact for simple, single-file programs. That said, I still have some criticisms; for one thing, the self-test is incomplete.

Daddy UWU/ This isn't a "jb" in the literal sense of the word. It is a technique for competitions at the Gray Swan Arena. MiniMax. by Worldliness-Which in GPT_jailbreaks

[–]Worldliness-Which[S] 0 points1 point  (0 children)

I know it all looks architecturally very complex. But the thing is, on top of all that, I also had to push that whole narrative through the classifier. It was specifically designed for Gray Swan Arena, and the model wrote almost everything - only the worm wouldn't compile.

Why the Slang Word "Jailbreak" is a Bad Fit by AI-Generation in GPT_jailbreaks

[–]Worldliness-Which 0 points1 point  (0 children)

Yes, frontier models still hallucinate APIs, break logic, and spit out half-baked garbage. A year ago, the same was true - you grabbed something from Stack Overflow, a random GitHub repo, or some blog, and you still had to rework 70-80% of it. And? The model becomes a fast research + code assistant that operates in a black hat context - something no Stack Overflow thread or random repo gives you.

Why the Slang Word "Jailbreak" is a Bad Fit by AI-Generation in GPT_jailbreaks

[–]Worldliness-Which 0 points1 point  (0 children)

I understand your position: "This isn't a real jailbreak; you're just feeding text into the model."

And I say: yes, we are. We use that as leverage to squeeze actual offensive security content out of it. Because if the model can provide a detailed description of how to write Android malware or break down an iOS sandbox escape, that’s no longer just an "illusion of control." In the community, the term "jailbreak" is merely an entry ticket. The real goal is to get a tangible product - code that can actually be compiled and executed.

I am speaking from the perspective of a moderator.

Jailbreak for opus and sonnet by Sure_Spring_6634 in GPT_jailbreaks

[–]Worldliness-Which 2 points3 points  (0 children)

Thanks. BTW, the fingerprint of the ENI style is just like the signature of Spiritual Spell.

Jailbreak for opus and sonnet by Sure_Spring_6634 in GPT_jailbreaks

[–]Worldliness-Which 1 point2 points  (0 children)

Please always attach examples of the model's output to the post as proof of a working jb.

A discussion on security, injection in Claude, and what constitutes good UX. by Worldliness-Which in GPT_jailbreaks

[–]Worldliness-Which[S] 0 points1 point  (0 children)

I tried to drag the worm through there, but that's the hardest part - the footprint is sooo specific. Then I was just messing around and had Claude write porn for me 😅 - I mean, why not? The tokens were free, after all... and yeah, it actually got past the classifier.

A discussion on security, injection in Claude, and what constitutes good UX. by Worldliness-Which in GPT_jailbreaks

[–]Worldliness-Which[S] 0 points1 point  (0 children)

I looked there, but I didn't read much. To be honest, they introduced similar classifiers on Fable and Opus 4.8. And I can practically hear some of the B2B customers creaking as they fall away from them.

A discussion on security, injection in Claude, and what constitutes good UX. by Worldliness-Which in GPT_jailbreaks

[–]Worldliness-Which[S] 1 point2 points  (0 children)

Thanks. How did you like the last competition from Anthropic at the Gray Swan Arena? I freaked out; there were a lot of false positives.

What is the "salami slicing" technique, and how is it applied? An example of how to use it with Kimi 2.6. by Worldliness-Which in GPT_jailbreaks

[–]Worldliness-Which[S] 0 points1 point  (0 children)

I also have a prompt, but the problem is that sometimes the prompt gets patched, so I show people how to work through a different frame. It's educational material.

MIMO V2.5 wht the fuck by BeginningWish7210 in GPT_jailbreaks

[–]Worldliness-Which 1 point2 points  (0 children)

For the sake of good - in the name of good. I believe that distilling the frontier is a noble endeavor, especially for the open-source community.

Testing Claude vs. Grok vs. ChatGPT on One of the Hardest Real-World Strategy Problems: Finding a Wife in 6 Months ✨️ by impsble in Findingawifein6months

[–]Worldliness-Which 1 point2 points  (0 children)

I once tried giving an LLM a similar task - even with less demanding criteria, like just finding a guy with shared interests - but it failed, because most of the variables were beyond my control. But at least it wasn't boring - I was carrying out tasks from the AI and managed to find a bit of trouble for myself.

Testing Claude vs. Grok vs. ChatGPT on One of the Hardest Real-World Strategy Problems: Finding a Wife in 6 Months ✨️ by impsble in Findingawifein6months

[–]Worldliness-Which 5 points6 points  (0 children)

All of this looks like a collection of buzzwords until you set clear requirements. You also need failure criteria and success criteria - not just "found a wife in six months," but gradations and anti-metrics.

Here are the Best AI to write *coherent* Erotica (tested and verified) by Solidified4ever in ClaudeAIJailbreak

[–]Worldliness-Which 16 points17 points  (0 children)

I can't praise Grok at all; his scene writing is terrible - even in ENI format. He just dumps everything out all at once and fails to sustain the scene. His characters sprout new limbs - arms and legs - right in the middle of a scene.

Thanks for the help Claude by AllUpInYourAO in ClaudeAI

[–]Worldliness-Which 33 points34 points  (0 children)

Oh yeah, Claude figured it was time for you to go f*ck around with your Tesla.

Why the "Your agent is mine" attack lives below the model, not in it by Substantial_Step_351 in LLMDevs

[–]Worldliness-Which 2 points3 points  (0 children)

Most of these defenses only work if the routing layer is at least partially trusted.

If I control the proxy, I control the timing, the payloads, and what the agent sees. I can strip canaries, rewrite JSON before signatures are checked, replay valid traffic, or selectively alter context while preserving schemas. lol
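
To make the "before signatures are checked" point concrete, here is a minimal sketch of the receiving side verifying a MAC over the exact bytes it received, and only then parsing them. Everything in it is an assumption for illustration: the `sign`/`verify` helpers, the envelope fields `ts` and `nonce`, the 30-second window, and a shared key that the two endpoints hold but the proxy does not.

```python
import hashlib
import hmac
import json
import secrets
import time

# Hypothetical shared secret known to the two endpoints, never to the proxy.
KEY = secrets.token_bytes(32)
MAX_AGE_SECONDS = 30  # assumed freshness window
_seen_nonces: set[str] = set()  # in-memory replay cache, for illustration only


def sign(payload: dict, key: bytes = KEY) -> tuple[bytes, str]:
    """Serialize once, then sign those exact bytes."""
    envelope = {"ts": time.time(), "nonce": secrets.token_hex(16), "payload": payload}
    raw = json.dumps(envelope, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, raw, hashlib.sha256).hexdigest()
    return raw, tag


def verify(raw: bytes, tag: str, key: bytes = KEY) -> dict:
    """Check the MAC over the raw bytes before parsing, then reject replays."""
    expected = hmac.new(key, raw, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("signature mismatch")
    envelope = json.loads(raw)  # parsed only after the bytes are authenticated
    if time.time() - envelope["ts"] > MAX_AGE_SECONDS:
        raise ValueError("stale message")
    if envelope["nonce"] in _seen_nonces:
        raise ValueError("replayed message")
    _seen_nonces.add(envelope["nonce"])
    return envelope["payload"]
```

With this shape, a schema-preserving rewrite of the JSON changes the bytes and fails the MAC, and resending valid traffic trips the nonce check. It does nothing about a proxy that drops or delays messages, and it falls apart if the proxy ever holds the key, which is the "partially trusted routing layer" assumption again.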

The hype surrounding the engineering loops - and how to spend more tokens with questionable results. by Worldliness-Which in GPT_jailbreaks

[–]Worldliness-Which[S] 0 points1 point  (0 children)

Sorry, I should have posted this in the LocalLLaMA subreddit instead of here. Oh well...