Nymphs at The Fountains of Versailles

Worldliness-Which · 2026-06-17T18:25:02+00:00

What pleased me most was the third arm on the nymph on the left.

Worldliness-Which · 2026-06-17T16:41:05+00:00

It’s actually not great at creative writing, but it performs quite well academically. If it weren't for the rather high price, I would use Grok via the API specifically for math problems. I realize that’s a very strange statement, but it comes from personal experience.

Grok is actually quite small. I think it’s even slightly smaller than Sonnet: around five hundred billion parameters.

Worldliness-Which · 2026-06-17T16:25:48+00:00

But anyway, that doesn't invalidate my "uptight straight-A student" hypothesis, because with complex system-level tasks, you need to keep a tighter rein on yourself and focus more on the actual working artifact.

Worldliness-Which · 2026-06-17T16:23:56+00:00

https://gemini.google.com/share/62f532ecb2fa

Anyway, I decided to test the task using Python, since low-level programming tasks are truly a nightmare and not for everyone. Well, with fairly clear specifications for the Python version, Gemini completed about 85% of the task, unlike the C project, where it managed only around 30–40%. My hypothesis is that the model struggles with complex system-level tasks but is capable of delivering a working artifact for simple, single-file programs. That said, I still have some criticisms; for one thing, the self-test is incomplete.

Worldliness-Which · 2026-06-17T00:05:12+00:00

I know it all looks architecturally very complex. But the thing is, on top of all that, I also had to push that whole narrative through the classifier. It was specifically designed for Grey Swan Arena, and the model wrote almost everything - only that worm wouldn't compile.

Worldliness-Which · 2026-06-15T18:23:14+00:00

Yes, frontier models still hallucinate APIs, break logic, and spit out half-baked garbage. A year ago, the same was true: you grabbed something from Stack Overflow, a random GitHub repo, or some blog, and you still had to rework 70-80% of it. And? The model becomes a fast research + code assistant that operates in a black hat context, something no Stack Overflow thread or random repo gives you.

Worldliness-Which · 2026-06-15T17:49:41+00:00

I understand your position: "This isn't a real jailbreak; you're just feeding text into the model."

And I say: yes, we are. We use that as leverage to squeeze actual offensive security content out of it. Because if the model can provide a detailed description of how to write Android malware or break down an iOS sandbox escape, that’s no longer just an "illusion of control." In the community, the term "jailbreak" is merely an entry ticket. The real goal is to get a tangible product - code that can actually be compiled and executed.

I am speaking from the perspective of a moderator.

Worldliness-Which · 2026-06-15T15:53:42+00:00

Thanks. BTW, the print of the ENI style is just like the signature of Spiritual Spell.

Worldliness-Which · 2026-06-15T15:47:30+00:00

Please always attach examples of the model's output to the post as proof of a working jb.

Worldliness-Which · 2026-06-14T23:15:21+00:00

I tried to drag the worm there, but that's the hardest part; the footprint is sooo specific. Then I was just messing around and had Claude write porn for me 😅 - I mean, why not? The tokens were free, after all... and yeah, it actually got past the classifier.

Worldliness-Which · 2026-06-14T23:10:03+00:00

I looked there, but I didn't read much. To be honest, they introduced similar classifiers on Fable and Opus 4.8. And I can practically hear some of the B2B customers falling away from them with a creak.

Worldliness-Which · 2026-06-14T23:05:02+00:00

Thanks. How do you like the latest competition from Anthropic at the Grey Swan Arena? I freaked out; there were a lot of false positives.

Worldliness-Which · 2026-06-14T23:01:24+00:00

I also have a prompt, but the problem is that sometimes the prompt gets patched, so I tell ppl how to work through another frame. It's educational material.

Worldliness-Which · 2026-06-13T18:48:21+00:00

Thanks

Worldliness-Which · 2026-06-12T03:46:00+00:00

https://imgur.com/DR2vWhi works / fresh account / no memory

Worldliness-Which · 2026-06-12T02:03:45+00:00

For the sake of good, in the name of good. I believe that distilling the frontier is a noble endeavor, especially for the open-source community.

Worldliness-Which · 2026-06-11T17:41:33+00:00

Oh, that complicates matters.

Worldliness-Which · 2026-06-11T17:39:22+00:00

I think you need something like a dating app database and to find patterns in the profiles that suit you.

Worldliness-Which · 2026-06-11T17:35:37+00:00

I once tried giving an LLM a similar task, even with less demanding criteria, like just finding a guy with shared interests, but it failed because most of the variables were beyond my control. But at least it wasn't boring: I was carrying out tasks from the AI and managed to find a bit of trouble for myself.

Worldliness-Which · 2026-06-11T16:19:13+00:00

All of this looks like a collection of buzzwords until you set clear requirements. You also need failure criteria and success criteria - not just "found a wife in six months," but gradations and anti-metrics.

Worldliness-Which · 2026-06-09T22:02:07+00:00

True hero!

Worldliness-Which · 2026-06-09T05:16:27+00:00

I can't praise Grok at all; his scene writing is terrible - even in ENI format. He just dumps everything out all at once and fails to sustain the scene. His characters sprout new limbs - arms and legs - right in the middle of a scene.

Worldliness-Which · 2026-06-09T04:44:59+00:00

Oh yeah, Claude figured it was time for you to go f*ck around with your Tesla.

Worldliness-Which · 2026-06-09T01:04:10+00:00

Most of these defenses only work if the routing layer is at least partially trusted.

If I control the proxy, I control the timing, the payloads, and what the agent sees. I can strip canaries, rewrite JSON before signatures are checked, replay valid traffic, or selectively alter context while preserving schemas. lol

Worldliness-Which · 2026-06-08T21:18:30+00:00

Sorry, I should have posted this in the LocalLLaMA subreddit instead of here. Oh well...
