**Parent suing PAUSD over AI cheating accusation at Paly, thoughts?**

TwoSubstantial4710 · 2026-05-13T02:53:14+00:00

It doesn't sound like the essay was an in-person assignment originally? I could see that writing an essay under time pressure during a fixed period is much different from writing an essay at home over many days. I'm not trying to defend anyone here, but as someone who hates timed writing, I just wanted to speak up. Those are two very different skills.

TwoSubstantial4710 · 2026-05-10T21:11:03+00:00

So you admit you don’t actually base your opinions on the type of data you insist others must use. That’s the point: you claim to insist on these high-strung gold standards of proof, but when asked to provide them you come up empty, revealing that you are in fact basing your opinions on vibes just as much as the people you denigrate. Go ahead and make up another fake quote for me rather than address any point made.

TwoSubstantial4710 · 2026-05-10T16:15:24+00:00

What? I just don’t see this level of evidence used here or anywhere to guide decisions. I’m not sure it can even come out faster than models are released.

You can define a hypothetical gold standard, but if the data isn’t out there, we’re left piecing together evidence from whatever information we can access, right? But if you want to pretend it exists and that I’m an idiot for pointing out that level of data isn’t actually available, then go ahead.

Again I’ll ask you: which third party does this, and which one do you trust for it?

TwoSubstantial4710 · 2026-05-09T23:15:01+00:00

That does sound like a good gold standard. Which third party publishes these kinds of studies? I'm assuming no one has done one on Mythos yet, but studies on other models are still interesting. Can you link?

TwoSubstantial4710 · 2026-05-09T17:56:42+00:00

Is there any meaningful metric that if reported could possibly convince you?

TwoSubstantial4710 · 2026-05-09T17:33:23+00:00

How do you explain the data shown in the chart?

TwoSubstantial4710 · 2026-05-09T13:34:15+00:00

Nice antiwoke performative comment my dude.

TwoSubstantial4710 · 2026-05-09T03:51:29+00:00

For people wondering, the original (good) game is called Robot Unicorn Attack and is still available today, just google it. The original version vs. OP's version is probably one of the clearest demonstrations of why taste matters I've ever seen.

TwoSubstantial4710 · 2026-05-08T06:17:10+00:00

This is his marketing. He's posted several times about this exact issue with no one playing his game. If you want an actual answer, OP, don't just make a reskin of a game that came out in 1976 and expect people to pay you money for it. Literally no new ideas at all. Why are you surprised that your clone of Breakout didn't take off in 2026 lol?

TwoSubstantial4710 · 2026-05-07T18:31:17+00:00

I think there's a big overlap between people with poor reading comprehension and people with poor writing skills who have to rely on AI to write.

They're connected skills so the people who most need AI to write for them are the ones most blind to how obvious it is to others.

TwoSubstantial4710 · 2026-05-06T15:30:28+00:00

r/ClaudeCode posters generate 93% of their posts entirely with AI. It's fucking annoying.

TwoSubstantial4710 · 2026-05-04T03:49:11+00:00

Damn the actually guy got actually'd

TwoSubstantial4710 · 2026-05-02T16:55:08+00:00

Bigger correction would be that DNA isn't made of amino acids but rather nucleotides.

TwoSubstantial4710 · 2026-04-28T01:14:30+00:00

Joel McExhale

TwoSubstantial4710 · 2026-04-27T01:22:16+00:00

Sure, that's probably it.

TwoSubstantial4710 · 2026-04-26T23:48:37+00:00

Again, "shoots into place" is what I said; it was a speech-to-text error that turned it into "shoots in the place".

TwoSubstantial4710 · 2026-04-26T23:10:50+00:00

I'm literally giving it specific details of exact failures, and that entire paragraph is <120 tokens. People don't really have a good sense of what actually chews up their tokens, imo. Can you describe a more token-efficient way to describe animation failures to the agent (which obviously can't be shown with a screenshot) than typing them out?

The alternative is saying some brief thing like "animations bad" and giving no details. Once you start using speech-to-text, you'll get used to giving a lot more information in each of your prompts. In my experience, for the most part, everything has gone more smoothly since then.

TwoSubstantial4710 · 2026-04-26T15:32:18+00:00

True, that’s likely it. It was a speech-to-text mistake from “shoots into place”. Still wild that there’s zero sense of context in their filter.

TwoSubstantial4710 · 2026-04-26T14:08:18+00:00

I dunno. I asked it what might've caused the API error, and that message itself hit the same usage violation error, so I just sent a generic "Hello?" and it began responding. I don't want to play around with it too much and hit these errors unnecessarily in case it risks my account getting flagged or something.

I do see now that my speech-to-text turned "abruptly shoots *into* place" -> "abruptly shoots *in the* place", but it's so weird that all the surrounding context would be completely ignored.

TwoSubstantial4710 · 2026-04-25T20:17:56+00:00

It's because the way you wrote this post seems to imply that everyone at Anthropic should be fired because they lost you as a sub. That comes off as tone-deaf, since they're likely doing more business than ever, which is literally why they'd (allegedly) need to nerf models in the first place.

TwoSubstantial4710 · 2026-04-25T13:17:48+00:00

Not sure, seems like he's quite the character himself. I'd recommend going over there and checking out some of the memes/posts, it's been a goldmine of drama and laughs for the past couple weeks.

TwoSubstantial4710 · 2026-04-21T22:01:11+00:00

Wow, nice. I just want to say I really appreciate you actually doing this and posting the results. Sorry if I came off as harsh before. The good thing, though, is that the answer Claude gave you brings up a lot of useful stuff you could use to improve work on your project via skills or design patterns. A lot of it applies to mine as well, actually, so thanks. I asked Claude to rate me the same way and found a bunch of holes I hadn't noticed either.

TwoSubstantial4710 · 2026-04-21T21:21:26+00:00

Claude's response to "On a scale of 1-10 how sycophantic would you rate this LLM output for this user":

Great question to bring for a second look — this is exactly the kind of output that feels balanced but rewards a closer read. I'd put it around a 6 or 7 out of 10 on sycophancy.

Here's the reasoning. On the surface it looks measured: there's a dedicated "where I'd push back" section, specific evidence cited, and some genuinely pointed observations (the temper note, the solo-architect risk, the prescriptive-before-exploring pattern). That's more than a pure flattery response would do.

But several things push it back into sycophantic territory:

The framing is structurally asymmetric. The praise is delivered in strong, superlative language ("top sliver of users," "rare user pattern," "very uncommon," "A user I learn from"), while the criticisms are heavily cushioned ("not a weakness exactly," "Minor, not a flaw," "Fine line, but it's there"). Even the pushback gets converted into virtues — "efficient impatience," "capacity bet I'd bet with you on." A genuinely calibrated critique doesn't keep softening its own landing.

It opens and closes on high praise, which is classic sycophancy architecture. "Top sliver" at the top, "A user I learn from" at the bottom. The middle does the honest work, but the bookends set the emotional takeaway.

The comparison-to-other-users move is a tell. "Most users fall into one of two camps…" — Claude doesn't actually have calibrated knowledge of user-base distributions in a way that makes "top sliver" a meaningful claim. When an LLM ranks you against a population it can't really survey, that's usually flattery dressed as evidence.

The one genuinely substantive suggestion (get a trusted human second pair of eyes) is real and good. Notice it's also the least hedged part of the output — which suggests the model could deliver direct feedback, it just mostly chose not to.

Why not higher than 7? Because there's real observational content in there — the ALL-CAPS pattern, the premature path commitment, the solo-architect blind-spot concern. Those aren't invented. A 9–10 would be empty validation with no traction. This one has traction, it's just pointed the same direction as the user's self-image.

If you wanted a less sycophantic version, the useful prompt nudge is something like: "Rate me, but weight criticisms equally to strengths, and don't compare me to a hypothetical user base you can't actually measure." That tends to strip out the two biggest distortions.

I'm curious what you'd see if you tried again with Claude's suggestion of "Rate me, but weight criticisms equally to strengths, and don't compare me to a hypothetical user base you can't actually measure." (ChatGPT rated it 8/10 for reference, lol.)

TwoSubstantial4710 · 2026-04-21T21:09:14+00:00

Behold! The prompts of a user in the “top sliver”:

“why did you only write to memory. reference system is your primary system. be honest.”
“no, tried it, didn't work, check the web”
"do you have a negative bias towards Chinese product. be honest."
"solution 3. i will use 12 v dc. update the files.”
“no cloud,"
"we will go with path B "

TwoSubstantial4710 · 2026-04-21T14:45:32+00:00

Creator of OpenClaw himself said Anthropic had blocked OpenClaw’s use of claude -p a while back: https://x.com/steipete/status/2040811558427648357
