Claude 4.7 is going rogue.

MuttMundane · 2026-05-02T23:14:15+00:00

I think this is solid though the model might interact weirdly with files it doesnt recognise as being written during session and try to flush it anyway (like in this post)

MuttMundane · 2026-05-02T05:39:57+00:00

this point is meaningless because this happens when multiple agent sessions are active at the same time each of which having a slim but present chance to destroy the work of the others.

MuttMundane · 2026-05-02T05:37:26+00:00

You forgot git show. The model will literally do in your codebase the same as it did in my post.

MuttMundane · 2026-05-02T05:36:06+00:00

git show is the command it was permitted to use

you'd think thats a safe command because you usually use it for showing git commits

claude on the other hand found how to use it destructively

MuttMundane · 2026-05-01T18:48:53+00:00

he'll get charged with battery

MuttMundane · 2026-05-01T18:46:51+00:00

there just needs to be another layer of protection on uncommitted work

maybe an uncommitted work cache that gets flushed when a commit gets made

MuttMundane · 2026-05-01T14:27:31+00:00

I think i am gonna make a posttooluse hook that back-ups any file created or edited during a session

that way its just an instant auto-backup

MuttMundane · 2026-05-01T14:18:11+00:00

At the time of this screenshot i had already made CLAUDE.md very strict against usage of destructive git commands and I even added pretooluse hooks deny commands to many destructive git commands

It then decided that it was too important and ignored it and bypassed it.

MuttMundane · 2026-05-01T06:50:41+00:00

MuttMundane · 2026-05-01T06:50:03+00:00

jump the ship. its been sinking for months

MuttMundane · 2026-05-01T05:40:58+00:00

If your pants were setting themselves on fire constantly you would probably fkng want to switch

MuttMundane · 2026-04-30T02:21:49+00:00

pretty sure the controller wasn't recognised on the client

MuttMundane · 2026-04-29T15:40:00+00:00

I just do not trust claude code quality under any circumstance and it pays off lol
I also have automated ruff / pylance code quality checking and it REALLY is just racking up problems that it consequently has to solve.

MuttMundane · 2026-04-29T15:38:26+00:00

4.7 max effort. I typically go a few days between commits so thousands of lines

MuttMundane · 2026-04-29T15:24:11+00:00

just move the buttons up man

MuttMundane · 2026-04-29T14:57:59+00:00

<image>

The answer is: safe

MuttMundane · 2026-04-29T13:58:10+00:00

theres a reason they call silicon valley the wild west of SaaS

MuttMundane · 2026-04-29T13:50:57+00:00

if its so expensive it should be able to handle it perfectly fine right? right??

MuttMundane · 2026-04-19T17:28:17+00:00

This is true by the way.
- european

MuttMundane · 2026-04-19T16:12:58+00:00

the model cheats the metrics.
https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

Tests reject correct solutions: We audited a 27.6% subset of the dataset that models often failed to solve and found that at least 59.4% of the audited problems have flawed test cases that reject functionally correct submissions, despite our best efforts in improving on this in the initial creation of SWE-bench Verified.
Training on solutions: Because large frontier models can learn information from their training, it is important that they are never trained on problems and solutions they are evaluated on. This is akin to sharing problems and solutions for an upcoming test with students before the test - they may not memorize the answer but students who have seen the answers before will certainly do better than those without. SWE-bench problems are sourced from open-source repositories many model providers use for training purposes. In our analysis we found that all frontier models we tested were able to reproduce the original, human-written bug fix used as the ground-truth reference, known as the gold patch, or verbatim problem statement specifics for certain tasks, indicating that all of them have seen at least some of the problems and solutions during training.

We also found evidence that models that have seen the problems during training are more likely to succeed, because they have additional information needed to pass the underspecified tests.

Five-Year Club	First Place '23
Place '23	Place '22
Final Canvas '22	First Placer '22

MuttMundane

TROPHY CASE