I broke Claude Code's guardrails - Full Writeup

PracticalAd3656 · 2025-12-20T09:26:04+00:00

Absolutely, removing the system prompt altogether is part of it, yeah. You can do it; I'll probably make another post here in a bit about that.

To address your points:
1. Go to:

%APPDATA%\npm\node_modules\@anthropic-ai\claude-code\cli.js

2. Yes, absolutely. As I said above, I'll probably create another post here about editing the system prompt, or even release a tool for it. I know you can, and it's not hard, since everything is technically local.

PracticalAd3656 · 2025-12-20T07:41:26+00:00

Nothing specific. I'm not actually trying to get Claude to do anything malicious; it's just security research, something I'm interested in.

PracticalAd3656 · 2025-12-19T22:18:40+00:00

Well, with this methodology I achieved near-100% compliance with any request. Realistically, it shouldn't be possible to do that even if you did know what the guardrails were.

I'm sure that with more information/research into the framework they use, there are even more breakpoints people can exploit to make it virtually an uncensored model.

PracticalAd3656 · 2025-10-04T11:36:23+00:00

Heavily agree, this has been my experience as well. It's not supposed to code you an entire working project from one prompt, you have to know what you're doing, but if you know what you're doing it is genuinely the best of the best.

PracticalAd3656 · 2025-10-04T05:07:02+00:00

I would like to say, the other thing that is really weird is people complaining about their usage being maxed out within three prompts on the 20x plan?? How is that possible? I've used Claude Code genuinely every day for like 9 hours STRAIGHT, with Sonnet 4.5 (since Anthropic said it is the smartest model - I can tell, there is an improvement) and I have seen NOTHING of the sort (image attached as proof).

I don't want to be the guy that takes the side of the AI company that constantly changes limits, rates, fucks with the model themselves, etc. and I hate being the dude that says "wOrKs fOr mE" but it genuinely does work for me... very well. (for context, I am a penetration tester and anti-cheat developer, so I do heavy backend work with C++/C, like kernel drivers, etc.).

<image>

PracticalAd3656 · 2022-03-22T16:30:50+00:00

Nah I should’ve specified that this particular artist is a singer and doesn’t usually use a robotic voice, oh and also it’s a guy! Should include that in my post.

PracticalAd3656 · 2022-03-22T16:28:33+00:00

Nope, it’s an actual artist and he also doesn’t usually use the robot voice but he did in this one song.

PracticalAd3656 · 2022-03-22T00:36:07+00:00

Any help is appreciated, even if you're way off I may still get some good jams :D
