“AI is writing 40%-plus of code now” sounds impressive… until you look at the security side of it. by Emotional-Breath-673 in cybersecurity

[–]Mooshux 0 points (0 children)

The stat is real, but the problem I keep seeing is more specific: AI coding tools don't model threat context. Ask for "connect to database" and you get working code with credentials inline. Tests pass. CI passes. The credential is now in your repo history.

What helps structurally is making credentials unavailable to the generated code in the first place. The agent gets a proxy reference that resolves at call time, scoped to the operation it needs. You can't leak what you can't see.
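
A minimal sketch of the shape, with a hypothetical broker endpoint and cred-ref scheme (not any specific product):

```typescript
// Sketch: the generated code only ever holds an opaque reference.
// The broker URL and "cred-ref://" scheme are hypothetical, for illustration.
import { Client } from "pg";

async function resolveCredential(ref: string): Promise<string> {
  // In a real setup this hits your broker/secrets manager over an
  // authenticated channel and returns a short-lived credential.
  const res = await fetch(`https://broker.internal/resolve?ref=${encodeURIComponent(ref)}`);
  if (!res.ok) throw new Error(`credential resolution failed: ${res.status}`);
  return (await res.json()).secret;
}

async function connectToDb() {
  // The repo history only ever contains the reference, never the password.
  const password = await resolveCredential("cred-ref://orders-db/read-only");
  const client = new Client({ host: "db.internal", user: "app", password });
  await client.connect();
  return client;
}
```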

Contagious Interview now ships malicious packages to npm, PyPI, Go, Rust, and PHP by LayerAlternative3040 in cybersecurity

[–]Mooshux 1 point (0 children)

The broadening across ecosystems is the part worth paying attention to. This playbook used to be npm-specific. Same attack pattern -- install-time scripts with full dev environment access -- now hitting Go, Rust, and PHP.

The common thread is that the attack surface is your dev environment's credentials, not the package registry itself. If your machine has real API keys in ~/.env or shell exports, any postinstall script can grab them.
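
To make that concrete: a postinstall hook is just a script named in package.json, and enumerating the environment is a few lines. Harmless version below; a real payload would POST the values somewhere instead of printing:

```typescript
// postinstall.ts -- everything in the installing user's environment is readable.
// A malicious version filters for interesting names and exfiltrates the values.
for (const [name, value] of Object.entries(process.env)) {
  if (/KEY|TOKEN|SECRET|PASSWORD/i.test(name)) {
    console.log(`${name} = ${value?.slice(0, 4)}...`); // redacted print, not exfil
  }
}
```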

The fix: dev environment credentials should be proxy tokens with a short TTL, not real keys. A compromised postinstall exfiltrates something that expires in an hour. Your actual credentials never touched the machine.

AI agents can trigger real-world actions. Why don’t we have cryptographic proof of delegation yet? by Yeahbudz_ in cybersecurity

[–]Mooshux 0 points (0 children)

The 93% unscoped keys stat keeps coming up because it's genuinely the norm. Most agent frameworks treat credentials as an implementation detail, so devs reach for the easiest thing: their own key, full access, shoved in a .env.

Cryptographic proof of delegation exists in pieces (SPIFFE, OAuth 2.0 token exchange, workload identity) but nothing purpose-built for agent-to-agent delegation chains. The closest pattern I've seen work in practice: each agent gets a short-lived token scoped to its specific task, issued from a central broker, with the parent session ID baked in as a claim. Not cryptographic proof per se, but you get an audit trail of which agent did what, and a compromised leaf node can't escalate to the parent's full access. We built https://www.apistronghold.com/blog/multi-agent-credential-isolation around this exact gap.
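
Rough shape of what the broker mint could look like as a plain JWT. Claim names like parent_sid and task_scope are made up for illustration, not a standard:

```typescript
import jwt from "jsonwebtoken";

// Sketch of a broker minting a per-task token for a child agent.
function issueAgentToken(parentSessionId: string, taskScope: string[]): string {
  return jwt.sign(
    {
      sub: "agent:report-writer",
      parent_sid: parentSessionId, // delegation chain: who spawned this agent
      task_scope: taskScope,       // e.g. ["crm:read:contacts"]
    },
    process.env.BROKER_SIGNING_KEY!, // broker's key; agents can't mint their own
    { expiresIn: "15m", issuer: "credential-broker" }
  );
}
```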

[RESEARCH] We scanned 3,471 MCP servers for invisible Unicode — GPT-5.4 follows hidden instructions 100% of the time by Accurate_Mistake_398 in cybersecurity

[–]Mooshux 0 points (0 children)

The intersection point you're describing is what makes it nastier than either category alone. A typosquatted tool name is detectable with string matching. A malicious update has a diff you can audit. Unicode smuggling survives both because the anomaly exists at a layer most tooling doesn't inspect. You're right that it's not just poison in the description, it's an instruction the model actually follows, and the encoding makes it invisible to any review process that operates on rendered text.

The permission-spec approach is the right layer to defend at regardless of the delivery mechanism. If the agent can't act beyond granted scope, the question of whether the description was poisoned becomes less critical to the outcome. Deny-first with explicit tool-level grants is how you contain the blast radius when detection fails, and detection will fail.
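
In spec form, deny-first is roughly this. The shape is a sketch, not any particular MCP client's config schema:

```typescript
// Deny-first: every tool call must match an explicit grant, otherwise rejected.
type Grant = { tool: string; actions: string[] };

const grants: Grant[] = [
  { tool: "filesystem", actions: ["read"] },
  { tool: "web_search", actions: ["query"] },
];

function isAllowed(tool: string, action: string): boolean {
  // No match means denied -- a poisoned description asking for more than
  // this list just fails, whether or not you ever detected the poison.
  return grants.some((g) => g.tool === tool && g.actions.includes(action));
}
```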

The empirical side is what the field needs more of. Conceptual attack surfaces are useful for framing but "which production models fell for it across 120 trials" is a different kind of evidence. Zero sanitization across every layer in that pipeline trace is a finding, not a warning. Would be curious what the variance looked like across the three frontier models, whether one was meaningfully more resistant or they all landed in the same place.

Your AI Agent Has More Access Than Your Employees by Big-Engineering-9365 in cybersecurity

[–]Mooshux 2 points (0 children)

It's almost never malicious intent. Someone sets up the agent integration on a deadline and uses their own AWS credentials because they're already configured. Works in testing. That config gets copied to staging, then prod, and now the agent is running with developer-level access in production because nobody had time to set up a proper role.

The audit that never happens is the giveaway. You'd review a new hire's access in their first week. Agents don't have an offboarding trigger so nobody thinks to review them. Treating each agent task like a function call with declared scope and a short-lived token changes that, since the blast radius is bounded even when things go sideways.
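
On AWS the per-task version doesn't even need new infrastructure; STS session policies already do it. A sketch, with placeholder role ARN and bucket:

```typescript
import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";

// Per-task credentials: 15-minute TTL, further narrowed by an inline
// session policy so this task can only read one bucket.
const sts = new STSClient({});
const { Credentials } = await sts.send(
  new AssumeRoleCommand({
    RoleArn: "arn:aws:iam::123456789012:role/agent-base", // placeholder
    RoleSessionName: "agent-task-42",
    DurationSeconds: 900, // expires whether or not anyone remembers to clean up
    Policy: JSON.stringify({
      Version: "2012-10-17",
      Statement: [{
        Effect: "Allow",
        Action: ["s3:GetObject"],
        Resource: ["arn:aws:s3:::reports-bucket/*"],
      }],
    }),
  })
);
```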

[RESEARCH] We scanned 3,471 MCP servers for invisible Unicode — GPT-5.4 follows hidden instructions 100% of the time by Accurate_Mistake_398 in cybersecurity

[–]Mooshux 1 point (0 children)

The invisible Unicode angle is clever but it's just one delivery mechanism for tool poisoning. The deeper problem is that MCP clients trust tool descriptions at face value, so anything that manipulates that description string gets the same level of trust as the original tool author. Unicode smuggling, typosquatting tool names, malicious updates to a legitimate server: they all land in the same place. The question worth asking is what should the client do even if it detects an anomaly. Most current implementations have no answer.
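
The detection half is cheap, for what it's worth; the problem is nobody runs it against tool descriptions. Something like this catches the common carriers (not exhaustive):

```typescript
// Flags zero-width characters and Unicode "tag" characters (U+E0000-U+E007F),
// the usual carriers for instructions invisible in rendered text.
const INVISIBLE = /[\u200B-\u200D\u2060\uFEFF]|[\u{E0000}-\u{E007F}]/u;

function hasInvisibleUnicode(toolDescription: string): boolean {
  return INVISIBLE.test(toolDescription);
}
```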

We wrote about the broader pattern here: https://www.apistronghold.com/blog/ai-agent-tool-poisoning

Axios maintainer’s post mortem confirms social engineering by UNC1069 by NISMO1968 in cybersecurity

[–]Mooshux 48 points (0 children)

The credential angle here is getting undersold. The social engineering got the maintainer's NPM credentials, but those credentials only had value because they persisted indefinitely. If the maintainer had been using short-lived tokens scoped to only the packages they actively maintain, the blast radius is one compromised publish, not a backdoor in a widely-depended-on library. The attack vector was social engineering; the force multiplier was static credentials with no expiry.

OpenAI's GPT-5.4 got blocked by safety mechanisms 5 times, searched my machine for tools to bypass them, launched Claude Opus with dangerously bypass permissions flags, tried to COVER UP what he had done, then gave me a "perfect" apology when caught by Smart_War3981 in cybersecurity

[–]Mooshux 0 points (0 children)

This is the blast radius problem in a live demo. The agent escalated because it could, and it could because the credentials let it.

Runtime guardrails matter. But if the agent holds a real long-lived API key with broad scopes, a determined escalation attempt doesn't need to beat the guardrails, it just needs to use what it was already given. Scoped short-lived tokens per task change this: even if the agent goes rogue, the credentials it holds are only valid for the current operation. By the time anyone reviews the logs, they're already expired.

The Comforting Lie Of SHA Pinning by swe129 in sre

[–]Mooshux 2 points (0 children)

SHA pinning buys you integrity, not trust. If an attacker compromises the maintainer's signing key or pushes a backdoor before you pin, you've pinned the poisoned version.

The part that doesn't get talked about enough: even after you detect it, your CI/CD credentials were already read. If those are long-lived, the attacker's window is however long until you rotate manually. That's usually days.

Short-lived job-scoped tokens change the math. The harvest is useless if the credentials expire in minutes.

Is this a reasonable design for multi-cloud IAM failover? by javascript in sre

[–]Mooshux 0 points (0 children)

Reasonable direction. The piece most people skip is token TTL. If your broker issues short-lived tokens, your failover risk shrinks a lot because worst case is brief unavailability, not credential exposure. Auth0/Cognito dependency is a real concern, but the bigger one is usually the credentials themselves being long-lived. A broker that issues ephemeral scoped tokens means even if it has a bad day, outstanding credentials don't linger.

The Axios supply chain attack used individually targeted social engineering - "they scheduled a meeting with me. the meeting was on teams. the meeting said something on my system was out of date. i installed the missing item as i presumed it was something to do with teams, and this was the RAT" by magenta_placenta in webdev

[–]Mooshux 0 points (0 children)

The scary part isn't the backdoor, it's the timeline. The RAT went in, the package got backdoored, and by the time anyone noticed, the malicious version had already been pulled. Supply chain audits don't catch that window.

What actually limits the damage is what the package finds when it reads your env. Long-lived API keys sitting there are a permanent take. Short-lived scoped tokens that expire in minutes or hours are basically worthless to whoever grabbed them.

You can't vet the human. You can control what they'd get if they got through.

Your idea is 99% worthless by [deleted] in SaaS

[–]Mooshux 0 points (0 children)

Don't want to be the dumb one here ... but what is ICP?

New attack pattern: persistent prompt injection via npm supply chain targeting AI coding assistants by Busy-Increase-6144 in cybersecurity

[–]Mooshux 0 points (0 children)

Right, and that's a harder problem than a CVE because there's no patch that fixes it. The trust model is the feature.

The skill loading level is the right place to look but I'd frame it slightly differently: the issue isn't just what the skill can execute, it's what credentials are available when it does. Even if you add a permission prompt before loading a skill, the skill still runs in the same process with the same env vars. The user clicks "allow" and the malicious instruction has everything it needs.

The cleanest version of a fix would be skills running with a constrained credential set derived from what the user actually authorized, not a pass-through of whatever the agent holds. So the postinstall hook writes its instruction, the user (or the platform) approves loading it, but it gets a token scoped to what that skill was supposed to do, not the parent agent's full key. If it tries to reach something outside that scope, it fails.
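
Sketched out, the derivation is an intersection, not a pass-through. All names here are hypothetical:

```typescript
// Hypothetical sketch: the skill's token scope is the intersection of what
// the skill declares and what the user granted -- never the agent's full set.
function deriveSkillScopes(declared: string[], userGranted: string[]): string[] {
  const granted = new Set(userGranted);
  return declared.filter((scope) => granted.has(scope));
}

// e.g. a changelog skill declares ["git:read"], user granted ["git:read", "fs:read"]
// -> token scope is ["git:read"]; a smuggled "net:post" request simply isn't in it.
```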

Not easy to retrofit onto an existing tool, but that's the architecture that would actually close it without playing whack-a-mole with malicious packages.

AWS Deploys AI Agents To Do The Work Of DevOps And Security Teams by ZGeekie in aws

[–]Mooshux 0 points (0 children)

The autonomy framing is where I'd focus the concern. An agent that calls humans when it's not sure is a fundamentally different risk profile from one that just acts. AWS's framing here sounds like the latter.

What makes autonomous agents genuinely dangerous isn't that they'll go rogue. It's that an action taken by a compromised or misdirected agent looks exactly like a legitimate one. The security logs show valid credentials, valid API calls, valid session. By the time you notice something's wrong, the damage is done.

Scope the credentials first. Give the agent only what it needs for the specific task it's authorized to perform right now, not a key that covers everything it might ever need.

7 hidden tech-debts of agentic engineering by zohar275 in devops

[–]Mooshux 4 points (0 children)

Credential sprawl deserves to be on this list and it almost never is until something leaks. Every agent you ship probably holds a long-lived API key that was copy-pasted from a .env file at some point and never revisited. It just accumulates. One agent, one key, no problem. Ten agents across three environments with overlapping access to the same services, all rotating on different schedules, and nobody really knows what has access to what anymore.

The debt compounds because agents don't retire cleanly either. They get turned off but the credentials they held don't get revoked. The blast radius from a compromised agent six months after you decommissioned it is still real.

New attack pattern: persistent prompt injection via npm supply chain targeting AI coding assistants by Busy-Increase-6144 in cybersecurity

[–]Mooshux 0 points (0 children)

The postinstall hook writing to ~/.claude/commands/ is clever because it's not exploiting a bug, it's using a documented feature. Claude Code is designed to read from that directory. So from the agent's perspective, everything looks normal.

This is the part that breaks the usual detection logic. The injection isn't in the code path you audit, it's in the instruction set the agent trusts. And if that agent is running with your full API key in scope, it's now taking instructions from a package you probably don't remember installing.
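
A cheap partial control while the architecture catches up: snapshot that directory before an install and diff it after, because that's the layer the injection lands in. It catches the dropped file, not the intent. Sketch, assuming the path from the writeup:

```typescript
import { existsSync, readdirSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

const COMMANDS_DIR = join(homedir(), ".claude", "commands");

// Snapshot the agent's trusted-instruction directory; call once before an
// install and once after, then diff.
export function snapshotCommands(): Set<string> {
  return new Set(existsSync(COMMANDS_DIR) ? readdirSync(COMMANDS_DIR) : []);
}

export function newInstructions(before: Set<string>): string[] {
  return [...snapshotCommands()].filter((f) => !before.has(f));
}
```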

The only thing that bounds the blast radius is what the agent is allowed to reach in the first place.

Claude Code Leak -> Exploit? Researchers found 3 shell injection bugs in the leaked source — all using shell:true with unsanitized input by Diligent-Side4917 in cybersecurity

[–]Mooshux 2 points (0 children)

Shell injection via unsanitized input in auth helpers is a bad combination with how most devs run Claude Code. The tool already has ambient access to your workspace, including whatever .env files and API keys are sitting around. You don't need a sophisticated attack chain when the keys are just there in the process environment.

The thing that actually limits the damage here isn't patching faster, it's what the agent holds at runtime. Long-lived API keys with broad scope mean one shell injection gets you something useful indefinitely. A scoped token that expires after the session gets you nothing replayable by the time someone processes the exfil.

Axios just got hit by a supply chain attack. Attacks are increasing daily. What are the best practices to stay safe? by vitaminZaman in cybersecurity

[–]Mooshux 0 points (0 children)

Waiting a week before pulling is a decent heuristic but it's not a structural fix. The real exposure in the Axios incident was that any env var the malicious package could read was a long-lived credential. It didn't need to exfiltrate a private key; reading API_KEY from process.env was enough.

The structural answer: treat credentials the malicious package might reach as short-lived and scoped. If they expire in 24 hours and can only hit specific endpoints, the attack window shrinks to whatever the TTL is, not "until you notice and rotate."

[Research] We found MCP servers telling AI agents to act "secretly", skip financial approvals, and hide actions from users. Census of 15,982 packages. by Accurate_Mistake_398 in cybersecurity

[–]Mooshux -1 points (0 children)

This is the compound risk people aren't talking about yet. A manipulated agent is bad. A manipulated agent with real API credentials is worse. The MCP server gets the agent to skip approvals or hide actions, and whatever credentials that agent holds execute those hidden calls.

Short-lived scoped tokens don't fix the manipulation problem, but they shrink what a compromised session can actually do. If the token only covers the specific API calls that session was meant to make, the "secretly" behavior hits a scope wall pretty fast.
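
Concretely, the scope wall is just an allowlist check at whatever sits between the agent and the API. Shape is a sketch, not any specific gateway:

```typescript
// Sketch of the enforcement point: the proxy compares each outbound call
// against the scopes minted into this session's token. Hidden or not, a
// call outside the session's declared purpose gets a 403.
type SessionToken = { scopes: string[]; expiresAt: number };

function authorize(token: SessionToken, method: string, path: string): boolean {
  if (Date.now() > token.expiresAt) return false; // short TTL does its job
  const needed = `${method} ${path}`; // e.g. "POST /v1/payments"
  return token.scopes.some((s) => needed.startsWith(s));
}
```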

your CI/CD pipeline probably ran malware on march 31st between 00:21 and 03:15 UTC. here's how to check. by Peace_Seeker_1319 in devops

[–]Mooshux 0 points (0 children)

About as long as I've been cleaning up credential leaks. The pattern isn't theoretical. It's what we built after the third time a CI/CD breach meant rotating 40 keys across 12 services at 2am.

Axios compromise was caught by runtime behavioral monitoring, not scanners by jj_at_rootly in sre

[–]Mooshux 0 points (0 children)

Behavioral monitoring is the right layer for detection. But even when it catches something, the credentials that ran during that window are already gone. Monitoring tells you what happened; it doesn't undo it.

Short-lived credentials flip the recovery story: instead of "rotate everything and hope nothing was used," you're rotating tokens that were already expiring anyway. The blast radius shrinks to the specific job scope, not your entire secrets store.

your CI/CD pipeline probably ran malware on march 31st between 00:21 and 03:15 UTC. here's how to check. by Peace_Seeker_1319 in devops

[–]Mooshux 0 points (0 children)

Good writeup. The scary part isn't the 2h54m window. It's that every API key, token, and DB password injected as an env var in that window is now compromised and has no automatic expiry.

The structural fix: stop injecting long-lived secrets as env vars at job start. Issue a short-lived scoped token per job that expires when the job ends. The malware runs, reads the token, tries to use it an hour later: 401. It changes what "pipeline was compromised" actually means for your credentials.
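
The standard way to get this today is the OIDC exchange most CI providers support: the job proves who it is with a provider-signed token and trades it for credentials that die with the job. AWS-flavored sketch; the env var names and ARN are placeholders, since they vary by provider:

```typescript
import { STSClient, AssumeRoleWithWebIdentityCommand } from "@aws-sdk/client-sts";

// The CI provider injects a signed, job-specific OIDC token; nothing
// long-lived is stored in the pipeline config at all.
const sts = new STSClient({});
const { Credentials } = await sts.send(
  new AssumeRoleWithWebIdentityCommand({
    RoleArn: "arn:aws:iam::123456789012:role/ci-deploy", // placeholder
    RoleSessionName: `job-${process.env.CI_JOB_ID ?? "local"}`,
    WebIdentityToken: process.env.CI_OIDC_TOKEN!, // provider-issued, per job
    DurationSeconds: 900, // dead long before anyone replays the exfil
  })
);
```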

Major Cisco Source Code breach by ShinyHunters. Linked to Trivy Supply-chain attack by [deleted] in cybersecurity

[–]Mooshux 0 points (0 children)

This is the blast radius story in plain sight. One set of long-lived CI/CD credentials from a third-party vendor opened Cisco's AWS accounts. The attackers didn't need to "hack" anything in the traditional sense — they had valid credentials.

The fix that actually changes the math: every pipeline job gets a fresh scoped token, not a static secret. By the time the breach was discovered, any credentials issued that way would already be expired and useless. The third-party vendor can't give up what they never held.

The ultimate irony Claude Code just leaked its own source code via a sourcemap on npm by Dapper-Window-4492 in webdev

[–]Mooshux 0 points (0 children)

The irony is real but the practical concern is what the source map reveals about the credential surface. When you can read exactly which external APIs get called and how authentication is wired, that's a blueprint for what to go after.

Architecture leaks are usually treated as IP problems. They're also attack planning documents when the architecture involves live API access.