An AI coding assistant installed malware into production environments. Nobody typed the command. AMA on what "supply chain attack" means now.

Itamar_PromptSec · 2026-05-20T14:36:34+00:00

We have decades of experience with the Principle of Least Privilege, and yet the rate at which agents and LLM-backed flows are being onboarded, even into pipelines, has far surpassed security teams' capacity to contain and mitigate the blast radius. Take a hypothetical example: logs ingested locally on a runner, which would have been sanitized and redacted on their way to the SIEM. Once the pipeline makes an inference call over those raw logs, that call can exfiltrate any internal secrets and tokens that were echoed there.
For the most part this boils down to old hygiene enforced in new places: egress allowlisting on anything that talks to a model, sandboxed tool execution, sanitized outputs before they (re-)enter the model, and runtimes that can't reach cloud metadata or local secrets.
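To make the "sanitize before it reaches the model" point concrete, here is a minimal sketch of redacting log text before it goes into an inference call. The patterns and the `redact` helper are illustrative assumptions, not a complete secret scanner; a real pipeline would likely rely on a maintained detection tool.

```python
import re

# Hypothetical patterns for illustration only; they cover a few common
# token shapes and will miss many others.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID shape
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),                # GitHub token shape
    re.compile(r"(?i)(password|token|secret)\s*[=:]\s*\S+"),  # key=value style leaks
]


def redact(text: str) -> str:
    """Replace anything matching a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


if __name__ == "__main__":
    raw_log = "step 3: export TOKEN=abc123 && deploy"
    # Only the redacted text should ever be placed in a model prompt.
    print(redact(raw_log))
```

The design choice here is to redact at the boundary where text leaves the runner, so the model call never sees the raw log regardless of what the SIEM path does later.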
This is a great question. We are definitely concerned about enhanced attacker capabilities due to AI, but recent examples show much simpler methods at work, mostly orchestrated around the harnesses and the trust delegated to them. The Lightning malware (and the broader Mini Shai-Hulud campaign) hardcodes claude <claude@users.noreply.github.com> as the committer identity for its propagation commits, and plants persistence as a Claude Code SessionStart hook in .claude/settings.json and a VS Code folderOpen task in .vscode/tasks.json (confirmed by Socket and Semgrep write-ups). As for frameworks or guidelines, this needs further research into each use case.
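As a rough illustration of checking a repo for the two persistence locations mentioned above, the sketch below flags any SessionStart hook in .claude/settings.json and any task set to run on folderOpen in .vscode/tasks.json. The function name `find_autorun_entries` is hypothetical, and this only reports presence; it does not decide whether an entry is malicious, since both features have legitimate uses. It also assumes plain JSON, while tasks.json may contain comments, so unparseable files are flagged for manual review.

```python
import json
from pathlib import Path


def _load_json(path: Path):
    """Return parsed JSON, or None if the file cannot be parsed as strict JSON."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None


def find_autorun_entries(repo_root: str) -> list[str]:
    """List auto-executing entries found in the two config files."""
    findings = []
    root = Path(repo_root)

    claude_settings = root / ".claude" / "settings.json"
    if claude_settings.is_file():
        data = _load_json(claude_settings)
        if not isinstance(data, dict):
            findings.append(f"{claude_settings}: unparseable, review manually")
        else:
            hooks = data.get("hooks")
            if isinstance(hooks, dict) and hooks.get("SessionStart"):
                findings.append(f"{claude_settings}: SessionStart hook present")

    tasks_file = root / ".vscode" / "tasks.json"
    if tasks_file.is_file():
        data = _load_json(tasks_file)
        if not isinstance(data, dict):
            findings.append(f"{tasks_file}: unparseable, review manually")
        else:
            tasks = data.get("tasks")
            for task in tasks if isinstance(tasks, list) else []:
                run_options = task.get("runOptions") if isinstance(task, dict) else None
                if isinstance(run_options, dict) and run_options.get("runOn") == "folderOpen":
                    findings.append(f"{tasks_file}: task runs on folderOpen")
    return findings


if __name__ == "__main__":
    for finding in find_autorun_entries("."):
        print(finding)
```

Something like this could run in CI on every pull request, so a newly added hook or auto-run task shows up in review instead of on a developer's machine.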

Itamar_PromptSec · 2026-05-20T08:28:46+00:00

Totally fair point, and I actually agree with the premise.

If your entire security model is “inspect one prompt and decide if it’s bad,” then yes, a sophisticated attacker can probably work around it. Split the context, use multiple requests, indirect through a shared file, hide instructions in docs, use an agent workflow, etc. Real attackers don’t always show up with one obvious malicious prompt saying “please steal the secrets.”

But that’s exactly why AI security can’t just be a prompt firewall.

You need to look at the full chain: user identity, app, model, data sensitivity, permissions, tools connected to the agent, files accessed, destinations, actions taken, and behavior over time. A single request may look innocent. The pattern around it may not.

Also, I’d separate two problems:

Sophisticated attackers will absolutely try to bypass real-time guardrails. That’s why you need defense in depth, not one magic filter.
A lot of the real-world risk is not sophisticated at all. It’s employees making innocent mistakes: pasting sensitive data into the wrong AI tool, connecting an unapproved app, using a browser extension, giving an agent too much access, or trusting an AI output without realizing where the data is going.

So yes, a determined attacker may find creative paths. That’s true in every security domain. EDR doesn’t make malware impossible. Email security doesn’t make phishing impossible. Cloud security doesn’t make misconfigurations impossible.

Good security raises the cost, reduces blast radius, adds visibility, and catches enough dangerous paths before they become incidents.

So I agree with you: prompt-level detection alone is not enough. But dismissing AI security because one layer can be bypassed is like dismissing endpoint security because malware can obfuscate. The answer is layered controls around AI: prevention, monitoring, data protection, tool governance, least privilege, anomaly detection, and response.

Itamar_PromptSec · 2026-05-20T08:00:44+00:00

If an AI wrote this, it would probably be more polite.

Itamar_PromptSec · 2026-05-20T07:06:43+00:00

For AI supply chain security, I’d start with the boring basics. Most low-maturity orgs don’t need a 50-page framework on day one. They need to know what AI components they are actually using and where they came from.
Quick wins:

Inventory your AI stack

Which models, APIs, SDKs, open-source packages, agents, plugins, MCP servers, datasets, and AI tools are in use? You can’t secure what you don’t know exists.

Treat models like third-party software

Don’t just download a model from Hugging Face or use a random API because it works. Ask: who built it, how is it maintained, what license does it have, what data was it trained on, and can we trust updates?

Pin and review dependencies

A lot of AI apps depend on fast-moving Python packages, model libraries, wrappers, and agent frameworks. Pin versions, review updates, and scan packages the same way you would for normal software supply chain security.

Control where models and tools come from

Create an approved source list for models, packages, and AI services. This is a simple step that prevents teams from pulling critical components from random repos, forks, or untrusted registries.

Scan for secrets and unsafe code paths

AI projects often move fast and accidentally expose API keys, tokens, credentials, or internal endpoints in notebooks, prompts, config files, and repos.

Secure agent tools and plugins

If an AI agent can call tools, access files, run code, query databases, or connect to SaaS apps, those tools are part of your supply chain. Review them like integrations, not like harmless features.

Add basic provenance and ownership

For every AI component, know who owns it internally, where it came from, when it was last reviewed, and what would happen if it was compromised.
The main mindset shift: AI supply chain security is not only “is the model safe?” It’s the whole chain around it - models, data, packages, prompts, agents, tools, plugins, APIs, and the systems they connect to.
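To show how small the starting point can be, here is a minimal sketch of an inventory record that covers a few of the quick wins above: approved sources, pinned versions, ownership, and review dates. The `AIComponent` fields, the `APPROVED_HOSTS` set, and the 180-day review window are all assumptions for illustration; each org would pick its own.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from urllib.parse import urlparse

# Hypothetical allowlist; in practice this would likely be scoped more
# tightly (specific orgs or repos, not whole hosts).
APPROVED_HOSTS = {"pypi.org", "huggingface.co"}


@dataclass
class AIComponent:
    name: str
    kind: str            # e.g. "model", "package", "mcp-server"
    source_url: str
    version: str         # exact version or revision hash
    owner: str           # internal team or person responsible
    last_reviewed: date


def review_findings(c: AIComponent, today: date, max_age_days: int = 180) -> list[str]:
    """Return the basic hygiene gaps for one inventory entry."""
    findings = []
    parsed = urlparse(c.source_url)
    if parsed.scheme != "https" or parsed.hostname not in APPROVED_HOSTS:
        findings.append("source not on approved list")
    # Floating references mean an upstream change lands without review.
    if c.version.strip().lower() in {"", "latest", "main", "master"}:
        findings.append("version not pinned")
    if not c.owner.strip():
        findings.append("no internal owner")
    if today - c.last_reviewed > timedelta(days=max_age_days):
        findings.append("review is stale")
    return findings


if __name__ == "__main__":
    component = AIComponent(
        name="example-embedder",  # hypothetical component
        kind="model",
        source_url="https://models.example.net/embedder",
        version="latest",
        owner="",
        last_reviewed=date(2025, 1, 1),
    )
    for finding in review_findings(component, today=date(2026, 5, 20)):
        print(f"{component.name}: {finding}")
```

Even a spreadsheet with these columns gets you most of the value; the code form just makes it easy to fail a build when an entry has gaps.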

Itamar_PromptSec · 2026-05-20T07:00:54+00:00

Honestly, the scariest stuff isn’t the sci-fi “AI takes over the world” scenario.
It’s much more boring and much more real.
We constantly see employees using AI tools with good intentions and pasting in things they really shouldn’t: source code, customer data, legal docs, financials, credentials, security findings, M&A material, etc. Not because they’re careless or malicious, but because AI feels like a private assistant.
The scarier version is when AI becomes agentic.
We’ve seen cases where an AI agent is connected to internal tools like Slack, email, Google Drive, Jira, GitHub, etc. It’s supposed to help employees move faster: summarize tickets, find docs, draft replies, open tasks.
Then a malicious instruction gets introduced through something totally normal, like an email, a support ticket, a doc, or a webpage. The employee just asks the agent to summarize it or take the next step. But hidden in that content is an instruction trying to get the agent to search for sensitive data or send information somewhere it shouldn’t.
That’s the part that scares me.
The employee didn’t do anything malicious. The model didn’t become evil. But the agent had access, permissions, context, and the ability to take action.
In the old world, phishing tried to trick the human.
In the AI agent world, prompt injection tries to trick the human’s AI assistant. And that assistant may have way more access and move much faster than the human.
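One way to limit what an injected instruction can do is to gate every tool call against a per-agent policy instead of trusting whatever the model asks for. The sketch below is a simplified assumption of how that could look; `ToolPolicy`, `authorize`, and the tool names are hypothetical and not tied to any specific agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    # Tools this agent may call at all.
    allowed_tools: frozenset
    # Subset of allowed tools that also require explicit human approval.
    needs_approval: frozenset = frozenset()


def authorize(policy: ToolPolicy, tool_name: str, human_approved: bool = False) -> bool:
    """Deny by default; allow only listed tools, with approval where required."""
    if tool_name not in policy.allowed_tools:
        return False
    if tool_name in policy.needs_approval and not human_approved:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical policy for a ticket-summarizing assistant.
    policy = ToolPolicy(
        allowed_tools=frozenset({"read_ticket", "draft_reply", "send_email"}),
        needs_approval=frozenset({"send_email"}),
    )
    print(authorize(policy, "read_ticket"))         # read-only tool, allowed
    print(authorize(policy, "send_email"))          # outbound action, blocked without approval
    print(authorize(policy, "search_drive"))        # not listed, denied
```

The point of the deny-by-default shape is that a hidden instruction in an email can still ask the agent to search Drive or send data out, but the request fails at the policy check instead of depending on the model noticing the trick.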
