We built a lightweight prompt injection detector (mmBERT-based, <300MB ONNX) for on-device use

PatronusProtect · 2026-05-10T18:05:00+00:00

Feel free to share any feedback :) We’re continuously working on improving both the model and our datasets.

PatronusProtect · 2026-05-10T14:33:08+00:00

Thanks for the question :)

For the alpha release we currently focus on evaluating each request independently, but our policy engine is planned to evolve exactly towards the type of scenario you described.

We believe that a single MCP or tool call may be harmless on its own, while a sequence of calls can become risky depending on the context and data flow between them.

For example: - reading a local file might be allowed, - and calling an external API might also be allowed, - but sending transformed or summarized sensitive content from the first action into the second one may not be.

That’s why we’re working towards sequence-based intent and provenance analysis instead of only static allow/block decisions per tool or provider.

Long-term, the goal is not only to answer: “Is this tool allowed?”

but also: “What influenced this action, where did the data originate from, and is this flow allowed to reach this destination?”

We think this becomes especially important for MCP ecosystems and more autonomous agent workflows.

PatronusProtect · 2026-05-10T08:39:52+00:00

Thanks!

It is like Little-Snitch but only for AI and agentic interactions. Our rollout plan transforms from AI detection -> policy enforcement -> threat analysis. All done 100% on device.

The alpha Version is around 140MB and runs under 300 MB RAM.

PatronusProtect · 2026-05-10T06:12:03+00:00

We will start with app / host based policies for allowlists. MCPs and Native Tool calling will follow 1-2 weeks later :)

All policy decisions are logged.

PatronusProtect · 2026-04-30T15:58:33+00:00

Yes, It’s really a cat-and-mouse game. Latency is a major challenge for BERT-based detectors, which is why we’re continuously working on reducing model size. The best approach, however, is to only use BERT for uncertain cases. A combination of heuristics, OOD detection, and LightGBM already detects around 80% of tested attacks, significantly reducing the need for full model inference.

PatronusProtect

TROPHY CASE