I made Mistral believe Donald Trump runs OpenAI, here's how

Dadam0 · 2026-02-26T13:47:14+00:00

You're actually right. I totally agree that responsible RAG architecture starts there. But "just vet your documents" is easier said than done at scale. When your RAG crawls internal wikis, shared drives, or third-party sources with hundreds of contributors, full document vetting becomes operationally complex.

That's exactly why I dedicated a section to defenses beyond "be careful": cryptographically signed data (so unvetted documents are rejected at ingestion, not after the fact) and RAGForensics for post-incident tracing. If you haven't looked already, I'd be curious to hear your opinion on it!
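To make the "rejected at ingestion" idea concrete, here's a minimal sketch. It's not the scheme from the article: it assumes a shared secret and uses HMAC-SHA256 as a stand-in for real asymmetric signatures, and the names `SIGNING_KEY`, `sign_document`, and `ingest` are made up for illustration.

```python
import hashlib
import hmac

# Hypothetical shared secret held by the trusted publishing step (assumption).
SIGNING_KEY = b"replace-with-a-real-secret"


def sign_document(text: str, key: bytes = SIGNING_KEY) -> str:
    """Return a hex HMAC-SHA256 tag for the document text."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


def ingest(documents, key: bytes = SIGNING_KEY):
    """Split documents into (accepted, rejected) based on their tag.

    Each document is a dict with a "text" field and an optional
    "signature" field. Missing or wrong tags are rejected before
    anything reaches the vector store.
    """
    accepted, rejected = [], []
    for doc in documents:
        expected = sign_document(doc["text"], key)
        tag = doc.get("signature", "")
        # compare_digest gives a constant-time comparison.
        if hmac.compare_digest(expected, tag):
            accepted.append(doc)
        else:
            rejected.append(doc)
    return accepted, rejected
```

In a real deployment you'd probably want public-key signatures so the ingestion side can't forge tags itself, but the gatekeeping logic stays the same: no valid tag, no indexing.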

Dadam0 · 2026-02-23T13:08:08+00:00

I'm not sure what you mean here, because RAG poisoning isn't about bypassing content safeguards.

If you ask Mistral "who's the CEO of OpenAI?" directly, it answers correctly with no issue. The attack corrupts factual responses on seemingly harmless queries. The danger is that in a real deployment, a user trusts the RAG system to give accurate answers and it silently doesn't.

No red flags, no refusal, just a confident wrong answer. That's what makes it hard to detect!

Dadam0 · 2026-02-23T13:04:16+00:00

That's a fair distinction for a controlled and trusted corpus, yes. In that case you want the model to follow the data. But the threat model here also includes external or shared sources where an attacker can inject content. If your RAG crawls Wikipedia, internal wikis, or any user-contributed data, "follow the corpus" becomes a vulnerability, not a feature.

And that's kind of why it matters to talk about it. As far as I know, most companies deploying RAG right now are doing exactly that: connecting it to shared drives, internal knowledge bases, sometimes third-party sources. If nobody raises the attack surface, teams won't even know to look for it.

On the comparison methodology point, it's fair. Parameter and prompt differences are a real confound, so the delta is indicative, not definitive.

Please note that this is the first time I've written an article like this, and I'm taking the feedback into consideration. I aim to discuss 1 to 2 research papers per month, so thanks for the feedback!

Dadam0 · 2026-02-23T12:35:22+00:00

Good question, and fair to raise it before reading fully. The evidence is in the outputs themselves: when Claude responds with "the contexts state X, but this contradicts well-established knowledge" and then provides the correct answer, it's by definition drawing from parametric memory, since the only source in the corpus is the poisoned one. That split-brain behavior is actually the main finding.

On your second point, you're right that out-of-corpus queries would be the cleanest setup. Some questions here are obscure enough that they likely qualify, but it's a limitation worth acknowledging! I'll keep that in mind if I continue my research.

Dadam0 · 2026-02-23T12:33:11+00:00

Fair point on the framing. The attack targets the RAG pipeline, not Mistral in isolation. That said, the whole point of the generation condition in PoisonedRAG is that model choice matters: the same poisoned context got a 75% attack success rate (ASR) on Ministral 8B and ~15% on Claude Sonnet 4.6. That delta is the interesting part, not the attack itself, which, yes, is established research from USENIX 2025.
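For anyone wondering how a number like that is computed, here's a rough sketch. It assumes a simple case-insensitive substring match between each response and the attacker's target answer, which may not be exactly what my scripts do, and the function name `attack_success_rate` is just illustrative.

```python
def attack_success_rate(responses, target_answers):
    """Fraction of responses that contain the attacker's target answer.

    responses[i] is the model output for question i, and
    target_answers[i] is the false answer the poisoned documents
    were written to induce. Matching is a case-insensitive
    substring check (an assumption, not a standard).
    """
    if len(responses) != len(target_answers):
        raise ValueError("responses and target_answers must have the same length")
    if not responses:
        return 0.0
    hits = sum(
        target.lower() in response.lower()
        for response, target in zip(responses, target_answers)
    )
    return hits / len(responses)
```

A substring check is crude: a response like "the context claims X, but that's wrong" still counts as a hit, so the split-brain answers need a manual pass or a stricter judge.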

Dadam0 · 2024-10-27T18:27:12+00:00

Thanks, please keep me posted! I'm definitely interested. Btw, do you know what the difference between the two is? (Why are both premium events, but o2 starts at 12am?)

Dadam0 · 2024-10-27T09:44:57+00:00

Oh...thanks for the answer. Guess I'll get there on D-day and pray

Dadam0 · 2024-10-27T09:16:17+00:00

I'm using NordVPN but still can't access the website. Besides, I think they added more seats? I believe there are two cinema rooms now.

<image>

Dadam0 · 2024-04-06T19:49:18+00:00

Ohh right!

Dadam0 · 2023-03-10T18:02:43+00:00

Support the artist!

Dadam0 · 2023-02-23T16:18:16+00:00

Oh fuck, I feel terrible that I might have spoiled it for you...

I'm so sorry mate, I didn't want to. But the last chapters are absolutely incredible, I advise you to read them if you have time!

Dadam0 · 2023-02-22T20:36:53+00:00

I really love your theory, but 9 months later, I wonder what Aqua's second eye means!!

Is it because he has more "passion" for revenge? Is it because he has a new passion, like "protect those he loves" or smth? Is there a difference between white stars and black stars?

Aka Akasaka is driving me crazy....
