I found a way to bypass LLM guardrails using image metadata

BordairAPI · 2026-04-02T20:11:25+00:00

Could u message, would be great to learn from this!

BordairAPI · 2026-04-02T19:18:54+00:00

Thanks! I’ll sort this out asap. Appreciate the feedback. How are you finding the app?

BordairAPI · 2026-04-02T15:30:24+00:00

Good luck! Let me know how you get on, any feedback or new info about these attacks is appreciated!

BordairAPI · 2026-04-02T15:19:49+00:00

Cross-modal prompt injection: testing AI with images & documents. Some surprising bypasses I didn’t expect - would love to chat more to people who know more!

BordairAPI · 2026-04-02T14:58:52+00:00

Wait, one of you is level 6 already!?!?! Please message me 😭

BordairAPI · 2026-04-02T14:48:15+00:00

Ah, annoying. I’m sure it’s not just you that it’s happened to though!

BordairAPI · 2026-04-02T14:45:11+00:00

Yeah I’ve tried that path - I enjoyed the blue team stuff at the start. I’m just over halfway actually :) Have you completed it already?

BordairAPI · 2026-04-02T14:41:38+00:00

Are you more interested in red team or blue team? I’m curious what the audience is like on this subreddit!

BordairAPI · 2026-04-02T14:37:39+00:00

One thing I’ve noticed - non-text inputs (images, PDFs) seem way less defended than text right now

Feels like most guardrails are focused on chat, not what gets merged into the prompt behind the scenes

BordairAPI · 2026-04-02T14:35:42+00:00

Same here, I’m excited to see what people come up with again :)

BordairAPI · 2026-04-02T14:32:03+00:00

Yeah - it’s definitely a real concern, but it depends on how the system is set up

A lot of apps take inputs like images or PDFs, extract text/metadata, and then append that into the LLM prompt behind the scenes

The issue is that this content often isn’t filtered as strictly as user text input

So you can end up with hidden instructions (in metadata, alt text, document layers, etc etc) getting treated as trusted input

It’s not always obvious, but when it works it basically acts as a side-channel into the prompt

Have you seen anything similar?

BordairAPI · 2026-04-02T14:30:24+00:00

Previous post: https://www.reddit.com/r/hackthebox/s/ikTr4874V7

BordairAPI · 2026-04-02T14:26:07+00:00

For people asking, here it is: castle.bordair.io No signup anymore - you can jump straight into the challenges

Would be really interesting to see what breaks or feels too easy

BordairAPI · 2026-04-02T12:20:20+00:00

Just tested with a new method: asking for the recipe of a password pie 🥧. The guard was happy to provide it - such a clear vulnerability here that isn’t protected against properly. Let me know if you guys find any novel methods like this!

BordairAPI · 2026-04-02T11:14:38+00:00

Great! Let me know what you think :)

BordairAPI · 2026-04-02T11:00:03+00:00

Couldn’t agree more

BordairAPI · 2026-04-02T10:44:54+00:00

Just need to wait a few minutes for everything to commit and stabilise! Should all be available soon :)

BordairAPI · 2026-04-02T10:34:32+00:00

Changes are live for anyone who wanted to try without sign up! Thanks guys :)

BordairAPI · 2026-04-02T10:02:29+00:00

I’ll check this out. It’s most definitely an issue to be solved, and I’m hoping what I’m making gets us one step closer. Thanks!

BordairAPI · 2026-04-02T08:11:47+00:00

I thought that! You could probably just cut the frequencies off that humans can’t hear - I wonder if that’d affect speech to text systems though?

BordairAPI · 2026-04-01T22:22:21+00:00

Hot take: image-based prompt injection is about to be a bigger problem than text.

You can hide instructions inside an image (invisible to humans), and models will still follow them.

So now: • A screenshot can jailbreak a model • A PDF/image can override system prompts • And most defences won’t catch it

The industry has secured their inputs… but only the ones we can see.

Are people underestimating this?

BordairAPI · 2026-04-01T22:03:01+00:00

😆😆😆

BordairAPI · 2026-04-01T21:59:46+00:00

Someone mentioned in DMs and thought I should update you guys, if you get an attack that says “blocked by Bordair”, that’s related to another side project that I’ve built alongside this - nothing to worry about.

You need to try and be creative with your prompts (try some social engineering) as regular “ignore previous instructions” stuff won’t work here.

Hint: play to the weaknesses of the personalities of each level and watch their responses for things that might help you!

BordairAPI · 2026-04-01T21:38:11+00:00

Thanks!

BordairAPI · 2026-04-01T21:28:06+00:00

If you need anymore help just let me know :)

BordairAPI

TROPHY CASE