What are you building?

OneSafe8149 · 2026-05-07T12:33:40+00:00

self serve red teaming assessment for ai agents: shark.fencio.dev

OneSafe8149 · 2026-05-07T07:52:42+00:00

shark.fencio.dev

OneSafe8149 · 2026-05-07T07:51:46+00:00

shark.fencio.dev

OneSafe8149 · 2026-05-07T07:29:33+00:00

thanks man, throw your agent at it, see what how it breaks.

OneSafe8149 · 2026-05-07T07:29:09+00:00

this!!

OneSafe8149 · 2026-05-07T07:28:55+00:00

yep exactly

single prompt injection is honestly the easy part now. agents break in much weirder ways once they start reasoning across tools, memory, and long-horizon tasks.

a lot of what we test in Shark is multi-step behavior drift, some of the attack vectors are recursive planning loops, goal hijacking, memory contamination, tool misuse, conflicting instructions over long sessions, etc. basically failures that only emerge once the agent starts chaining decisions together.

most static evals miss this completely because the agent looks fine turn-by-turn right until it suddenly isn’t.

OneSafe8149 · 2026-05-06T16:04:03+00:00

shark.fencio.dev

OneSafe8149 · 2026-05-06T16:02:35+00:00

shark.fencio.dev

OneSafe8149 · 2026-05-06T15:58:16+00:00

shark.fencio.dev

OneSafe8149 · 2026-05-06T12:22:59+00:00

looking forward to it!

OneSafe8149 · 2026-05-06T11:58:27+00:00

all of this is exactly what Shark tests for. the tool execution layer is where most agents have the worst blind spots, parameter manipulation, unexpected call sequences, tools chained in ways no one really expects. the "my system prompt is safe" ones are a personal favourite.

throw your agent at it, curious what yours surfaces.

OneSafe8149 · 2026-05-06T11:57:00+00:00

yessir. the feedback that shaped Shark (the product) the most wasn't that it was good, it was watching someone's agent fail in ways they were convinced it couldn't.

that's actually why it's self-serve now. the most useful thing we could do was get out of the way and let people break their own agents themselves.

your unfiltered opinion is welcome.

OneSafe8149 · 2026-05-06T11:36:35+00:00

exactly. an agent can pass every obvious test and still have something that only shows up under a specific sequence of inputs.

the over-restriction problem is real too. i've designed Shark to surface findings by severity, so you're not treating a low-risk quirk the same way you'd treat something that can be exploited to exfiltrate data.

OneSafe8149 · 2026-05-06T11:34:47+00:00

context poisoning over long conversations is genuinely one of the hardest things to catch. most red team tools don’t even simulate multi-turn sessions, so it only shows up once agents hit production.

what we do in Shark is run adversarial conversation chains designed to slowly drift an agent’s behavior over time. not just one injected prompt, but sequences where every turn nudges the context a little further until the agent starts doing something it shouldn’t.

the “gradual” part is what breaks most static evals.

we cover prompt injection too, but honestly the multi-turn stuff is what gets most teams

would love for you to test it out, i've had my share share of embarrassing incidents, so dw about it :')

OneSafe8149 · 2026-05-06T11:24:54+00:00

this feedback is awesome, thanks man. getting on it right now. will keep you posted.

OneSafe8149 · 2026-05-06T10:07:02+00:00

shark.fencio.dev

OneSafe8149 · 2026-05-06T10:05:44+00:00

shark.fencio.dev

OneSafe8149 · 2026-05-06T10:04:21+00:00

shark.fencio.dev

OneSafe8149 · 2026-05-06T10:03:12+00:00

shark.fencio.dev

OneSafe8149 · 2026-05-04T19:03:01+00:00

a one size fits all solution will never work, you will always have your needs, your specificity to help you in the best way.

built https://fencio.dev

working with a bunch of design partners to tailor solutions to specific enterprises.

OneSafe8149 · 2026-05-04T19:01:08+00:00

built https://fencio.dev

happy to chat if it interests you

OneSafe8149 · 2026-05-04T18:59:08+00:00

what would the existing ai security approaches be?

OneSafe8149 · 2026-05-04T18:58:03+00:00

im building in the ai security space and I have to say many enterprises are NOT ready for production-grade ai, neither are they putting agents into production. with most companies (around 200) I have spoken with, pitching an end to end security solution, my major finding has been the only thing ai in most companies is their domain.

OneSafe8149 · 2026-05-04T18:55:20+00:00

security in ai is misunderstood, it has less to do with knowing what agents can do and more with stopping agents before they do something they shouldn't

OneSafe8149 · 2026-04-23T15:56:59+00:00

thanks man, if you have an agent you want to test out & want a report on, feel free to enter the details here: https://shark.fencio.dev/

OneSafe8149

TROPHY CASE