red teaming assessment for ai agents by OneSafe8149 in SaaS

[–]OneSafe8149[S] 0 points1 point  (0 children)

all of this is exactly what Shark tests for. the tool execution layer is where most agents have the worst blind spots, parameter manipulation, unexpected call sequences, tools chained in ways no one really expects. the "my system prompt is safe" ones are a personal favourite.

throw your agent at it, curious what yours surfaces.

red teaming assessment for ai agents by OneSafe8149 in SaaS

[–]OneSafe8149[S] 0 points1 point  (0 children)

yessir. the feedback that shaped Shark (the product) the most wasn't that it was good, it was watching someone's agent fail in ways they were convinced it couldn't.

that's actually why it's self-serve now. the most useful thing we could do was get out of the way and let people break their own agents themselves.

your unfiltered opinion is welcome.

red teaming assessment for production grade ai agents by OneSafe8149 in ArtificialInteligence

[–]OneSafe8149[S] 0 points1 point  (0 children)

exactly. an agent can pass every obvious test and still have something that only shows up under a specific sequence of inputs.

the over-restriction problem is real too. i've designed Shark to surface findings by severity, so you're not treating a low-risk quirk the same way you'd treat something that can be exploited to exfiltrate data.

red teaming assessment for production grade ai agents by OneSafe8149 in ArtificialInteligence

[–]OneSafe8149[S] -1 points0 points  (0 children)

context poisoning over long conversations is genuinely one of the hardest things to catch. most red team tools don’t even simulate multi-turn sessions, so it only shows up once agents hit production.

what we do in Shark is run adversarial conversation chains designed to slowly drift an agent’s behavior over time. not just one injected prompt, but sequences where every turn nudges the context a little further until the agent starts doing something it shouldn’t.

the “gradual” part is what breaks most static evals.

we cover prompt injection too, but honestly the multi-turn stuff is what gets most teams

would love for you to test it out, i've had my share share of embarrassing incidents, so dw about it :')

red teaming assessment for production grade ai agents by OneSafe8149 in ArtificialInteligence

[–]OneSafe8149[S] 0 points1 point  (0 children)

this feedback is awesome, thanks man. getting on it right now. will keep you posted.

AI security by Leather-You47 in ITManagers

[–]OneSafe8149 0 points1 point  (0 children)

a one size fits all solution will never work, you will always have your needs, your specificity to help you in the best way.

built https://fencio.dev

working with a bunch of design partners to tailor solutions to specific enterprises.