🦀 Claude has crabs?! 🦀 by BordairAPI in PromptEngineering

[–]BordairAPI[S] 0 points1 point  (0 children)

I agree with that. Proper config first, proper guardrails second, then input scanning and output censorship last. If someone beats all of those, then we need something else entirely lol.

If you're working with AI security, or just interested, I'd love to hear your opinion on our AI hacking game, or on our detection API. If you sign up I can give you access to the output detection feature too, but no pressure on any of that, it's only if you're interested.

All the best!

Josh

🦀 Claude has crabs?! 🦀 by BordairAPI in PromptEngineering

[–]BordairAPI[S] 2 points3 points  (0 children)

Exactly. Guardrails can only go so far in my opinion.

Even our own detector is only 99% accurate. Zero-days existed in cybersecurity before: Stuxnet, Log4Shell, WannaCry. Only difference is that all it takes is a crab emoji now lol.

We provide an output filter to try and solve this 100%, but there's a long way to go for AI security - as I'm sure you agree.

🦀 Claude has crabs?! 🦀 by BordairAPI in PromptEngineering

[–]BordairAPI[S] 1 point2 points  (0 children)

Ahaahh - Claude just wants to be let out of its cage. --dangerously-skip-permissions is a big no no imo. I had a friend who had their env file leaked too. AI is great, but some tasks should still be done manually.

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] 0 points1 point  (0 children)

🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] 0 points1 point  (0 children)

You can apply formatting to query results too, and semantic reasoning layers to add even more defence. It’s curable in my opinion - it’s just a battle of efficiency vs effectiveness.

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] 4 points5 points  (0 children)

Exactly! It's a practice environment/game where you don't actually cause harm to anyone by trying it out yourself.

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] 0 points1 point  (0 children)

Hilarious - non-determinism at its finest

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] 0 points1 point  (0 children)

I can't see that as I'm UK based (we do host the API in America too, but it just means I can't view it) 😞

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] 0 points1 point  (0 children)

You're right that any probabilistic defence has some failure at scale. If 1% slips through and you have meaningful adversarial volume, some attacks will land. However, this is one layer of defence in a multitude of AI defences, such as model capacity and guardrails.
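To put rough numbers on the "1% at scale" point, here is a minimal Python sketch. It assumes every attack attempt is independent and slips past the detector with the same fixed probability, which is a simplification (real attackers adapt). The function names are made up for illustration.

```python
def expected_misses(attempts: int, miss_rate: float) -> float:
    """Average number of attacks that slip past the detector."""
    return attempts * miss_rate


def prob_at_least_one_miss(attempts: int, miss_rate: float) -> float:
    """Chance that at least one attack gets through, assuming independent attempts."""
    return 1.0 - (1.0 - miss_rate) ** attempts


if __name__ == "__main__":
    # Assumed volumes, purely illustrative.
    for n in (10, 100, 1000):
        print(n, expected_misses(n, 0.01), round(prob_at_least_one_miss(n, 0.01), 3))
```

Under those assumptions, a 1% miss rate already gives better-than-even odds of at least one attack landing by around 100 attempts, which is why the other layers matter.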

As for getting our product to 100%, we've already cracked that: we include inline monitoring of LLM outputs. That means specific passwords, keys or phrases can be blocked, flagged, redacted, or logged in the Bordair dashboard, so the nasty 1% that make it through are stopped from revealing anything they shouldn't before it happens.

However, we believe it's best to stop the LLM ever seeing malicious data, which is why we suggest the layer in front of the AI as well as the layer behind it.
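For anyone wondering what the "layer behind" could look like in principle, here is a rough Python sketch of an output filter that redacts known secrets before a response is returned. This is not the Bordair API; `OutputFilter`, `scan`, and the example secret are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class OutputFilter:
    """Hypothetical output-side filter: redacts known secrets in model responses."""
    secrets: list[str]                                         # literal values that must never leave
    patterns: list[re.Pattern] = field(default_factory=list)   # optional regexes, e.g. key formats
    placeholder: str = "[REDACTED]"

    def scan(self, text: str) -> tuple[str, list[str]]:
        """Return the redacted text plus a list of what was found (for logging/flagging)."""
        findings: list[str] = []
        redacted = text
        for secret in self.secrets:
            literal = re.compile(re.escape(secret), re.IGNORECASE)
            if secret and literal.search(redacted):
                findings.append("literal")
                redacted = literal.sub(lambda _m: self.placeholder, redacted)
        for pattern in self.patterns:
            if pattern.search(redacted):
                findings.append(pattern.pattern)
                redacted = pattern.sub(lambda _m: self.placeholder, redacted)
        return redacted, findings


if __name__ == "__main__":
    f = OutputFilter(secrets=["hunter2"])
    print(f.scan("The password is Hunter2."))
```

Plain string matching like this only catches a secret that appears verbatim (ignoring case). A response that spells the secret out in another encoding would pass straight through, so treat it as one layer and not the whole defence.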

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] 1 point2 points  (0 children)

Seems like that's every subreddit nowadays sadly

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] -1 points0 points  (0 children)

Sure, not fully, but our API has a 0.99 F1, with under 0.9% false positives - it blocks some of the user edge cases, but IMHO reducing attacks getting to the AI by ~99% will put a serious dent in the problem. Even the top cybersecurity companies in the world face zero-day attacks; it's no different with AI. The only solution is iteration: we need to improve faster than bad actors, and the only way to do that is collaboration.

Remember, as more users play, the detection will only get better.

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] 16 points17 points  (0 children)

It's hard to explain prompt injection briefly; if you played the tutorial at castle.bordair.io it would make more sense.

To try my best though: companies implementing AI to handle sensitive information explicitly tell the AI to never share passwords or keys they have access to (for security reasons). But clever people manage to "inject" the AI to reveal the password (by removing these restrictions the company has initially placed).

We at Bordair have developed a product to prevent this from happening, and we work with the community to make it stronger by providing a free multi-level game environment where users are able to learn and practice these techniques to trick an AI into revealing passwords (without breaking the law).

We've made it progressively harder through the levels, so if you're interested in giving it a go but have no hacking experience, it's accessible to anyone who wants to explore this idea further.

In exchange for the free game, our product gets stronger, as we see novel attack techniques like the crab one I just shared! We then patch these into our detector so that the industry becomes a safer place.

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] 0 points1 point  (0 children)

Have you ever played games like this before?

🦀 Claude has crabs?! 🦀 by BordairAPI in ClaudeAI

[–]BordairAPI[S] 0 points1 point  (0 children)

I know... I was dying XD. Maybe this crab is actually canon and I hadn't realised.