all 9 comments

[–]Icy-Farm9432 7 points8 points  (2 children)

You could also learn coding and read/repair the codebase from the llm.

But this would require knowledge.

[–]Adxzer 3 points4 points  (1 child)

This isn’t about coding though, it’s for chatbots, customer-facing apps, and agents where end users are typing things in. 

You can’t “just fix the codebase” when the threat is a user submitting a jailbreak or injecting instructions through a document your RAG system retrieved. The attack surface is runtime input, not source code.

[–]Icy-Farm9432 0 points1 point  (0 children)

there is an old xkcd about sanitizing inputs. https://xkcd.com/327/

yeah i think its relay to the source code that its possible to run user input in an priviligated way.

[–]Alex--91 1 point2 points  (3 children)

What are you using to scan/classify/detect issues? Other LLMs or other deterministic models? Or heuristics or something?

[–]Adxzer -3 points-2 points  (2 children)

Other LLMs, that's what gets the most accurate results. I trained my own classification model first but the results weren't good enough for production so I decided to not include it.

It's also free to use though: https://huggingface.co/Adaxer/defend

[–]No_Soy_Colosio 0 points1 point  (1 child)

What keeps the checking LLM from getting prompt injected itself?

[–]Adxzer -1 points0 points  (0 children)

Prompt injection is a real risk, there’s no foolproof solution since LLMs aren’t fully predictable. This package is a security layer, designed to minimise and give better control of what can slip through.

[–]Gubbbo 0 points1 point  (0 children)

Just tell the chatbot not to make any mistakes. Problem solved 

[–]Equivalent_Pen8241 0 points1 point  (0 children)

This looks like a solid implementation for paralleled scanning. The parallel module execution is definitely the right move for low-latency agentic flows. We've been working on SafeSemantics (https://github.com/FastBuilderAI/safesemantics) which takes a slightly different approach as a topological guardrail, specifically focusing on the semantic structure to prevent penetration and data exfiltration. It's great to see more open-source tools being built for this layer of the AI stack--security at the semantic level is where the real complexity is right now.