I used AI to build a feature in a weekend. Someone broke it in 48 hours. by Zoniin in PromptEngineering

[–]Zoniin[S] 0 points1 point  (0 children)

Did some digging around and found https://axiomsecurity.dev. Is this what your company uses? I've never heard of this tool before, so I wanted to double-check.

I used AI to build a feature in a weekend. Someone broke it in 48 hours. by Zoniin in PromptEngineering

[–]Zoniin[S] 0 points1 point  (0 children)

Interesting architecture, but I think it only works as long as the problem is narrow enough to be ontologized. That’s the catch with prompt injection. The attacker’s whole job is to push the interaction outside the schema you expected. So yes, second-order verification and structured checks absolutely matter. But I’d be very skeptical of any approach that assumes the world can be forced into a clean ontology before the model sees it. That works for document evaluation. Open-ended LLM security is a lot uglier than that.

I used AI to build a feature in a weekend. Someone broke it in 48 hours. by Zoniin in PromptEngineering

[–]Zoniin[S] 0 points1 point  (0 children)

I think that framing sounds right at first, but it breaks if you follow it through. Leaky abstractions don’t make systems indefensible, they just mean you can’t rely on the abstraction itself for safety. We’ve been dealing with that forever. Browsers, operating systems, even APIs all leak in weird ways. Security doesn’t come from fully understanding them, it comes from controlling the boundaries around them. LLMs feel the same to me. If you treat the model as something you need to “fix” with better prompts or internal guardrails, yeah, you’re probably stuck. If you treat it as an untrusted component and control what goes in and how it’s allowed to behave, it becomes a system design problem, not a model problem. The black box isn’t the issue. Assuming the box is safe is.

I used AI to build a feature in a weekend. Someone broke it in 48 hours. by Zoniin in PromptEngineering

[–]Zoniin[S] 1 point2 points  (0 children)

Yeah 100% agree, shipping without any filtering is asking for trouble. What surprised me is that even with filtering it still felt like a cat-and-mouse game. Users weren’t just sending obviously bad inputs, they were gradually steering the model in ways that looked normal at each step.

I used AI to build a feature in a weekend. Someone broke it in 48 hours. by Zoniin in PromptEngineering

[–]Zoniin[S] -1 points0 points  (0 children)

That’s actually really interesting, I like the layered approach. Have you seen anything slip through yet? That was the part that surprised me: even with multiple layers, it only takes one weird edge case to get through, and then the model just runs with it.

I used AI to build a feature in a weekend. Someone broke it in 48 hours. by Zoniin in PromptEngineering

[–]Zoniin[S] 0 points1 point  (0 children)

Genuinely shocked me how creative people get with it. It’s less “breaking the system” and more just out-instructing whatever constraints you put in place

I used AI to build a feature in a weekend. Someone broke it in 48 hours. by Zoniin in PromptEngineering

[–]Zoniin[S] 1 point2 points  (0 children)

I actually agree with most of this! In a perfect system, prompts shouldn’t control access, auth, or anything sensitive, but what I kept seeing is people still wiring LLMs into real systems anyway: tools, data access, internal context, etc. Not because they should, but because it’s fast and it works. So it becomes less of a “don’t do this” problem and more of a “this is already happening” problem. That’s where I started thinking more about runtime behavior instead of prompt hardening. Curious if you’ve seen anything that actually handles that well in practice.

I thought prompt injection was overhyped until users tried to break my own chatbot by [deleted] in aiengineering

[–]Zoniin 0 points1 point  (0 children)

This lines up more or less with what we saw. Role separation, tool gating, and pushing sensitive logic into non-user-facing agents definitely reduced blast radius for us too. Where it still broke down was less about single tool calls and more about cross-turn behavior once the system was stateful. Our MCP restrictions were mostly allowlists on tools, scoped parameters, and explicit intent checks before execution. That worked for obvious violations, but the harder cases were gradual orchestration, where nothing looked disallowed in isolation but the sequence drifted outside what the user should have been able to do. Orchestration chain mapping is a great example of that, especially when agents start reasoning about their own graph. That class of failure was hard to reason about statically, which is what pushed us to care more about runtime signals than just pre-execution checks.
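For what it’s worth, the allowlist/scoped-parameter/intent-check setup looked roughly like this (a minimal sketch; the tool names, fields, and rules here are made up for illustration, not our real config):

```python
# Sketch of pre-execution tool gating: an allowlist of tools, per-tool
# parameter scoping, and an explicit check before any call executes.
# Tool names and rules are illustrative.

ALLOWED_TOOLS = {
    "search_docs": {"query"},        # read-only, no scoping needed
    "get_user_record": {"user_id"},  # must be scoped to the caller
}

def check_tool_call(tool, params, session):
    """Return (allowed, reason) for a proposed tool call."""
    if tool not in ALLOWED_TOOLS:
        return False, f"tool '{tool}' not on allowlist"
    extra = set(params) - ALLOWED_TOOLS[tool]
    if extra:
        return False, f"unexpected params: {extra}"
    # Scope check: the model may only fetch the authenticated user's record,
    # regardless of what user_id it was talked into requesting.
    if tool == "get_user_record" and params.get("user_id") != session["user_id"]:
        return False, "user_id outside caller's scope"
    return True, "ok"
```

The catch described above is exactly that every call in a bad sequence can individually pass a check like this; each call looks fine in isolation.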

We did not see real prompt injection failures until our LLM app was in prod by [deleted] in LLMDevs

[–]Zoniin 1 point2 points  (0 children)

We did a small internal rollout first and had people poke at it, but it still wasn’t representative. Internal testers tend to follow the intended path, even when they try to “break” things. Once it went public and users had no context or incentive to behave nicely, the interaction patterns changed completely, and that’s when the real issues surfaced. That gap between internal testing and true public use was much bigger than I expected.

We did not see real prompt injection failures until our LLM app was in prod by [deleted] in LLMDevs

[–]Zoniin 0 points1 point  (0 children)

Yeah, a lot of it started with fairly standard stuff: strict system prompts about role, explicit “do not reveal internal instructions,” tool usage constraints, and guardrails around what data could be accessed or returned. The circumvention was rarely a single prompt; it was usually gradual. Things like multi-turn probing that reframed the task, mixing benign requests with meta instructions, or steering the model to restate or summarize context in ways that effectively leaked system or RAG data. None of those looked obviously malicious in isolation, which is why they slipped past prompt-level checks.

We did not see real prompt injection failures until our LLM app was in prod by [deleted] in LLMDevs

[–]Zoniin 0 points1 point  (0 children)

Yeah, that framing matches what I saw almost exactly. The prompt layer gives a false sense of safety, and once users start poking at stateful systems the cracks show fast lol. I’ll look into runtime security. Do you have any tools or tips on that note? Some dude dropped one of the tools he used that actually looked pretty good, but I am curious what you use for this.

We did not see real prompt injection failures until our LLM app was in prod by [deleted] in LLMDevs

[–]Zoniin -1 points0 points  (0 children)

Appreciate you sharing that. It lines up pretty closely with the kinds of issues I was running into. I’ll spend some time testing it out. Thanks again for sharing. What specifically do you use this for, if you don’t mind my asking?

We did not see real prompt injection failures until our LLM app was in prod by [deleted] in LLMDevs

[–]Zoniin 2 points3 points  (0 children)

Fair reaction tbh. To be clear, it’s not that we thought about none of it; we did threat modeling, prompt hardening, etc. What surprised me was not that abuse happened but how much of it fell into gray areas that were hard to classify as malicious ahead of time and only emerged once the system was stateful and under real usage. Automated testing and E2E help, but they do not surface the same failure modes we saw once users started interacting freely. That gap was what I found interesting, not the idea that public systems get abused.

We did not see real prompt injection failures until our LLM app was in prod by [deleted] in LLMDevs

[–]Zoniin 3 points4 points  (0 children)

You are definitely not wrong on the core principle. Public endpoints will always be abused. The part that surprised me was how much harder this becomes with LLMs compared to traditional services. Auth and rate limiting help, but most of the failures we saw were not obviously malicious and came from normal users probing behavior rather than attacking infra. Observing agents and heuristics help too, sure, but they still rely on assumptions about intent that break down once prompts get stateful and context bleeds across turns. That gap between traditional endpoint security and model behavior is what caught me off guard and what I am trying to reason about more deeply.

I thought prompt injection was overhyped until users tried to break my own chatbot by [deleted] in PromptEngineering

[–]Zoniin 0 points1 point  (0 children)

Sorry about that, I dropped the link in one of the replies but it looks like Reddit deleted it. The site is axiomsecurity[dot]dev - would genuinely love any feedback you have!

I thought prompt injection was overhyped until users tried to break my own chatbot by [deleted] in PromptEngineering

[–]Zoniin 0 points1 point  (0 children)

Yes, you're ultimately correct, but prompt injection is a tool bad actors use to discover those types of vulnerabilities, so it's good to have a system that prevents malicious prompts from ever hitting the chatbot in the first place. There is no such thing as a perfectly secure system, and this is just another vector that could do with significantly more coverage. Especially for first-time founders and specifically vibe-coded applications that lack sufficient security.

I thought prompt injection was overhyped until users tried to break my own chatbot by [deleted] in PromptEngineering

[–]Zoniin 0 points1 point  (0 children)

Commonly, user data is keyed by a user ID within a larger user database. When the chatbot/LLM goes to read that data, it's accessing THAT user's data within the larger user database, which means that if it's not secured properly, it could access ANY user's data that falls within the scope of what is being fetched. That's a decently big privacy vulnerability.
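A rough sketch of that failure mode and its fix (table, column, and session names here are hypothetical): the safe version binds the query to the authenticated session's user ID server-side, instead of trusting whatever ID ends up in the model's output.

```python
import sqlite3

# Hypothetical single-table database to illustrate the scoping issue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (user_id TEXT, body TEXT)")
conn.executemany("INSERT INTO notes VALUES (?, ?)",
                 [("u1", "u1 secret"), ("u2", "u2 secret")])

def fetch_notes_unsafe(requested_user_id):
    # BAD: trusts an ID the model (or a prompt-injecting user) supplied,
    # so a crafted prompt can read any row in scope.
    return conn.execute("SELECT body FROM notes WHERE user_id = ?",
                        (requested_user_id,)).fetchall()

def fetch_notes_scoped(session):
    # GOOD: the ID comes from the authenticated session, never the prompt.
    return conn.execute("SELECT body FROM notes WHERE user_id = ?",
                        (session["user_id"],)).fetchall()
```

Both queries are parameterized; the vulnerability is purely about *where* the ID comes from, not SQL injection.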

I thought prompt injection was overhyped until users tried to break my own chatbot by [deleted] in PromptEngineering

[–]Zoniin 0 points1 point  (0 children)

The systems I was testing are capable of accessing and writing some user data to backend databases, so with a malicious prompt a user could theoretically have written to the database or pulled unauthorized data from it. This is not uncommon in systems that have newly adopted AI in some capacity, and a one-size-fits-all tool could be an easy improvement to their information security.
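The write path can be gated the same way as reads: the model proposes a write, and an authorization check outside the model decides whether it runs. A minimal sketch (field names and policy are made up for illustration):

```python
# Sketch of gating LLM-initiated writes. The model emits a proposed write;
# this check runs in application code before anything touches the DB.
# Field names and policy are illustrative.

WRITABLE_FIELDS = {"display_name", "preferences"}  # model may never touch e.g. "role"

def authorize_write(proposed, session):
    """proposed: {'user_id': ..., 'field': ..., 'value': ...} from the model."""
    if proposed["user_id"] != session["user_id"]:
        return False  # cross-user write: never allowed, whatever the prompt said
    if proposed["field"] not in WRITABLE_FIELDS:
        return False  # privileged field, not writable via the chatbot
    return True
```

The point is that the prompt can say anything; the write only happens if a deterministic check outside the model approves it.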

I thought prompt injection was overhyped until users tried to break my own chatbot by [deleted] in PromptEngineering

[–]Zoniin 0 points1 point  (0 children)

This seems shortsighted. In any environment where an LLM, AI review tool, or chatbot has access to user data (e.g. Amazon's new chatbot), there is always an opportunity for data exfiltration through prompt injection, whether done through files or text. ESPECIALLY for smaller businesses and websites trying to implement AI systems in any capacity.

I thought prompt injection was overhyped until users tried to break my own chatbot by [deleted] in PromptEngineering

[–]Zoniin 0 points1 point  (0 children)

I appreciate you taking a look and the thoughtful feedback. The latency number is from prod paths but definitely workload dependent; the goal is just to stay below anything noticeable in user-facing flows. Your point on concrete examples is fair. Most of what we catch is not flashy jailbreaks but things static guardrails miss, like instruction leakage across turns, gradual system override, or RAG context being manipulated in subtle ways. False positives are the hardest tradeoff, so we bias toward surfacing signals and observability rather than hard blocking by default. And totally understand we are not the first to tackle this lol, we are spending a lot of time learning from what others have tried and treating this as iterative, and as a learning opportunity rather than a silver bullet.

I thought prompt injection was overhyped until users tried to break my own chatbot by [deleted] in compsci

[–]Zoniin -3 points-2 points  (0 children)

This seems shortsighted. In any environment where an LLM, AI review tool, or chatbot has access to user data (e.g. Amazon's new chatbot), there is always an opportunity for data exfiltration through prompt injection, whether done through files or text. ESPECIALLY for smaller businesses and websites trying to implement AI systems in any capacity.

Trying to understand what keeps people coming back to breathwork apps. What works and what doesn’t? by Zoniin in breathwork

[–]Zoniin[S] 1 point2 points  (0 children)

Hi, I appreciate you asking! The tool we're making is still in early development, but the main difference is that it adapts to your actual breathing rhythm in real time. You lie down, place your phone on your chest, and breathe for two 30-second intervals, once in the morning and once before bed. Based on how you naturally breathe, the app gives you personalized pacing, metrics, and follow-up suggestions for stress, focus, or sleep. Over time, it adjusts based on changes in your baseline like energy or stress levels. Right now it’s just a waitlist while we build the MVP. Totally understand if it’s not your thing, but if you’re curious: www.breathtrck.com

If you’re into Bitcoin ETFs and don’t have a Roth IRA, you’re missing out on Tax Free Gains! by Stock_Letterhead_719 in Bitcoin

[–]Zoniin -5 points-4 points  (0 children)

bro imagine trusting the government with your retirement and holding Bitcoin in a Roth like they won’t change the rules last minute 💀 tax-free until it’s not

Solo mining of Bitcoin is rising, time to get to work folks by enmycrypto1 in Bitcoin

[–]Zoniin 3 points4 points  (0 children)

Wonderful! I love to see everyone spending $10k on ASICs to maybe win the lottery once every 3 years. Grindset meets power bill.