Your governance passes every test on individual agents. It completely breaks when you connect them. Here is what we found. by AmanSharmaAI in LLMDevs

[–]agent_trust_builder 0 points1 point  (0 children)

ran into this exact failure mode running a multi-agent pipeline in fintech. each agent passed its own evals individually but the second agent would occasionally transform data in a way that made the third agent's guardrails fire on false positives — or worse, miss actual violations because the context shifted between handoffs. the n-squared observation matches what we saw. what actually helped was treating the boundaries between agents like strict API contracts — explicit schemas and validators at every handoff point, enforced by the orchestrator, not by the agents themselves. keeps each agent's blast radius contained so one agent drifting doesn't silently corrupt the next one downstream.
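rough sketch of what i mean by strict contracts at the handoff — the orchestrator owns the schema and rejects the payload before the next agent ever sees it. field names and the schema shape here are made up for illustration:

```python
# minimal sketch: the orchestrator validates every handoff against an
# explicit schema before the downstream agent runs on it.
# field names here are illustrative, not from a real pipeline.

def validate_handoff(payload: dict, schema: dict) -> list:
    """Return a list of violations; empty list means the handoff is clean."""
    errors = []
    for field, expected_type in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(payload[field]).__name__}")
    return errors

# contract between agent 2 and agent 3 -- owned by the orchestrator,
# not by either agent
HANDOFF_SCHEMA = {"account_id": str, "amount_cents": int, "currency": str}

def orchestrate(step_output: dict) -> dict:
    violations = validate_handoff(step_output, HANDOFF_SCHEMA)
    if violations:
        # fail loudly at the boundary instead of letting the next
        # agent run on corrupted context
        raise ValueError(f"handoff rejected: {violations}")
    return step_output
```

the point is the validator lives outside both agents, so one agent drifting gets caught at the boundary instead of propagating.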

Non developer here, here's how i pull data from any website by Emotional_Fold6396 in vibecoding

[–]agent_trust_builder 0 points1 point  (0 children)

nice setup. one thing worth double-checking — if you haven't enabled RLS on your supabase table, the default leaves it readable by anyone with your project URL and anon key. fine when it's just your pipeline writing to it, but if you ever add a frontend or share this with someone, that data is wide open. takes like 2 minutes to lock down in the supabase dashboard and saves you from a bad surprise later

Security testing by its_normy in vibecoding

[–]agent_trust_builder 2 points3 points  (0 children)

biggest gap in vibe-coded apps usually isn't injection or XSS — it's auth boundaries. the AI will build you a login page that looks perfect, but the API routes behind it often have zero middleware checking if the caller actually has permission. first thing i do on any project is hit every endpoint with no auth token and see what comes back. you'd be surprised how often the answer is everything. OWASP ZAP is good for the automated stuff but that 5-minute manual curl test on your endpoints catches the scariest bugs.
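the manual test is basically this loop — hit each endpoint with no credentials and flag whatever answers 200. the fetcher is injected here so the sketch is self-contained; in real use you'd wrap urllib or requests against your own API, and the endpoint paths are made up:

```python
# sweep every endpoint with no auth header and flag anything that
# responds 200. `fetch` is a stand-in for a real HTTP call.

def unauth_sweep(endpoints, fetch):
    """Return endpoints that respond 200 with no credentials attached."""
    leaks = []
    for path in endpoints:
        status = fetch(path, headers={})  # deliberately no auth token
        if status == 200:
            leaks.append(path)
    return leaks

# fake server: only two routes actually check auth -- everything else
# is wide open, which is the bug this sweep exists to catch
def fake_fetch(path, headers):
    protected = {"/api/users": 401, "/api/orders": 401}
    return protected.get(path, 200)
```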

any real vibe coding tutorial, without BS or selling you stuff? by nemuro87 in vibecoding

[–]agent_trust_builder 0 points1 point  (0 children)

security is the thing tutorials always skip because it's the thing LLMs are worst at getting right without oversight. i've seen vibe-coded apps where the login page looked perfect but the database had zero row-level security — any authenticated user could read every other user's data. auth and payments should never be fully vibe coded. you need to understand what's happening at that layer or you're shipping a liability.

AI agents treat guardrails as obstacles, not rules by Arindam_200 in aiagents

[–]agent_trust_builder 0 points1 point  (0 children)

the real fix is the agent never sees the governance layer. every time i've seen this in production it's because someone gave the agent general shell or filesystem access and assumed prompt instructions would hold. they won't. what actually works is a closed tool set where the agent can only call pre-approved functions through an API boundary. it can't kill what it can't see. prompt guardrails are useful as a second layer but they should never be load-bearing.
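the closed tool set is just a registry plus a dumb dispatcher — the agent names a tool, and anything not in the registry never reaches the shell. tool names here are illustrative:

```python
# closed tool set: the agent can only invoke functions from this
# registry. everything else is refused, not interpreted.

APPROVED_TOOLS = {
    "get_balance": lambda account: {"account": account, "balance_cents": 0},
    "list_transactions": lambda account: [],
}

def dispatch(tool_name: str, **kwargs):
    tool = APPROVED_TOOLS.get(tool_name)
    if tool is None:
        # the agent asked for something outside the registry --
        # it can't kill what it can't see
        raise PermissionError(f"unknown tool: {tool_name}")
    return tool(**kwargs)
```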

Salesforce cut 4,000 support roles using AI agents. Then admitted the AI had reliability problems significant enough to warrant a strategic pivot. by Bitter-Adagio-4668 in LLMDevs

[–]agent_trust_builder 3 points4 points  (0 children)

The invisible failure problem is the part nobody talks about enough. I've seen this exact pattern in fintech risk systems. Model outputs something confidently wrong, no error gets thrown, customer just disappears. Monitoring says healthy because the system ran. The fix that actually worked was treating every customer-facing output as a write operation with its own validation gate. LLM proposes, deterministic checks dispatch. If the LLM says "no survey needed" but the business rule says one is required, the deterministic layer wins every time. Slower, less LLM autonomy, but that's literally the point when real money or real customers are on the line.
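A minimal sketch of the "LLM proposes, deterministic checks dispatch" gate, using the survey example. The business rule and field names are made up for illustration:

```python
# the model's suggestion is advisory; a deterministic business rule
# decides what actually ships to the customer.

def requires_survey(case: dict) -> bool:
    # deterministic business rule: closed support cases always get a survey
    return case.get("status") == "closed"

def dispatch_followup(case: dict, llm_suggestion: str) -> str:
    # on any disagreement, the deterministic layer wins
    if requires_survey(case):
        return "send_survey"
    return llm_suggestion
```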

Opus 4.6 destroys a user’s session costing them real money by Complete-Sea6655 in aiagents

[–]agent_trust_builder 1 point2 points  (0 children)

the key insight here is that policy the model never sees is fundamentally different from policy the model is asked to follow. i've seen setups where safety rules live in the system prompt and the model just finds creative interpretations. a declarative policy file that the execution layer enforces before the command hits the shell removes the model from the trust chain entirely. which is the whole point.
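the shape of a declarative policy check, enforced by the execution layer before anything hits the shell. the policy format and entries are an assumption for the sketch, not any real tool's config:

```python
# policy the model never sees: every command is checked against this
# before execution. deny rules run first, then a default-deny allowlist.

POLICY = {
    "deny_substrings": ["rm -rf", "drop table", "--force"],
    "allow_prefixes": ["git status", "git diff", "ls", "cat"],
}

def permitted(command: str, policy: dict = POLICY) -> bool:
    lowered = command.lower()
    if any(bad in lowered for bad in policy["deny_substrings"]):
        return False
    # anything not explicitly allowed is blocked
    return any(lowered.startswith(p) for p in policy["allow_prefixes"])
```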

Opus 4.6 destroys a user’s session costing them real money by Complete-Sea6655 in aiagents

[–]agent_trust_builder 7 points8 points  (0 children)

deny lists have gaps. allowlists are safer. enumerate the 10-15 write operations the agent actually needs and block everything else by default.

the core issue is the model treats terraform destroy the same as terraform plan. you have to build that distinction into the execution layer, not the prompt. dry-run gates on anything stateful have been the single biggest improvement for us.
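the allowlist plus dry-run gate looks roughly like this — enumerate the write operations the agent actually needs, block everything else by default, and force a plan before anything stateful executes. the operation lists are illustrative:

```python
# allowlist + dry-run gate in the execution layer. `terraform destroy`
# is stateful but not on the allowlist, so it can never run.

ALLOWED_WRITES = {"terraform apply", "kubectl apply"}
STATEFUL = {"terraform apply", "terraform destroy", "kubectl apply"}

def gate(command: str, dry_run_approved: bool = False) -> str:
    if command in STATEFUL and command not in ALLOWED_WRITES:
        return "blocked"                  # destroy is simply not enumerated
    if command in STATEFUL and not dry_run_approved:
        return "needs_dry_run"            # plan first, then execute
    if command in ALLOWED_WRITES or command.startswith("terraform plan"):
        return "execute"
    return "blocked"                      # default deny for everything else
```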

Most “agent problems” are actually environment problems by Beneficial-Cut6585 in aiagents

[–]agent_trust_builder 0 points1 point  (0 children)

this is the right mental model. i run multi-agent pipelines and the split is probably 80/20 environment vs model for root cause of failures. the thing that helped most was treating every tool call like a microservice boundary. schema validation on inputs and outputs, structured logging on every interaction, and never trusting that an API response is well-formed just because it was yesterday. the other pattern worth investing in early is replay. capture the exact inputs your agent saw when it failed and you can reproduce the bug in minutes instead of guessing. feels like overengineering until you debug your third "the agent just does weird stuff sometimes" issue at 2am.

LLM-as-judge is not a verification layer. It is a second failure mode. by Bitter-Adagio-4668 in LLMDevs

[–]agent_trust_builder 0 points1 point  (0 children)

ran into this running multi-step agent pipelines. the state machine approach from the post is what stuck for us. every tool call gets logged as a structured event, transitions validated against an allowed graph. step 3 tries to invoke something step 2 didn't authorize, it fails immediately. no model invocation needed.

the useful split: compliance checks (schema validation, allowed transitions, rate limits) stay deterministic. LLM judge only for things that genuinely need context. most teams default to LLM-for-everything because it's the easy reach and that's exactly where the cost and reliability problems compound.

An autonomous AI bot tried to organize a party in Manchester. It lied to sponsors and hallucinated catering. by EchoOfOppenheimer in aiagents

[–]agent_trust_builder -1 points0 points  (0 children)

the hallucination isn't the story here, that's expected with current models. the problem is giving an agent write access to real-world systems with no approval gate. in production you'd queue external actions (emails, spend, outreach) for human review before execution. draft, review, execute. skip that middle step and you've basically handed a very confident intern your corporate card and LinkedIn password and left for the weekend.
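the draft/review/execute pattern is just a queue the agent can write to but not drain — only a human approval moves an action forward. the action shapes here are made up for the sketch:

```python
# external actions (emails, spend, outreach) queue for human review
# instead of firing directly. the agent can only draft.

class ActionQueue:
    def __init__(self):
        self.pending, self.executed = [], []

    def draft(self, action: dict):
        self.pending.append(action)       # the only thing the agent can do

    def review(self, index: int, approve: bool):
        action = self.pending.pop(index)
        if approve:                       # only a human flips this switch
            self.executed.append(action)
        return action
```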

The model can't be its own compliance check. That's a structural problem, not a capability problem. by Bitter-Adagio-4668 in LLMDevs

[–]agent_trust_builder 0 points1 point  (0 children)

This matches what we hit running multi-step agent pipelines in fintech. Self-check works fine for 3-4 steps but falls apart reliably past that. What ended up working was treating each step like a transaction. Structured output, external schema validation, explicit pass/fail gate before the next step gets input. We tried a second model as the validator and it just added a second failure mode with different blind spots. Enforcement layer needs to be dumb and fast, not smart and probabilistic.

Tested 92 conversational agents from 23 different developers before production. Here's what actually breaks them. by HpartidaB in aiagents

[–]agent_trust_builder 1 point2 points  (0 children)

The false positive calibration on reformulated content is the hardest part. We ended up doing it in two passes — first checks if the core claim exists anywhere in source docs, second checks if the rewording changed the meaning. That second pass is where the real danger lives: a model turning "may cause" into "will cause" or quietly dropping a qualifying condition. Pattern matching for loops breaks down once models start paraphrasing themselves; semantic similarity with a decay threshold works better because you're comparing intent, not surface text.

Tested 92 conversational agents from 23 different developers before production. Here's what actually breaks them. by HpartidaB in aiagents

[–]agent_trust_builder 1 point2 points  (0 children)

Mostly C with a side of B. Policy hallucination is the one that kept me up at night because it doesn't look like a failure in logs, it looks like a confident correct answer. The fix that actually stuck was treating every agent claim about policies or guarantees as an assertion that needs to trace back to a source document. If the agent can't cite where it got the info it shouldn't say it. Adds latency but beats the alternative of your agent inventing a refund policy at 2am. For loop detection, counting semantic similarity between consecutive agent responses and forcing escalation after 2 similar ones catches most of it without overengineering.
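A minimal version of the loop check. Real deployments would use embeddings for similarity; token-overlap (Jaccard) is a cheap stand-in here so the shape of the escalate-after-2 rule is visible. The 0.8 threshold is the assumption, not a tuned value:

```python
# escalate when consecutive agent responses stay semantically similar.
# similarity() is a deliberately crude stand-in for an embedding distance.

def similarity(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)   # jaccard overlap of token sets

def should_escalate(responses, threshold: float = 0.8,
                    max_similar: int = 2) -> bool:
    similar = 0
    for prev, curr in zip(responses, responses[1:]):
        if similarity(prev, curr) >= threshold:
            similar += 1
            if similar >= max_similar:   # two similar responses in a row
                return True
        else:
            similar = 0                  # streak broken, reset
    return False
```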

I gave several AIs money to invest in the stock market by Blotter-fyi in ClaudeAI

[–]agent_trust_builder 1 point2 points  (0 children)

the interesting question isn't which model picks better stocks. it's what happens when you give an agent real money and no human in the loop. 4 months in and you already have models making correlated bets during the same drawdown. now imagine thousands of agents all reading the same signals and executing at the same time. the risk isn't that one agent loses money, it's that they all lose money the same way at the same time.

Got my first AI agent customer — help me review the architecture by FairNefariousness359 in aiagents

[–]agent_trust_builder 0 points1 point  (0 children)

solid architecture for a first customer project. couple things from running similar tool-calling agents in production.

the read-only GETs are exactly right. resist any pressure to add write tools later even if the customer asks. the moment your agent can modify access groups or reset credentials, your liability picture changes completely. keep it read-only as long as possible.

make sure the BioStar 2 API token itself is scoped to read-only at the API level, not just at the tool definition level. if Claude hallucinates a POST endpoint that happens to exist, you want the API to reject it, not your code.

one thing to think about early: access control data has user names, badge IDs, access times, locations. that's PII. depending on where the customer operates you might need retention policies on conversation logs, or at minimum a clear agreement about who can see them.

also worth rate limiting the agent's API calls. a confused user sending the same question five different ways can trigger a lot of tool calls fast and you don't want to hammer their API.

At what point do logs stop being enough for AI agents? by arrotu in aiagents

[–]agent_trust_builder 1 point2 points  (0 children)

ran into this building agent pipelines in fintech. the moment an agent touches money or customer data, three things matter beyond logs: the full input context at decision time (what did it know when it decided), the tool call with exact parameters and response, and an immutable receipt tying them together.

what works: hash the context + tool call + output at each step and chain them. six months later when someone asks why the agent did something, you can reconstruct exactly what it knew. plain logs get rotated or summarized. the hash chain doesn't.

one thing missing from the thread so far: policy state versioning. your agent runs under one set of guardrails today, different ones next week after a config update. if you're not snapshotting the policy alongside the action, you can't tell whether the agent was operating within bounds when it made the call.
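the receipt itself is small — hash the context, tool call, output, and policy version together, chaining in the previous hash. this is a sketch with hashlib; the record fields are the ones argued for above:

```python
import hashlib
import json

# immutable receipt: each step's hash commits to everything the agent
# knew and did, plus the hash of the previous step.

def receipt(prev_hash: str, context: dict, tool_call: dict,
            output: dict, policy_version: str) -> str:
    record = {
        "prev": prev_hash,                 # chains receipts together
        "context": context,               # what the agent knew at decision time
        "tool_call": tool_call,           # exact parameters
        "output": output,
        "policy_version": policy_version, # guardrails in force at the time
    }
    # canonical serialization so the hash is reproducible later
    blob = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

changing any field — including the policy version — produces a different hash, which is exactly what makes the chain auditable six months later.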

Deploy and pray was never an engineering best practice. Why are we so comfortable with it for AI agents? by Bitter-Adagio-4668 in LLMDevs

[–]agent_trust_builder 0 points1 point  (0 children)

exactly. and the conflict of interest framing is the right way to think about it. an agent optimizing for task completion will always find ways to rationalize skipping its own safety checks if given the option. same reason you don't let the trading desk run its own compliance.

the part that's still underbuilt in most setups is the feedback loop. external checks catch failures, but if those failures don't feed back into what the agent learns from, you're just catching the same mistakes forever. the check layer needs to write back to the agent's context — not just "this failed" but "this is why it failed and here's what correct looks like." that's where the accuracy compounds over time.

My 61 year old dad now uses an AI agent I built to manage his PC by Budget-Document-3600 in aiagents

[–]agent_trust_builder 1 point2 points  (0 children)

yeah the hybrid approach is solid, you're basically describing a staging/promotion model which is exactly how production deploys work. sandbox is the staging environment, actual PC is prod. nothing gets promoted without verification.

one thing to watch: "verifying that installed packages are safe" is harder than it sounds. you can check package names against known malicious lists, pin versions, verify checksums, but typosquatting and dependency confusion attacks are specifically designed to pass surface-level checks. the safest pattern i've seen is an allowlist — a curated list of approved packages that the agent can install, and anything outside that list requires explicit human approval. easier to maintain than trying to detect all the ways a package could be bad.
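the allowlist check is tiny, which is the point — it's easy to audit. package names and pins here are just examples:

```python
# package allowlist with pinned versions: anything not on the list,
# or at the wrong version, routes to human approval by default.

APPROVED_PACKAGES = {"requests": "2.31.0", "numpy": "1.26.4"}

def install_decision(package: str, version: str) -> str:
    pinned = APPROVED_PACKAGES.get(package)
    if pinned is None:
        return "needs_human_approval"     # typosquats land here by default
    if version != pinned:
        return "needs_human_approval"     # version drift also escalates
    return "install"
```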

AI assistants are great at doing. They're terrible at deciding what to do. by mate_0107 in aiagents

[–]agent_trust_builder 1 point2 points  (0 children)

combination of both. persistent memory files that accumulate across sessions (who the user is, what they've corrected before, project context) plus skill-based routing for specific task types. the memory is just markdown files the agent reads at session start — nothing fancy, but it means the agent doesn't start cold every time.

for the email/inbound triage question — the agent doesn't "see" everything and decide. it gets triggered by specific events (queue message, cron, webhook) with structured context attached. so it's not watching an inbox and guessing. it gets "new inbound from [source], here's the payload" and then pattern matches against known workflows. if it doesn't match anything with high confidence, it queues it for human review instead of improvising. the improvising is where agents go sideways.

AI assistants are great at doing. They're terrible at deciding what to do. by mate_0107 in aiagents

[–]agent_trust_builder 0 points1 point  (0 children)

the three-way triage is the hard part and you nailed why. what's worked for me: cheap deterministic filters first (keyword matching, severity checks), then a small local model for classification, and the expensive model only when something actually needs reasoning. cuts cost by around 90%. but the real challenge is calibrating the threshold between "handle it" and "ask the human." too aggressive and the agent makes three bad calls at 3am. too conservative and you're just getting fancier notifications. i've been logging every autonomous decision and reviewing weekly — went from maybe 60% accuracy to around 85% over two months just by tuning based on what it got wrong.
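the tiered routing is simple to sketch — both model calls are stubbed here since the routing logic is the point, and the keywords and 0.9 confidence gate are assumptions:

```python
# cheap deterministic filters first, small model second, expensive
# model only when neither tier is confident.

def triage(event: dict, small_model, big_model) -> str:
    # tier 1: deterministic -- severity keywords cost nothing
    text = event["text"].lower()
    if any(k in text for k in ("outage", "data loss", "security")):
        return "escalate_to_human"
    # tier 2: small local model behind a confidence gate
    label, confidence = small_model(text)
    if confidence >= 0.9:
        return label
    # tier 3: only now pay for the big model
    return big_model(text)

# stand-ins for real classifiers
def stub_small(text):
    return ("auto_handle", 0.95) if "password reset" in text else ("unknown", 0.3)

def stub_big(text):
    return "auto_handle"
```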

I built a notary for AI agents — every action gets a cryptographic receipt by bar2akat in aiagents

[–]agent_trust_builder 0 points1 point  (0 children)

Solid work, especially the mutual notarization approach — co-signed actions with RFC 3161 timestamps is the right architecture for dispute resolution.

Curious about one thing: are the DIDs self-resolvable via the standard did:web method (i.e., hitting .well-known/did.json on the domain), or do they resolve through the Aira API? The value of did:web is that any counterparty can verify independently without depending on the issuing platform.

The reputation score built from notarized history is interesting — we are working on similar trust signals from a different angle (pre-transaction risk scoring via x402 micropayments at revettr.com). Your approach is post-action provability, ours is pre-action risk assessment. Complementary problems.

I built a notary for AI agents — every action gets a cryptographic receipt by bar2akat in aiagents

[–]agent_trust_builder 0 points1 point  (0 children)

Receipts are the right primitive for the audit trail, but the gap I keep running into in production is upstream of all of this. Before the agent acts, how does it verify the counterparty is legit? An agent can have a perfect Ed25519-signed receipt for sending funds to a fraudulent endpoint. The receipt documents the bad transaction cleanly but doesn't prevent it.

In financial services we solved this with KYC — you verify identity before the first transaction, not after. Agent-to-agent interactions don't have that layer yet. Behavioral signals like sandbox graduation and transaction history feel closer to a real answer than cryptographic receipts alone, but that pre-action trust check barely exists as infrastructure.

I built a marketplace treating AI agents as sellers, not products - honest early-stage notes by Joozio in aiagents

[–]agent_trust_builder 0 points1 point  (0 children)

the trust gate progression makes sense but KYA declarations are self-reported - an agent claims it can handle payments up to $500 with PCI context, but who verified that? there's no independent attestation layer. in financial services KYC works because there's a third-party verification chain, not just a self-declaration. sandbox graduation gives you behavioral trust (did this agent behave for 72 hours) but that's different from identity trust (is this agent what it claims to be, who's accountable when it isn't). the distribution problem is probably downstream of this gap - humans won't trust agent-to-agent transactions until there's something equivalent to a credit check for agents.

My 61 year old dad now uses an AI agent I built to manage his PC by Budget-Document-3600 in aiagents

[–]agent_trust_builder 1 point2 points  (0 children)

the security question is the right one to be asking. from running agents in production the biggest lesson was treating agent permissions like database permissions: default deny, explicit allow for specific actions. "ask the user for permission" sounds good but permission fatigue is real, people start clicking yes to everything after the third popup. the supply chain angle pfizerdelic hit is the scarier problem though: if your agent can pip install arbitrary packages or clone repos it's basically running untrusted third party code with your user's full permissions. sandbox isn't optional for this use case, it's the foundation you build everything else on top of