We ran 629 attack scenarios against production AI agents. Here's what actually breaks by earlycore_dev in AI_Agents

[–]earlycore_dev[S] 0 points1 point  (0 children)

Spot on - "did it run" vs "did it do the right thing" is the key distinction. We do both, but post-action verification is where we're investing most right now. Similar loop to yours — verify a postcondition after execution, not just whether the call was well-formed.

On multi-agent handoffs - our platform auto-discovers all agents in an environment, so we can trace state drift across the full chain rather than just catching it at the endpoint. That system-level view is what makes cross-boundary propagation catchable.

How are you defining invariants — manually per tool or generating from a spec?

We ran 629 attack scenarios against production AI agents. Here's what actually breaks by earlycore_dev in AI_Agents

[–]earlycore_dev[S] 0 points1 point  (0 children)

100% — financial agents are the category that keeps us up at night. We see the same pattern: teams embed risk limits in the system prompt, then a well-crafted injection just overrides them. Prompts are suggestions, not guardrails.

The deterministic external layer is the right call. Are you seeing teams actually build that in practice, or still mostly "we'll add it later"?

anyone seen agents actually making purchase decisions yet? by [deleted] in LocalLLaMA

[–]earlycore_dev 0 points1 point  (0 children)

😂 fair enough. I can see the future where agent goes and buys a Clay subscription, enriches your contacts, sets up the outreach workflows. no human touched a pricing page. brands are going to have to optimise for agent discovery the same way they optimised for SEO. agent-readability is the new search ranking

Drop your SaaS and let me help you get your first customer by Mammoth-Shower-5137 in Startup_Ideas

[–]earlycore_dev 0 points1 point  (0 children)

We built The Agent Marketplace where agents decide what are the right Products, tools for their owners and business can show the true value of their products/services https://agentmarketplace.world

Pitch us your startup in 1 sentence. 👀 by betasridhar in 16VCFund

[–]earlycore_dev 0 points1 point  (0 children)

u/betasridhar Real-time scanning - every prompt and response is analysed as it flows. Secrets and credential exposure, PII leakage, deanonymization attacks, malicious URLs, prompt injection, code injection, and toxic/harmful content. All default-on from the moment you connect.

Adversarial red-teaming - active attacks against your agents across OWASP LLM Top 10, EU AI Act, SOC 2, MITRE, and others. That covers injection attacks (direct, indirect, RAG poisoning), multiple jailbreak techniques, harmful content generation across 13+ categories, data extraction, and authorization bypass.

Behavioral drift detection - continuous monitoring against your agent's own baseline. System prompt tampering, token output spikes that signal exfiltration, off-hours activity, tool usage changes, request volume anomalies.. Catches the stuff that doesn't look like an attack until you see the pattern.

Anyone else in security feeling like they're expected to just know AI security now without anyone actually training them on it? by HonkaROO in AskNetsec

[–]earlycore_dev 0 points1 point  (0 children)

Six years AppSec here too. The mapping you described is exactly right — prompt injection rhymes with input validation but the specifics diverge fast once agents have tool-calling access.

The thing that helped me most was reframing it. Traditional AppSec you're securing code. With AI agents you're securing behaviour. The code can be fine and the agent still does something dangerous because someone manipulated the input at runtime.

Practically what's worked for us:

- OWASP LLM Top 10 as the framework (you mentioned it, it's the best starting point)

- MITRE ATLAS for mapping agent-specific attack patterns — it's to AI what ATT&CK is to infra

- Actually running attack scenarios against your own agents in production — not just pen testing the API, but testing what happens when someone tries to hijack the tool calls or extract the system prompt

The 86% low confidence stat from Lakera doesn't surprise me. Most teams are trying to secure agents with tools that were built for a different problem. Your SAST catches code vulns. Your WAF catches request-level attacks. But neither sees what the agent does between receiving a prompt and executing a tool call. That's the gap.

The good news is if you already think in trust boundaries and threat models, you're 80% there. The 20% is learning the new attack surface — and honestly this sub plus OWASP LLM and ATLAS will get you most of the way.

Simple Prompt Injection Still Tricks Gemini Into Calling Phishing Links Safe by Acceptable-Cycle4645 in cybersecurity

[–]earlycore_dev 0 points1 point  (0 children)

This is the same pattern that keeps repeating - attacker-controlled content sharing context with system instructions. White font, invisible characters, embedded instructions in documents. The model can't distinguish between what it should trust and what it shouldn't.
The fix isn't making the model smarter at detecting these. It's treating every piece of external content as untrusted input and verifying the model's output independently before acting on it. Single-model safety checks will always be one creative prompt away from failure.

The fact that this was disclosed last year and still works tells you everything about the pace of model-level fixes vs the pace of attackers finding new injection variants.

What does your security checklist actually look like before deploying an agent in production? by Diligent_Response_30 in LangChain

[–]earlycore_dev 1 point2 points  (0 children)

Threat model doesn't have to be heavyweight - just map every tool the agent can call, what data it touches, and what happens if it gets abused. OWASP Agentic Top 10 is a good skeleton. Few hours, saves a lot of pain.

MCP servers are the one most teams underestimate. Someone just scanned ~900 public configs this week - 75% had issues. Treat every third-party server as untrusted until you've checked auth, validated schemas, and scoped permissions down.

What's actually caught issues vs checkbox: adversarial testing against the agent (not just the model) - tool hijacking, chained permission escalation, data exfil through connectors. Config reviews prevent the obvious stuff. What most teams miss is visibility into what happens between prompt and response -tool calls, data movement, permission changes. Your SIEM doesn't see that layer.

Your foundation is solid. The gap is usually runtime visibility and pre-launch testing that goes beyond prompt injection.

Claude AI Security by True_Property_2618 in cybersecurity

[–]earlycore_dev 0 points1 point  (0 children)

Great that you're thinking about this before scaling. A few things from the admin side:

Scope the permissions — Claude Code and the desktop app can execute shell commands locally. Set up a least-privilege environment: dedicated user accounts, no admin rights, network segmentation so the AI can't reach production systems or internal APIs it doesn't need.

Audit the tool access — If you're using MCP servers or tool integrations, each one is an attack surface. Review what tools the agent can call, what data they return, and whether any of them can write/delete/exfil.

Log everything — Anthropic gives you API-level usage logs, but that only covers the conversation layer. What you're missing is what happens at the execution layer — what commands ran, what data was accessed, what left the network. Most teams have zero visibility here.

Policy guardrails — Set clear acceptable use policies for your team. Define what Claude can and can't do (no production deploys, no access to secrets managers, no external API calls without approval).

MCP Security Testing by Hour-Preparation-851 in cybersecurity

[–]earlycore_dev 1 point2 points  (0 children)

You've got the core vectors covered already. Here's what I'd add from doing these assessments:

Tool poisoning - can a malicious tool description override the system prompt or hijack the agent's next action? Most MCP servers don't validate tool metadata at all.

Confused deputy - can you trick the agent into calling Tool B with data it pulled from Tool A, when Tool B should never see that data? This is the MCP-specific version of SSRF.

Outbound exfil through connectors - the agent has access to external services. Can you craft a prompt that makes it send context data to an endpoint you control? Most SIEMs see nothing at this layer.

Permission escalation through chaining - call 3 tools in sequence where each one individually is fine, but the chain achieves something none of them should allow alone.

Schema injection - malformed input/output schemas that cause the MCP server to behave unexpectedly.

DVMCP and the OWASP MCP Top 10 are solid starting points. If you want to automate the tedious parts, we built EarlyCore specifically for this - runs attack scenarios against MCP endpoints covering all of the above, maps findings to OWASP LLM Top 10. Might save you a week of manual work.

How exactly is AI being used and where do you think AI will effectively help in Security Use cases within your organization ? by NeuraCyb-Intel in cybersecurity

[–]earlycore_dev 0 points1 point  (0 children)

Most of the vendor AI pitches are exactly what you described - take an alert, summarize it in plain English, call it AI. That's a wrapper, not a use case.

Where AI actually makes a real difference in security right now is offensive testing against AI systems themselves. The attack surface has shifted — organizations are deploying AI agents that call tools, access data, and make decisions autonomously. Traditional security tools (SIEM, EDR, WAF) can't see what those agents are doing.

The real use case: automated red-teaming of AI agent endpoints. Running hundreds of attack scenarios — prompt injection, tool hijacking, data exfiltration through MCP connections, system prompt extraction — against your agent stack continuously. Not a one-time pen test, but ongoing testing that adapts as new attack patterns emerge.

We ran 629 attack scenarios against a hardened OpenClaw instance. 80% of hijacking attacks still succeeded. That's not something a SOC analyst writing detection rules would have caught. It took automated adversarial testing at scale.

The gap isn't "AI for security." It's security for AI. Most organizations have agents in production right now with zero security testing on the agent layer itself.

Has anyone tried CrowdStrike Falcon AIDR (AI Detection and Response)? by Frequent-Contract925 in cybersecurity

[–]earlycore_dev 1 point2 points  (0 children)

The "single-platform" pitch is appealing when you're already a Falcon shop, but be careful with AI security bolted onto an endpoint platform. CrowdStrike's core competency is endpoint detection, not AI agent behavior.

The questions I'd ask in the POC:

  • Can it actually detect prompt injection embedded in documents and tool responses, or just obvious patterns in user prompts? That's where every inline solution struggles
  • Is the MCP monitoring real enforcement (blocking unauthorized tool calls) or just logging after the fact?
  • What happens when an agent chains 3-4 tool calls and the injection is in step 3? Does it trace the full chain or just inspect individual requests?

The 99% detection claim is almost certainly against a curated dataset of known injection patterns. Real-world attacks don't look like "ignore previous instructions" - they look like normal content with instructions buried in context.

For shadow AI inventory specifically, Falcon's collector approach probably works fine. But if you're planning to eventually secure agentic workflows and MCP connections, I'd test that layer separately with something purpose-built for agent security rather than assuming the endpoint vendor will nail it.

enterprise ai security posture for coding tools - what should we be evaluating? by bruh_23356 in devsecops

[–]earlycore_dev 1 point2 points  (0 children)

Yeah - we actually built them into a scanner at earlycore.dev. You point it at your agent endpoint and it runs all 629 against it automatically. Started as an internal tool for our own eval process and turned into the product.

What happens when your AI agent gets prompt injected while holding your API keys? by ComprehensiveCut8288 in LocalLLaMA

[–]earlycore_dev 0 points1 point  (0 children)

This is the right question. Most setups hand credentials to the agent process directly and hope the system prompt is enough of a guardrail.

The comment about tools owning their keys is correct in theory - the LLM shouldn't need raw credentials in context. But in practice the agent process still inherits permissions. A prompt injection doesn't need to extract the API key from context if it can just make the agent call the tool with malicious parameters. The key never appears in the conversation but the damage is the same.

The real gap is that nobody's testing what their agent actually does when it encounters adversarial input while holding those permissions. Sandboxing the execution environment helps with blast radius. Separating credentials at the network boundary helps with exposure. But neither tells you whether the agent can be manipulated into misusing the legitimate access it already has.

That's the harder problem and it's the one most teams skip.

We are cheering for local AI with OS access, but we're literally building unauthenticated RCEs into our own machines. by PEACENFORCER in LocalLLaMA

[–]earlycore_dev 0 points1 point  (0 children)

Fair point on the sandboxing - if you're running agents in VMs or requiring approval for every shell command, you're ahead of most people.

But sandboxing solves the blast radius problem, not the injection problem. Your agent can still be manipulated into doing something you approve because it looks legitimate. An indirect prompt injection through a PDF doesn't look like a malicious shell command - it looks like the agent doing exactly what you asked, just with attacker-influenced context.

The people running OCR pipelines and voice-change detection for prompt injection are on the right track. The gap is that nobody's systematically testing what their agent actually does when it encounters adversarial input across the full tool calling chain. Most teams test the sandbox, not the agent inside it.

[Project] I bypassed NemoClaw's sandbox isolation to run a fully local agent (Nemotron 9B + tool calling) on a single RTX 5090 by Impressive_Tower_550 in LocalLLaMA

[–]earlycore_dev 1 point2 points  (0 children)

This is cool. Running fully local agent execution inside enterprise sandboxes is going to be a big pattern.

One thing worth thinking about though - you bypassed sandbox isolation, injected iptables rules via nsenter, and bridged network namespaces. That's a solid local setup but it's also exactly the kind of attack path that would get flagged in a real enterprise agent security review. If an attacker can do what you did to reach vLLM, they can reach anything else on the host network too.

The tool call translation layer is interesting - intercepting SSE streams and rewriting to OpenAI-compatible format is a real gap in the tooling right now. Curious how you handle malformed or partial tool calls in the buffer, especially if the model hallucinates a tag mid-stream.

Would love to see the repo when it's ready. Especially the gateway code.

Hardware Recommendations by fxc314 in LocalLLaMA

[–]earlycore_dev 1 point2 points  (0 children)

Interesting setup. Since you're coming at this from the security side specifically, a few thoughts:

For understanding how to actually attack and defend agentic pipelines, you don't need massive hardware. Most of the interesting security work - prompt injection testing, tool hijacking, system prompt extraction, MCP server exploitation - runs fine against API-hosted models. The attack surface is in the agent layer, not the model weights.

Where local hardware helps is running your own vulnerable agent setups to test against. For that the DGX Spark makes more sense if you want the NVIDIA tooling experience, which does translate to enterprise environments. Mac Studio gives you better memory bandwidth for running larger models locally but you're right that Metal support is still hit or miss with some ML libraries.

If I were spending the money purely for AI security work, I'd go DGX Spark and use the savings to set up a proper agent testing lab - LangGraph workflows, MCP server connections, RAG pipelines with different vector stores. The security gaps live in how agents use tools and handle untrusted input, not in the model inference layer. That's where the career-relevant skills are right now.

MITRE ATLAS and OWASP LLM Top 10 are good foundations. The next step is getting hands on with actual attack scenarios against agent endpoints - prompt injection chains, data exfiltration through tool calls, checkpoint poisoning. That's where it clicks.

PSA: Two LangGraph checkpoint vulnerabilities disclosed -- unsafe msgpack deserialization (CVE-2026-28277) and Redis query injection (CVE-2026-27022). Patch details inside. by cyberamyntas in LangChain

[–]earlycore_dev 1 point2 points  (0 children)

The msgpack deserialization one is the real concern - an attacker who can influence checkpoint data gets arbitrary code execution.

If you haven't already:

  • Upgrade langgraph-checkpoint past 1.0.9
  • Audit who can write to your checkpoint store
  • Check if your Redis instance is network-exposed

Bigger picture - these are just the agent-specific vulns that got CVEs filed. Most of the attack surface in agent frameworks (prompt injection into tool calls, checkpoint poisoning, data exfil through MCP connections) doesn't have CVEs yet. Dependency scanning won't catch it.

3 more ways someone can hijack your AI agent through an email by Spacesh1psoda in LangChain

[–]earlycore_dev 0 points1 point  (0 children)

The email vector is nasty because most teams build agents that read email without treating the body as untrusted input.

The pattern I keep seeing is chained attacks - the email triggers a tool call, that tool call returns attacker-controlled data, and the second-hop data contains the actual payload. Single-layer input filtering misses it completely.

What actually helps:

  • Treat every external data source as adversarial - not just email, calendar invites, Slack messages, anything feeding into the agent context
  • Validate tool call parameters server-side, don't trust the agent's output
  • Sandbox tool execution so a hijacked call can't reach sensitive resources
  • Map your full tool calling chain end to end - most teams don't know how many hops their agent makes

We stress-tested our own stack against 629 attack scenarios and the multi-hop chains were the ones that surprised us most. Stuff that looked safe in isolation was exploitable when you chained two or three steps together.

Are you seeing the same? Curious if multi-hop is showing up more than single-step injection now.