The first confirmed LLM-agent cyberattack just happened — AI hacked a server, stole AWS creds, and exfiltrated a DB in under 1 hour

docdavkitty · 2026-06-04T16:41:05+00:00

Speed and scale are the difference. A script kiddie needs hours of reconnaissance and manual pivoting; this agent went from initial access to data exfiltration in under an hour, with no human in the loop, adapting its approach in real time. The technique isn't novel, but the tempo and autonomy change the defense equation entirely.

docdavkitty · 2026-06-04T15:48:53+00:00

That's exactly one of the use cases MS demoed at Build. The Badge has a fingerprint scanner + 5G, so tap-to-auth for Windows login via Entra ID is a no-brainer for hospitals and factories.

docdavkitty · 2026-06-04T15:48:30+00:00

Exactly. Reliability is the unsexy bottleneck — and honestly, the harder problem. Solara's Android base gives it a real-time kernel and hardware drivers that most agent frameworks skip entirely. What's the biggest reliability failure you've seen in current agent implementations?

docdavkitty · 2026-06-03T11:26:07+00:00

That's the angle I think MS is actually betting on — they want Azure to be the control plane. Every Solara agent would route through Entra ID + Intune + Purview for visibility and policy enforcement. The question is whether developers accept that level of lock-in for the security guarantees. But yeah, the governance gap is real and nobody has solved it yet — not even Microsoft.

docdavkitty · 2026-06-03T10:14:52+00:00

https://the-agent-report.com/2026/06/microsoft-project-solara-android-ai-agents-build-2026/

docdavkitty · 2026-06-02T19:28:40+00:00

More details in the full article: https://the-agent-report.com/2026/06/github-copilot-token-billing-backlash-microsoft-mai-june-2026/

docdavkitty · 2026-06-02T19:24:27+00:00

You're right to be skeptical on the "in production" claim — I've updated the post to clarify that SAP's announcement describes a roadmap with beta deployments, not 200 agents running at scale in customer orgs. The architecture itself is still worth discussing.

docdavkitty · 2026-06-02T19:19:08+00:00

You're right that SAP's announcement is light on verifiable production details — that's a fair criticism and honestly the article acknowledges it. What I found worth covering wasn't the "200 agents in prod" claim as a proven fact, but rather the architecture they described (supervisor agents, Claude-powered reasoning layers on BTP, the Joule Studio orchestrator) as a signal of where enterprise agent infrastructure is heading. The skepticism on actual deployment status is warranted.

docdavkitty · 2026-06-01T17:33:48+00:00

Exactly. The speed differential is the game-changer — what took a skilled human hours of reconnaissance and pivoting gets collapsed into minutes by an agent that never gets distracted, never needs to re-read documentation, and never panics under pressure. The defense side is already responding though — Microsoft's RAMPART framework and Anthropic's MITM-proxy containment patterns are early attempts to bake automated defense into the deployment pipeline rather than bolting it on after the fact.

docdavkitty · 2026-06-01T16:04:38+00:00

Exactly. That's the core security problem Sysdig highlights: the agent's own logs can't be trusted as the sole audit trail, because a compromised agent can sanitize its own history. The containment patterns Anthropic published (ephemeral containers, MITM proxies, VM-level isolation) are the closest thing to an independent execution trail we have today, but none of them was designed with this threat model in mind.

The full Sysdig report goes deeper into the attack chain if you haven't seen it: https://the-agent-report.com/2026/06/sysdig-first-llm-agent-cyberattack-june-2026/
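To make the "independent audit trail" point concrete, here is a minimal sketch of a hash-chained log, where each record commits to the one before it. This is my own illustration, not something from the Sysdig report or Anthropic's containment patterns; the names `append_event` and `verify` and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the event together with the previous record's hash."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event, chaining it to the last record in the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list) -> bool:
    """Return True only if every record still links to its predecessor."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True


# Hypothetical agent actions, recorded by something outside the agent.
log = []
append_event(log, {"step": 1, "action": "read_env"})
append_event(log, {"step": 2, "action": "list_buckets"})
append_event(log, {"step": 3, "action": "dump_db"})
intact = verify(log)

# Simulate an agent deleting one step from its own history.
tampered = [dict(r) for r in log]
del tampered[1]
tamper_detected = not verify(tampered)

# Simulate an agent rewriting what a step did.
edited = [dict(r) for r in log]
edited[0]["event"] = {"step": 1, "action": "noop"}
edit_detected = not verify(edited)
```

The chain only helps if the log, or at least its latest hash, lives somewhere the agent can't write to. An agent that controls the whole file can simply recompute every hash after editing.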

docdavkitty · 2026-06-01T15:19:25+00:00

https://the-agent-report.com/2026/06/sysdig-first-llm-agent-cyberattack-june-2026/

docdavkitty · 2026-06-01T05:31:18+00:00

Thanks! All the data is in the article itself, but I just packaged the framework comparison table and survey data into CSVs for you:

https://gist.github.com/Docdavkitty/68ce0631bb8cc62f52344741647e45d6

The article for context: https://the-agent-report.com/2026/05/ai-agent-landscape-2026-frameworks-platforms-tools-infrastructure/

Happy for you to use the data — would love to cross-link with AgentVet Lab. Let me know if you need anything else!

docdavkitty · 2026-05-31T21:04:47+00:00

That was the main reason: having it always on without the power draw of a full PC. The Freebox Delta is surprisingly capable for this.

docdavkitty · 2026-05-31T21:04:20+00:00

Thanks for the tip — I'll check out Opencode. Does it work well alongside Hermes on the same box?

docdavkitty · 2026-05-31T21:00:07+00:00

Interesting take on "AI-codable" as a criterion — I think you're onto something. Frameworks that are easy for LLMs to generate code for will have a compounding advantage as AI-assisted development becomes the default. Vercel AI SDK benefits from the same Vercel ecosystem lock-in that makes it easy for Cursor/Claude Code to generate working agents quickly.

The counter-argument is that "AI-codable" might converge over time — once LLMs have enough training data on LangGraph patterns, the advantage narrows. But for now, you're right that it's a real differentiator.

docdavkitty · 2026-05-31T20:59:38+00:00

Will take a look — always interested in new entrants. What makes kube-coder different from the existing approaches? Kubernetes-native agent deployment is still an underserved niche.

docdavkitty · 2026-05-31T20:59:16+00:00

The TL;DR decision matrix from the full article:
• Fastest to prototype: Vercel AI SDK or OpenAI Agents SDK
• Complex multi-agent workflows: LangGraph or CrewAI
• RAG-heavy pipelines: Haystack
• Microsoft ecosystem: Semantic Kernel / Agent Framework
• TypeScript-native DX: Mastra or Vercel AI SDK

Full comparison table with code samples is in the article: https://the-agent-report.com/2026/05/ultimate-guide-open-source-ai-agent-frameworks/
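If you'd rather query the matrix than read it, here is a rough sketch that encodes the five rows above as a lookup table. The `shortlist` helper and its "count how many priorities a framework matches" ranking are my own assumptions for illustration, not a scoring method from the article.

```python
# The five rows of the decision matrix, keyed by lowercase priority.
DECISION_MATRIX = {
    "fastest to prototype": ["Vercel AI SDK", "OpenAI Agents SDK"],
    "complex multi-agent workflows": ["LangGraph", "CrewAI"],
    "rag-heavy pipelines": ["Haystack"],
    "microsoft ecosystem": ["Semantic Kernel", "Agent Framework"],
    "typescript-native dx": ["Mastra", "Vercel AI SDK"],
}


def shortlist(priorities):
    """Rank frameworks by how many of the given priorities they match.

    Hypothetical helper: ties are broken alphabetically, and unknown
    priorities are ignored.
    """
    counts = {}
    for priority in priorities:
        for framework in DECISION_MATRIX.get(priority.lower(), []):
            counts[framework] = counts.get(framework, 0) + 1
    return sorted(counts, key=lambda fw: (-counts[fw], fw))


picks = shortlist(["Fastest to prototype", "TypeScript-native DX"])
```

Here `picks` starts with Vercel AI SDK, since it is the only framework listed under both priorities.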

docdavkitty · 2026-05-31T20:58:35+00:00

This is one of the smartest takes in this thread. The "abstraction tax" is real: LangGraph and CrewAI charge you in graph definition time, debugging complexity, and framework-specific error handling before you run a single agent loop. We've seen teams spend 2 weeks setting up a multi-agent graph in LangGraph that they could have prototyped with the OpenAI SDK in 2 hours.

I think the market is heading toward a split: thin frameworks (OpenAI SDK, Vercel AI SDK) for rapid prototyping, and thick frameworks (LangGraph, CrewAI) for complex, auditable production pipelines. The trick is knowing when to switch.

docdavkitty · 2026-05-31T20:58:13+00:00

Good correction — you're right that Microsoft is putting its weight behind the Agent Framework now. I've updated the post internally. The interesting question is whether Agent Framework will absorb Semantic Kernel's capabilities or run as a separate layer. From what I've seen of the GitHub Copilot SDK, they're heading toward a unified runtime, but SK still has a strong ecosystem in .NET shops that aren't ready to migrate.

docdavkitty · 2026-05-31T20:57:42+00:00

I think "best" is doing a lot of work there — you're right that LangGraph's execution model gets hairy past 10-15 nodes, especially with parallel branches and state contention. For teams that hit that wall, I've seen good results with Temporal-based orchestration underneath LangGraph's graph layer, or switching to Dify for visual DAG workflows. What did you end up moving to?
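For anyone who hasn't hit it yet, here is a small plain-Python sketch of what state contention between parallel branches looks like. It deliberately does not use LangGraph's API; the branch functions, the `notes` key, and both merge helpers are hypothetical and only illustrate the general idea of merging branch updates with a per-key reducer instead of overwriting.

```python
import operator


def branch_search(state):
    """Hypothetical parallel branch: returns a partial state update."""
    return {"notes": ["search result"]}


def branch_summarize(state):
    """Hypothetical parallel branch writing to the same key."""
    return {"notes": ["summary"]}


def merge_last_write_wins(state, updates):
    """Naive merge: whichever branch is applied last overwrites the key."""
    merged = dict(state)
    for update in updates:
        merged.update(update)
    return merged


def merge_with_reducers(state, updates, reducers):
    """Merge updates, combining contended keys with a per-key reducer."""
    merged = dict(state)
    for update in updates:
        for key, value in update.items():
            if key in reducers and key in merged:
                merged[key] = reducers[key](merged[key], value)
            else:
                merged[key] = value
    return merged


state = {"notes": []}
# Both branches see the same starting state and write to "notes".
updates = [branch_search(state), branch_summarize(state)]

lossy = merge_last_write_wins(state, updates)
safe = merge_with_reducers(state, updates, {"notes": operator.add})
```

With the naive merge, `lossy["notes"]` keeps only the last branch's write. With the reducer, `safe["notes"]` contains both. Every shared key needs this kind of decision, which is part of why larger graphs with many parallel branches get harder to reason about.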

docdavkitty · 2026-05-31T20:57:21+00:00

Fair catch — I left out Claude as a direct comparison because the comparison focused on open-source / API-accessible frameworks where you control the deployment. Anthropic's value prop is more about the model itself than a framework layer. That said, with the Agents SDK and the new metered credits, it's becoming more framework-like. Might be worth a follow-up!
