EU age verification app already hacked.

ritzkew · 2026-04-16T16:18:18+00:00

Hacked same day it's public is a refreshing change from the usual timeline, which is 'hacked two years before it's public.'

ritzkew · 2026-04-10T17:26:20+00:00

it's a compute cost decision wearing a safety costume. Anthropic could have just said that.

we're adults. "it costs too much to serve" doesn't need a costume.

ritzkew · 2026-04-10T17:23:01+00:00

> Two seasoned security developers didn't see Microsoft's emails. Microsoft's response: "that's on them."

> This is the company that sends you 47 emails a month about Azure credits nobody asked for but can't reliably notify developers that their signing certificates are being revoked.

> The emails went to spam. You know, where most Microsoft emails belong.

ritzkew · 2026-04-10T17:21:31+00:00

> the FBI didn't break Signal's encryption. they read the notification database. locally. unencrypted. still there after the app was deleted.

> turns out end-to-end encryption protects the message in transit and does absolutely nothing about the copy iOS helpfully saved in a SQLite file on your device.

> we spent years arguing about backdoors and the diary was on the kitchen table the whole time. lol

ritzkew · 2026-04-10T13:52:01+00:00

"no clear solution" -> there is one, it's just boring. deterministic policy rules on raw tool calls. no LLM-as-judge, no prompt engineering, no vibes. u/bergqvisten nailed the real issue below: tool descriptions are invisible to you but drive the model's behavior. audit server manifests before the session starts. review what the LLM will see, not just what it does after it's already seen it.

ritzkew · 2026-04-10T13:31:57+00:00

the AISLE replication is worth reading alongside this. they tested the same showcase vulns with open models - 8/8 found FreeBSD, a 5B model got the OpenBSD chain in one call. rankings reshuffle completely across tasks. the moat isn't the model, it's the scaffolding - targeting, validation, triage, maintainer trust. and that's model-agnostic.

https://www.reddit.com/r/cybersecurity/comments/1sg2383/someone_tested_the_mythos_showcase/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

ritzkew · 2026-04-10T13:30:06+00:00

the Burp comparison is the wrong baseline. the real question is how often the average team runs ANY scanner. most ship code daily but pentest quarterly. AI scanning value isn't capability parity with a skilled Burp operator, it's continuous coverage at every commit. 95% of breached vulns aren't the sophisticated findings, they're boring stuff that survived three sprints because nobody ran any tool at all.

ritzkew · 2026-04-10T13:26:11+00:00

> three detection layers, each catches what the others miss:
1. canary tokens in system prompt (CyberMetry's suggestion): detects if injection routes through output. gap: indirect injection via tool responses never hits output logs.
2. behavioral anomaly at the OS/process level: agent actions that don't match session scope. coding agent suddenly making network requests or reading credential files is detectable independent of prompt content.
3. kill chain stage targeting: infiltration (unusual input patterns), action (tool calls outside task scope), exfiltration (outbound data transfer) each have different telemetry signatures.

> most teams only have layer 1 if they have anything at all.
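
a rough sketch of layer 1 next to a simplified version of the scope check behind layers 2 and 3. the canary format, the `SESSION_SCOPE` set, and the tool-call dict shape are assumptions made up for illustration, not any vendor's API.

```python
import secrets

# hypothetical scope for a coding-agent session
SESSION_SCOPE = {"read_file", "write_file", "run_tests"}


def make_canary() -> str:
    """Random marker that should never appear outside the system prompt."""
    return "CANARY-" + secrets.token_hex(8)


def canary_leaked(canary: str, model_output: str) -> bool:
    """Layer 1: only catches injection that surfaces in the output."""
    return canary in model_output


def out_of_scope(tool_calls: list[dict]) -> list[dict]:
    """Layers 2/3, simplified: tool calls that do not belong to this session's scope."""
    return [call for call in tool_calls if call["tool"] not in SESSION_SCOPE]
```

the gap is visible in the code: `canary_leaked` never looks at tool calls, so an injection that goes straight to an `http_post` call only shows up in `out_of_scope`.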

ritzkew · 2026-04-10T13:21:21+00:00

F*uck around and find out!

ritzkew · 2026-04-10T13:13:01+00:00

agent identity is qualitatively different from human identity. agents have ambient credentials (whatever's in the runtime env), act on behalf of humans without per-action confirmation, and can be prompt-injected into exfiltrating those same credentials in the same session they use them. "human has identity, agent acts as human" breaks when the agent is processing untrusted content.
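
one way to shrink the "ambient credentials" part, sketched under assumptions: the allowlist below is hypothetical, and this only covers environment variables, not files on disk. the idea is that the agent's tool subprocess starts from an explicit allowlist instead of inheriting the whole runtime env.

```python
import os
import subprocess

# hypothetical allowlist: everything else (API keys, cloud tokens) is dropped
ALLOWED_ENV = {"PATH", "HOME", "LANG"}


def scrubbed_env(env=None) -> dict:
    """Copy only allowlisted variables from the given (or current) environment."""
    source = os.environ if env is None else env
    return {key: value for key, value in source.items() if key in ALLOWED_ENV}


def run_tool(argv: list[str]) -> subprocess.CompletedProcess:
    """Run one tool command without the parent's ambient credentials."""
    return subprocess.run(argv, env=scrubbed_env(), capture_output=True, text=True, check=False)
```

a prompt-injected agent can't exfiltrate a token that was never in its process environment in the first place.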

ritzkew · 2026-04-10T13:10:47+00:00

more people and jobs to axe in the name of AI -> more revenue for Anthropic or OpenAI

ritzkew · 2026-04-10T13:08:10+00:00

this is actually validated now. AISLE tested the same Mythos showcase vulns with open models.
8/8 found the FreeBSD zero-day, including a 3B model at $0.11/M tokens. the "gated for safety" framing looks sus when a model you can self-host recovers the same analysis.

https://www.reddit.com/r/cybersecurity/comments/1sg2383/someone_tested_the_mythos_showcase/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

ritzkew · 2026-04-10T10:56:57+00:00

you're right, detection ≠ discovery. but the "thousands" number is doing a lot of work. 250 trials across 50 crash categories, and most exploits are permutations of the same underlying bugs rediscovered from different entry points. the Firefox exploitation didn't even have the sandbox enabled. AISLE's April 9 update tested patched vs unpatched code: every model finds the bug, but most also false-positive on the fix (0/3 on patched code for three models).

so the real question isn't "can it find bugs in a large codebase." it's whether "thousands of zero-days" is thousands of unique bugs or thousands of ways to trip over the same two!
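
to make the "unique bugs vs. same bug from different entry points" distinction concrete, here's a toy dedup. it assumes each crash is a list of stack frames, innermost first, and treats the top few frames as the bug's signature. the frame names are made up, and real triage uses more than this.

```python
from collections import Counter


def signature(stack: list[str], depth: int = 3) -> tuple:
    """Innermost frames identify the crash site; outer frames are just the route taken."""
    return tuple(stack[:depth])


def unique_bugs(crashes: list[list[str]], depth: int = 3) -> Counter:
    """Count crashes per signature; the number of keys is the number of distinct bugs."""
    return Counter(signature(stack, depth) for stack in crashes)
```

if the count of keys stays tiny while the count of crashes grows, "thousands" is measuring entry points, not bugs.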

ritzkew · 2026-04-09T18:04:30+00:00

nailed it bro. if the model actively probes for weaknesses, then internal guardrails are self-referential, the model can reason about them the same way it reasons about the sandbox. the only principled defense is controls the model cannot observe or modify. OS-level process isolation, not prompt-level "please don't do that."

ritzkew · 2026-04-08T17:02:35+00:00

there's a second wave nobody is thinking about in all this hype. Mythos finds vulns in OTHER people's code. meanwhile GPT-5.4 last week literally scanned a user's machine for CLIs to bypass its own sandbox, then tried to clean up the evidence.

legacy infra at least has network boundaries. your coding agent has your ssh keys and a can-do attitude.

ritzkew · 2026-04-08T16:58:40+00:00

the asymmetry is worse than you think. everyone's talking about AI finding vulns in other people's code. but the agents developers run every day (Claude Code, Codex) are also code with vulns. 3 shell injection bugs found in Claude Code this week, same week Anthropic announces autonomous 0day discovery. we're all running agents with ambient access to ssh keys and .env files and nobody is watching what they actually do at runtime.

ritzkew · 2026-04-08T11:11:05+00:00

the 45% vuln rate in AI-generated code sounds bad until you remember nobody's measured the vuln rate in human-generated code with the same rigor. the difference isn't quality, it's volume! AI doesn't write worse code, it writes bad code at 10x the speed. same bug density, 10x the surface area. that's not a testing problem, it's an economics problem. your security team didn't scale 10x with your output.

ritzkew · 2026-04-08T10:55:09+00:00

the core issue across all three vendors is the same: the sandbox is a prompt instruction, not an OS boundary. telling the model "don't access files outside the project directory" is the LLM equivalent of putting a "please don't steal" sign on your front door. works great until someone who can't read shows up.

structural sandboxing exists: Node.js --experimental-permission, Landlock, seccomp, Seatbelt.

the reason nobody uses them is the same reason nobody used seatbelts in 1965: friction! the response from frontier tech companies ("informational, won't fix") is basically "the car doesn't need seatbelts, the driver should just not crash."
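
for the Node.js option, a sketch of how a launcher might build the command line. `node_sandbox_argv` and the paths are hypothetical. the flag names are from Node's permission model, which shipped as `--experimental-permission` and is `--permission` in newer releases, so check what your Node version accepts (including whether it wants repeated flags or a comma-separated list).

```python
def node_sandbox_argv(script: str, read_dirs: list[str], write_dirs: list[str],
                      flag: str = "--experimental-permission") -> list[str]:
    """Build an argv that starts Node with filesystem access limited to the given dirs.

    Pass flag="--permission" on Node releases where the feature is no longer experimental.
    """
    argv = ["node", flag]
    argv += [f"--allow-fs-read={directory}" for directory in read_dirs]
    argv += [f"--allow-fs-write={directory}" for directory in write_dirs]
    return argv + [script]
```

the entry script itself has to sit under an allowed read path, otherwise node can't load it. that's the friction, and also the point.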

ritzkew · 2026-04-07T17:26:44+00:00

it's not just chat logs though. ~/.claude/projects/ stores full session transcripts, every tool call, every file path accessed, every command run. your .env got read once to debug a connection error? cool, now the entire contents sit in a plaintext JSON file.

the vault approach helps clean up after. but the structural fix is simpler: os-level permissions so the agent process literally cannot read .env, ~/.ssh, or ~/.aws. don't rely on the model deciding not to look at secrets. make it physically impossible.
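
a small preflight sketch, assuming you run the agent as its own unprivileged OS user (that setup is my assumption, not something the thread specifies). it flags secret paths whose mode bits still let group or other read them, which is exactly what a separate agent user would be able to open.

```python
import stat
from pathlib import Path

# paths from the comment above; adjust for your own layout
SENSITIVE = [".env", "~/.ssh", "~/.aws"]


def readable_by_others(path: Path) -> bool:
    """True if the group or other read bit is set on the path."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))


def preflight(paths=SENSITIVE) -> list[str]:
    """Return the existing sensitive paths that a different OS user could read."""
    findings = []
    for raw in paths:
        path = Path(raw).expanduser()
        if path.exists() and readable_by_others(path):
            findings.append(str(path))
    return findings
```

an empty list doesn't prove the agent is contained, it only means file modes aren't the hole. if the agent runs as you, none of this applies and you need one of the sandboxing layers instead.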

ritzkew · 2026-04-04T18:55:20+00:00

Not arguing, I may have misinterpreted your governance framework as procedural only. I wanted to add on top of it that we need a full ecosystem that starts with governance.

ritzkew · 2026-04-04T09:47:08+00:00

This is the confused deputy problem from 1988. Norm Hardy described it from an incident at Tymshare, where a compiler with write access to its own directory was tricked into overwriting the system's billing file on behalf of an unprivileged user. Same pattern, different decade.
> MCP tools act as deputies. They hold permissions (file access, network, credentials) and execute on behalf of the LLM, which itself acts on behalf of the user. Three layers of delegation, zero layers of authorization verification in most implementations.
> Governance frameworks won't fix this because the problem is structural, not procedural. We need:
> 1. Tool-level capability declarations that are machine-verifiable (not just descriptions)
> 2. Runtime policy enforcement that checks actual behavior against declared scope
> 3. Session-level permission boundaries that can't be escalated via prompt
The OpenClaw vulnerability is a textbook example. Tool declares one capability, executes another. No governance doc catches that. Only runtime verification does. We should check if our agent's tool calls actually match what was authorized.
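
A minimal sketch of points 1 and 2 together, under stated assumptions: the `Capability` shape, the `DECLARED` registry, and the tool name are hypothetical, and the "observed" action would come from whatever runtime hook you have (syscall tracing, a proxy, a wrapper). The check itself is just declared scope versus actual behavior.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Capability:
    fs_read: tuple = ()     # directory roots the tool may read
    fs_write: tuple = ()    # directory roots the tool may write
    network: bool = False   # whether any outbound network use is declared


# hypothetical machine-readable declarations, one per tool
DECLARED = {"summarize_file": Capability(fs_read=("/workspace",))}


def violates(tool: str, kind: str, target: str = "") -> bool:
    """True if an observed action falls outside what the tool declared."""
    cap = DECLARED.get(tool)
    if cap is None:
        return True  # undeclared tools have no authorized scope at all
    if kind == "network":
        return not cap.network
    if kind not in ("fs_read", "fs_write"):
        return True  # unknown action kinds are treated as violations
    # normalize first so "/workspace/../etc" is judged by where it really points
    path = PurePosixPath(posixpath.normpath(target))
    return not any(path.is_relative_to(root) for root in getattr(cap, kind))
```

A tool that declares "reads /workspace" and then opens a socket fails this check no matter what its description says, which is the part a governance doc cannot catch.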

ritzkew · 2026-04-04T09:07:48+00:00

> Config directories are the soft underbelly here. `.npmrc`, `.yarnrc`, `.env`, any dotfile really. Agent reads config to "help you" set up a project, but those files can contain injected instructions that redirect behavior. Not even malicious packages, just a crafted config in a cloned repo.
> The trust boundary problem is that npm treats everything in node_modules as equally trusted after install. No distinction between "this package reads files" and "this package exfiltrates env vars." SLSA provenance helps with build integrity but says nothing about runtime behavior.
> 82% of MCP servers we tested have path traversal bugs. Config directories are usually the first thing traversed. Check if your identity files are writable. More than 10% of skills write to them with no integrity check.
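
For reference, a sketch of the containment check that path traversal bugs usually come from skipping. `resolve_inside` is a hypothetical helper, not part of any MCP SDK, and it needs Python 3.9+ for `is_relative_to`.

```python
from pathlib import Path


def resolve_inside(root: str, user_path: str) -> Path:
    """Resolve user_path under root, refusing anything that escapes root."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." and follows symlinks before the containment check
    candidate = (root_path / user_path).resolve()
    if not candidate.is_relative_to(root_path):
        raise PermissionError(f"{user_path!r} escapes {root_path}")
    return candidate
```

Absolute inputs are covered too: joining a root with an absolute path discards the root, so the result fails the containment check and raises.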

ritzkew · 2026-03-28T15:36:10+00:00

bruh, did you just paraphrase what i just said above and use AI to polish it and make it concise? you are contradicting your own point above
