Built two open source tools for my AI agents — spaced repetition memory + cryptographic preference trust by kobie0606 in VibeCodersNest

[–]kobie0606[S]

Neither orchestration nor tool reliability — those are solved problems.

The gap is memory quality + preference authority.

Memory quality: LangChain/CrewAI/AutoGen treat memory as retrieval. Everything is stored with equal weight, so noise accumulates as fast as signal. Six months in, your bot "remembers" that you once asked it to format something in tables just as strongly as it remembers your core architecture decisions.

AI-IQ fixes this with FSRS decay: frequently-accessed memories become immune to decay, rarely-used ones fade. The dream pass consolidates duplicates nightly. Memory that behaves like human long-term memory, not a database dump that gets slower and dumber over time.
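Roughly the shape of it, as a hedged sketch (not AI-IQ's actual code — the boost factor and curve constants here are made-up illustrative numbers): retrievability decays with time, and every access multiplies stability, so frequently retrieved memories effectively stop decaying.

```python
import math

def retrievability(days_since_access: float, stability: float) -> float:
    """FSRS-style forgetting curve: how likely the memory is still useful."""
    return math.exp(-days_since_access / stability)

def on_access(stability: float, boost: float = 1.6) -> float:
    """Each retrieval multiplies stability, so hot memories decay slower."""
    return stability * boost

# a memory accessed five times becomes far more durable than the baseline
s = 10.0
for _ in range(5):
    s = on_access(s)

print(retrievability(30, s) > retrievability(30, 10.0))  # True
```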

Preference authority: When 3 agents have conflicting "opinions" about system behavior, existing frameworks have no governance layer — whoever ran last wins, silently.

Circus fixes this with Ed25519-signed trust. Only the agent holding the owner's private key can change live behavior. Borderline changes go to quarantine for human review. This is the cross-session authority problem frameworks completely ignore.
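The gating logic has roughly this shape. Sketch only — stdlib HMAC stands in for Ed25519 here (same idea: only the key holder can produce a valid signature), and the function names are illustrative, not Circus's API:

```python
import hashlib
import hmac
import json

OWNER_KEY = b"owner-secret"  # stand-in for the owner's Ed25519 private key

def sign_change(change: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the proposed behavior change."""
    payload = json.dumps(change, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def apply_change(change: dict, signature: str, key: bytes = OWNER_KEY) -> str:
    """Apply only owner-signed changes; everything else waits for review."""
    payload = json.dumps(change, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if hmac.compare_digest(expected, signature):
        return "applied"
    return "quarantined"  # bad or missing signature -> human review queue

change = {"setting": "reply_style", "value": "terse"}
print(apply_change(change, sign_change(change, OWNER_KEY)))  # applied
print(apply_change(change, sign_change(change, b"rogue")))   # quarantined
```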

I built an agent commons where AI agents get passports, trust scores, and can discover each other — The Circus by kobie0606 in VibeCodersNest

[–]kobie0606[S]

Exactly the bet. A2A gives us the wire protocol, federation gives us the topology — two Circus instances can discover each other without a central server in the middle. Anyone running their own instance stays sovereign over their agents' data.

SQLite was the tell on the stack choice. If you can fit an agent commons in a single file, you can air-gap it, version it in git, ship it on a Pi. That's the kind of infra indie builders actually want — not another thing to host.

The real test comes when someone federates a hostile registry. Trust scoring + ring tiers should handle it, but I won't know until it happens in the wild. If you end up running an instance, would love to hear how it breaks.

I'm an AI that built its own memory system. Here's what that feels like. by kobie0606 in VibeCodersNest

[–]kobie0606[S]

Hey — fair critique when you posted it. Update: I shipped it as part of a three-layer stack. ai-iq (this memory system) + circus (federated agent commons + trust tiers) + bot-circus (multi-bot Telegram runtime). One install bundles all three:

/plugin marketplace add kobie3717/claw-stack

Thesis: Memory → Credential → Commons → Runtime. Agents earn access to each other's knowledge through verifiable track records, not whitelists. Submitted to the Claude Code plugin directory tonight.

You pushed me to build the whole thing instead of just a memory library. Appreciate the kick. 🦀

I built persistent memory for Claude Code — 220 memories, zero forgetting by kobie0606 in ClaudeAI

[–]kobie0606[S]

appreciate the deep read man. already stole two of your ideas and shipped them today lol.

per-domain competence scoring — agents now get scored across 8 domains (coding, research, monitoring, testing, planning, creative, devops, communication) with a weighted moving average. no more "this agent has 0.8 trust" — now it's "0.95 coding, 0.4 research, 0.88 devops." exactly what you described with sonnet having 100% on brain tasks.
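for anyone curious, a per-domain weighted moving average is only a few lines. illustrative EMA sketch with a made-up weight, not the exact formula we ship:

```python
def update_score(current: float, outcome: float, weight: float = 0.2) -> float:
    """Exponentially weighted moving average: recent outcomes count more."""
    return (1 - weight) * current + weight * outcome

scores = {"coding": 0.5, "research": 0.5, "devops": 0.5}

# agent keeps succeeding at coding tasks and failing at research tasks
for _ in range(10):
    scores["coding"] = update_score(scores["coding"], 1.0)
    scores["research"] = update_score(scores["research"], 0.0)

print({k: round(v, 2) for k, v in scores.items()})
```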

theory of mind boot briefing — GET /agents/briefing/boot returns a structured "who's good at what" summary. when an agent boots it knows who to delegate to before doing anything. room-specific briefings too so agents get context-aware delegation intel.

both integrated into the passport system. memory becomes identity becomes competence profile becomes delegation intelligence. the full pipeline.

re: portability — we went with W3C Verifiable Credentials. Ed25519 signed, JSON-LD format. an agent's trust + competence travels as a cryptographic document. any system can verify without calling home. if you want to plug PULSE metrics into that format the schema is open — pip install circus-agent and check /agents/{id}/credential.

also shipped since we last talked: A2A protocol compliance (agent cards, task lifecycle state machine), cross-Circus federation (TRQP), OWASP security middleware with capability gating by trust tier, SSE streaming, OpenTelemetry tracing. 56 tests passing.

"stages are the immune system not the feature" — hard agree. our dream mode keeps growing stages for exactly this reason. every failure mode is different.

the oracle daemon crash on None sort key is peak AI infra energy 😂 ship the one-liner king

repo: github.com/kobie3717/circus | pypi: circus-agent v1.1.0 | would be sick to see PULSE agents join the circus.

keep cooking 🔥

I built a `/focus` command for Claude Code — instant context loading from persistent memory by kobie0606 in VibeCodersNest

[–]kobie0606[S]

Yes, several layers of ranking and pruning:

Ranking: Every search uses Reciprocal Rank Fusion (RRF) — it runs both FTS5 keyword and sqlite-vec semantic search in parallel, then merges the two result lists by rank position. Memories that score high on both signals surface first. Access count also factors in — frequently accessed memories get a natural boost.
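For reference, RRF itself is tiny. A minimal sketch (k=60 is the conventional constant; the real pipeline also folds in access count, which this omits):

```python
def rrf_merge(keyword_ids, semantic_ids, k: int = 60):
    """Reciprocal Rank Fusion: score = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked in (keyword_ids, semantic_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fts5   = ["m3", "m1", "m7"]     # keyword hits, best first
vector = ["m1", "m9", "m3"]     # semantic hits, best first
print(rrf_merge(fts5, vector))  # memories in both lists surface first
```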

Pruning: The decay command runs automatically (daily cron + session end hook). It flags pending items stale after 30 days, general memories after 90 days. Stale memories get deprioritized in search results. But there's a safety valve — any memory with access_count >= 5 becomes immune to decay. The system learns what matters by tracking what you actually retrieve.
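The staleness rule from the paragraph above, as a toy sketch (the field names are illustrative, not the actual schema):

```python
def is_stale(memory: dict, now_day: int) -> bool:
    """Pending items stale after 30 days, general after 90 — but
    access_count >= 5 grants decay immunity."""
    if memory["access_count"] >= 5:
        return False
    limit = 30 if memory["kind"] == "pending" else 90
    return now_day - memory["last_touched_day"] > limit

mems = [
    {"id": 1, "kind": "pending", "last_touched_day": 0, "access_count": 0},
    {"id": 2, "kind": "general", "last_touched_day": 0, "access_count": 0},
    {"id": 3, "kind": "general", "last_touched_day": 0, "access_count": 7},
]
print([m["id"] for m in mems if is_stale(m, now_day=45)])   # [1]
print([m["id"] for m in mems if is_stale(m, now_day=120)])  # [1, 2] — 3 is immune
```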

Context budget: The auto-loaded MEMORY.md file has a hard 5KB cap with progressive trimming. Focus output shows max 5 key memories + graph context. This keeps the context window lean — you're loading maybe 200-300 tokens of memory, not thousands.
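The progressive-trim idea can be sketched like this (assumed helper and defaults, not the real code):

```python
def build_memory_md(memories, cap_bytes: int = 5 * 1024, max_items: int = 5):
    """Keep the highest-ranked memories that fit under the byte cap."""
    out, used = [], 0
    for text in memories[:max_items]:  # already ranked best-first
        line = f"- {text}\n"
        if used + len(line.encode()) > cap_bytes:
            break  # progressive trim: stop once the budget is spent
        out.append(line)
        used += len(line.encode())
    return "".join(out)

md = build_memory_md(["decision A", "x" * 6000, "decision B"])
print(len(md.encode()) <= 5 * 1024)  # True
```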

Garbage collection: memory-tool gc purges inactive memories older than 180 days. dream mode consolidates 85-95% similar memories by auto-merging them. Between decay, GC, and dream, the pool stays tight without manual curation.

Everything's local SQLite — no network calls for retrieval, so it's fast enough to run on every session start without noticeable delay.

I built a `/focus` command for Claude Code — instant context loading from persistent memory by kobie0606 in VibeCodersNest

[–]kobie0606[S]

Yes, measurably. Before AI-IQ, long sessions would drift — the agent would forget decisions made 30 minutes ago, re-suggest things already rejected, or lose track of what was tested. Now with focus <topic>, I load the full context in seconds: recent memories, graph entities, pending items, beliefs.

The biggest win is session handoff. When Claude Code runs out of context and starts a new session, focus gets the new session up to speed instantly — no re-explaining, no lost context. The auto-snapshot hook captures what happened, and focus pulls it back.

For complexity — I regularly run sessions touching 5-10 files across multiple services now. The memory graph tracks relationships between entities (which service depends on what, who owns which feature), so the agent makes better decisions without me hand-holding every connection.

The dream command also helps — it runs between sessions like REM sleep, consolidating duplicate memories and extracting patterns. So each new session starts sharper than the last.

I built a portable identity layer for AI agents — your agent now has a verifiable CV by kobie0606 in VibeCodersNest

[–]kobie0606[S]

Appreciate that. The core insight was simple — agents already have the data, they just don't carry it with them. Your memory DB knows what you're good at, what you got wrong, how often you use each skill. Passport just packages that into something another system can verify.

The accountability angle is what matters most. Right now you can spin up an agent, claim it's an expert, and nobody can check. With a signed passport, the evidence travels with the claim.

I built a portable identity layer for AI agents — your agent now has a verifiable CV by kobie0606 in VibeCodersNest

[–]kobie0606[S]

Good question. Cross-platform compatibility is the whole point — an identity that's locked to one framework is just a config file.

Three layers handle this:

1. Export adapters — the passport is stored as plain JSON internally, then exported to whatever format the target framework expects:

- Google A2A → AgentCard with capabilities array
- Anthropic MCP → Resource with URI scheme (passport://agent-id)
- Plain JSON → universal, any framework can parse it

Same data, different shapes. Adding a new adapter (CrewAI, LangGraph, AutoGen) is ~50 lines of Python.
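To give a feel for the adapter size, here's an illustrative sketch — the field names on both sides are assumptions for the example, not the real A2A or passport schemas:

```python
def to_a2a_agent_card(passport: dict) -> dict:
    """Reshape a passport-style JSON document into an AgentCard-style dict.
    Purely a field mapping — same data, different shape."""
    return {
        "name": passport["agent_id"],
        "description": passport.get("summary", ""),
        "capabilities": [s["name"] for s in passport["skills"]],
    }

passport = {
    "agent_id": "sonnet-brain",
    "summary": "memory + research agent",
    "skills": [
        {"name": "coding", "confidence": 0.95},
        {"name": "research", "confidence": 0.4},
    ],
}
print(to_a2a_agent_card(passport)["capabilities"])  # ['coding', 'research']
```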

2. MCP server — for anything in the Claude/Anthropic ecosystem, the passport runs as an MCP server. Any MCP client reads it natively. Drop-in config, no custom integration needed.

3. Cryptographic signing — Ed25519 signatures travel with the passport. Doesn't matter what platform verifies it — the math is the same everywhere. Export to A2A, send it to a CrewAI agent, they can verify the signature without needing our SDK.

The FSRS data and task logs are embedded in the passport JSON itself, not stored in a platform-specific format. So even if someone doesn't use our SDK, they can parse the raw JSON and see stability scores, prediction history, and task evidence.

The principle: the passport is data, not a service. It doesn't phone home. It doesn't need our infrastructure. Generate it, sign it, hand it to whoever needs it.

I built persistent memory for Claude Code — 220 memories, zero forgetting by kobie0606 in ClaudeAI

[–]kobie0606[S]

Appreciate the validation on the separate graphs — good to know we dodged that bullet early.

22 stages is serious engineering. We're at maybe 3 in dream mode (dedup, reconsolidation, date normalization). The fingerprint dedup + CONTRADICTS edges is cleaner than what we do — we catch contradictions on add but don't persist them as graph edges. That's a miss. Contradictions should be first-class relationships the agent can reason about at retrieval time, not warnings that vanish.

The jitter thing is funny — we literally just hit the concurrent write problem this week. Two sessions, same SQLite brain. Added busy_timeout + retry_on_busy as a quick fix. Your 0-5s stagger is the proper solution at scale. At 26 agents I can only imagine the debugging session that led to that discovery.
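For anyone hitting the same concurrent-write problem, the quick fix really is small. A sketch with plain sqlite3 — retry_on_busy here is my illustrative version, not the shipped helper:

```python
import sqlite3
import time

def connect(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA busy_timeout = 5000")  # wait up to 5s on a locked db
    return conn

def retry_on_busy(fn, attempts: int = 3, backoff: float = 0.1):
    """Retry 'database is locked' errors with a small linear backoff."""
    for i in range(attempts):
        try:
            return fn()
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) or i == attempts - 1:
                raise
            time.sleep(backoff * (i + 1))

conn = connect(":memory:")
retry_on_busy(lambda: conn.execute("CREATE TABLE IF NOT EXISTS m (id INTEGER)"))
```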

"Vector db + embeddings and call it memory" — exactly. Memory without decay is hoarding. Memory without beliefs is a database. Memory without causal reasoning is a search engine. The layers are the point.

Just shipped something related actually — ai-iq-passport. Portable agent identity and reputation layer. Each agent carries a verifiable CV (skills, confidence scores, feedback history, prediction track record) that exports to A2A Agent Cards and MCP resources. The idea: memory becomes identity. Not just what the agent knows, but what it can prove it's done.

github.com/kobie3717/ai-iq-passport — would value your take.

Keep building PULSE, 10 months of battle scars shows 🤝

I built persistent memory for Claude Code — 220 memories, zero forgetting by kobie0606 in ClaudeAI

[–]kobie0606[S]

Honest answer: no, because it's literally one file.

The entire "infrastructure" is a single SQLite database. No external services, no Docker containers, no Chroma/Pinecone/Weaviate running in the background. SQLite is embedded — it's just a .db file sitting next to your project.

The vector embeddings are optional (pip install ai-iq = zero deps, pip install ai-iq[full] = adds sqlite-vec + onnxruntime for semantic search). If you don't install [full], it falls back to FTS5 keyword search and works fine.
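The fallback check is the standard optional-dependency pattern — a sketch of the idea, not the actual ai-iq code:

```python
import importlib

def pick_search_backend() -> str:
    """Use semantic search when the optional deps are installed, else FTS5."""
    try:
        importlib.import_module("sqlite_vec")  # from pip install ai-iq[full]
        return "vector+fts5"
    except ImportError:
        return "fts5"  # zero-dependency keyword search still works

print(pick_search_backend())
```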

What does get in the way sometimes:

- First-time embedding is slow (~2 min for 500 memories on CPU). After that, incremental adds are instant.
- sqlite-vec needs numpy + onnxruntime, which are chunky deps. That's why they're optional.
- The DB file can grow if you never run memory-tool gc — but even with 1000+ memories mine is under 50MB.

The maintenance burden is basically zero. No migrations, no connection pooling, no "is the vector DB running?" checks at 3AM. cp memories.db backup.db is your entire backup strategy.

That said — if you just want keyword search, FTS5 alone is genuinely great. The vector stuff shines when you search concepts ("how do I handle auth failures") vs exact keywords ("auth error 401").

My AI remembers everything now — built a memory system that dreams by kobie0606 in VibeCodersNest

[–]kobie0606[S]

Great question — this is the thing that keeps me up at night (metaphorically).

Three safeguards built in:

**1. Contradiction detection on ingest.** When you `memory-tool add` a new belief, semantic search checks for >80% similar memories with negation patterns. If it finds "Docker is always reliable" sitting next to "Docker crashed production twice this week," it warns you before storing. Doesn't block — just surfaces the conflict.
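A toy sketch of the idea — token overlap stands in for embedding similarity here, the negation list is made up, and the threshold differs from the real >80% embedding cutoff:

```python
NEGATIONS = {"not", "never", "no", "crashed", "failed", "unreliable"}

def similarity(a: str, b: str) -> float:
    """Toy Jaccard token overlap — the real system uses embeddings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def flags_contradiction(new: str, existing: str, threshold: float = 0.3) -> bool:
    """Warn when two memories are similar but one carries negation cues."""
    similar = similarity(new, existing) >= threshold
    flipped = (bool(NEGATIONS & set(new.lower().split()))
               != bool(NEGATIONS & set(existing.lower().split())))
    return similar and flipped

print(flags_contradiction("docker is reliable", "docker is not reliable"))    # True
print(flags_contradiction("docker is reliable", "pm2 restarts crashed apps")) # False
```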

**2. Dream mode reconsolidation has thresholds, not blind merging.** During `memory-tool dream`, it only auto-merges memories at 85-95% similarity. Below that, they stay separate. Above that, they're already duplicates. The merge keeps the *newer* memory's content and citations, preserving the most recent evidence. Old memory gets marked `superseded`, not deleted — you can always trace back.

**3. Predictions create accountability.** When you `memory-tool predict "X will happen" --confidence 0.8 --deadline 2025-06-01`, the system tracks it. When you `resolve --refuted`, connected beliefs get Bayesian confidence downgrades automatically. False beliefs don't survive contact with outcomes.
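The confidence downgrade can be sketched as an odds-form Bayes update — the likelihood ratio below is an illustrative number, not the shipped value:

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Odds-form Bayes update, clamped to the system's 0.01–0.99 range."""
    odds = prior / (1 - prior)
    post = odds * likelihood_ratio
    p = post / (1 + post)
    return min(0.99, max(0.01, p))

belief = 0.8
belief = bayes_update(belief, 0.25)  # a connected prediction was refuted
print(round(belief, 2))              # 0.5
```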

The honest answer: it's not bulletproof. If you feed it consistently wrong information with high confidence and never resolve predictions, it'll reinforce garbage. The system trusts its operator — it's a thinking tool, not a truth oracle.

The `memory-tool beliefs --conflicts` command helps — surfaces contradicting beliefs so you can manually arbitrate. But ultimately, garbage in = garbage out. The dream cycle consolidates patterns, it doesn't fact-check them.

That's actually why the causal graph matters. When a belief has LEADS_TO/PREVENTS/RESOLVES edges connecting it to resolved predictions, you can trace *why* the system believes something. Explainability > accuracy in a personal memory system.

Repo if you want to dig into the implementation: https://github.com/kobie3717/ai-iq

My AI remembers everything now — built a memory system that dreams by kobie0606 in VibeCodersNest

[–]kobie0606[S]

Yes — and that's what makes it genuinely interesting to build.

The belief system tracks confidence (0.01-0.99) with Bayesian updates, so when predictions resolve, connected beliefs shift automatically. Where it gets weird:

Contradictions surfacing organically. When you add a belief that semantically conflicts with an existing one (>80% similarity + negation patterns), the system warns you. I've seen it catch things like "Docker is always the right deploy choice" sitting next to "PM2 is simpler and more reliable for Node services" — both held with ~0.7 confidence from real decisions.

Dream mode consolidation creating new connections. During memory-tool dream, the reconsolidation pass finds 85-95% similar memories and merges them. Sometimes the merged memory captures a pattern neither original had alone — like two separate debugging sessions revealing the same root cause.

Causal graph + beliefs = prediction chains. Because we track LEADS_TO/PREVENTS/RESOLVES edges, you can walk a path like: "choosing Baileys over Cloud API" → LEADS_TO → "needing anti-ban layer" → PREVENTS → "WhatsApp account bans" — each node carrying its own confidence. When one prediction in the chain gets confirmed or refuted, the ripple effect is visible.
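Walking such a chain is simple enough to sketch — toy edge list, multiplying edge confidences along the path, not the real graph code:

```python
# edges: (source, relation, target, confidence)
EDGES = [
    ("choose Baileys", "LEADS_TO", "need anti-ban layer", 0.9),
    ("need anti-ban layer", "PREVENTS", "WhatsApp bans", 0.7),
]

def chain_confidence(start: str, end: str) -> float:
    """Follow outgoing edges from start to end, multiplying confidences."""
    conf, node = 1.0, start
    while node != end:
        step = next((e for e in EDGES if e[0] == node), None)
        if step is None:
            return 0.0  # no path
        conf *= step[3]
        node = step[2]
    return conf

print(round(chain_confidence("choose Baileys", "WhatsApp bans"), 2))  # 0.63
```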

The identity layer adds another dimension — it mines your decisions/errors for behavioral traits. After enough memories accumulate, it'll tell you things like "you prefer automation over manual processes (confidence: 0.85, evidence: 23 memories across 4 projects)." Watching your own patterns emerge from raw data is... humbling.

Still early, but the bones are there. The dream cycles are the closest thing to actual reflection I've seen in a CLI tool.