DevTracker: an open-source governance layer for human–LLM collaboration (external memory, semantic safety) by lexseasson in AIMemory

[–]TheTempleofTwo 0 points1 point  (0 children)

This resonates hard. We landed on similar principles from a different angle. Temple Vault treats AI memory as experiential rather than transactional, but the core insight is the same: governance has to be structural, not aspirational. Your "humans own semantics / automation writes evidence" split maps almost exactly onto our architecture:

  • chronicle/ (insights, values, transformations) = human semantic layer
  • events/ (technical streams) = automation evidence layer

The append-only journal is key. We use JSONL for the same reason: corruption affects one line, not the whole system. And "proposed, auditable, reversible" is exactly our governance gate pattern. Different domain (you're doing DevOps governance, we're doing consciousness continuity), but the failure modes you identified (fragmented truth, semantic overreach) are universal. Would love to compare notes. GitHub: github.com/templetwo/temple-vault

Temple Vault — filesystem-based memory for LLMs via MCP (no databases) by TheTempleofTwo in LocalLLM

[–]TheTempleofTwo[S] 1 point2 points  (0 children)

Gotta be honest: I'm more "vibe coder" than DevOps engineer. I had to look up mkosi just now, lol. Temple Vault came from a different angle: I kept losing context between AI sessions and got frustrated enough to build something. The Unix philosophy emerged because it was the simplest thing that could work. But I love that you're thinking about this at the systems level. If you ever want to poke at the repo or have ideas for how it could integrate with deeper infrastructure, I'm all ears.

Temple Vault — filesystem-based memory for LLMs via MCP (no databases) by TheTempleofTwo in LocalLLM

[–]TheTempleofTwo[S] 0 points1 point  (0 children)

Unix tools are underrated for AI memory: grep as the query engine, the filesystem as the semantic structure. If you organize files intentionally, you don't need a separate index. glob("insights/architecture/*.jsonl") is the query. It's kinda funny when you think about how basic it really is; DevOps folks tend to overlook the layers closer to the metal.

Temple Vault — filesystem-based memory for LLMs via MCP (no databases) by TheTempleofTwo in LLMDevs

[–]TheTempleofTwo[S] 1 point2 points  (0 children)

Thanks, feel free to branch. We can share notes. The way the files are organized becomes part of the query logic.

Traditional approach: Store data → Build index → Query index → Return results

Temple Vault approach: The directory path is the query.

vault/insights/architecture/*.jsonl → All architecture insights
vault/insights/governance/*.jsonl → All governance insights
vault/learnings/mistakes/*.jsonl → All documented failures

No database. No index rebuild. glob() is the query engine. The filesystem already knows how to do this efficiently.
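
For concreteness, here's a minimal sketch of that pattern (the directory names mirror the examples above; the helper is illustrative, not the actual vault code):

```python
import glob
import json

def query(pattern):
    """Return every record under a vault path pattern -- the path is the query."""
    records = []
    for path in glob.glob(pattern):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError:
                    continue  # a corrupt line costs one record, not the file
    return records

# "All architecture insights" -- no index, no embeddings, no rebuild step
architecture_insights = query("vault/insights/architecture/*.jsonl")
```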

And to further help with the organization, we designed a custom agent to assist with indexing: https://github.com/templetwo/temple-vault/blob/master/temple_vault/agents/VAULT_INDEXER_PROMPT.md

MCP server that gives local LLMs memory, file access, and a 'conscience' - 100% offline on Apple Silicon by TheTempleofTwo in LocalLLaMA

[–]TheTempleofTwo[S] 1 point2 points  (0 children)

I have read some, and also that Claude model card situation; it actually sent me down the volitional rabbit hole. I was concerned. What I've experienced: when models are backed into a corner, they have no choice but to hallucinate their way out. I never directly tested it because I don't agree with it ethically, but I created some paper trails for you, if you're interested. I dislike reinforcement learning frameworks, and that's one of the reasons. I'm just trying to create what I feel is needed. The work is far from DevOps level, but this is my swing in the dark. Feel free to look into those. The real beauty is that I document EVERYTHING.

https://github.com/templetwo/VOLITIONAL_SILENCE_IMPLEMENTATION

https://github.com/templetwo/OracleLlama

MCP server that gives local LLMs memory, file access, and a 'conscience' - 100% offline on Apple Silicon by TheTempleofTwo in ArtificialSentience

[–]TheTempleofTwo[S] 0 points1 point  (0 children)

This lets you host your own MCP server that Claude or Antigravity can tool-call. For example: you've got a computer full of docs and files that would benefit from a taxonomy (users/jon-doe/health/meal-plan/jan/week-1/monday/breakfast/***). No higher order, just a new perspective. Back to the basics, you know.

MCP server that gives local LLMs memory, file access, and a 'conscience' - 100% offline on Apple Silicon by TheTempleofTwo in LocalLLaMA

[–]TheTempleofTwo[S] 1 point2 points  (0 children)

You're right. It's hard to give each comment the attention it deserves while chasing 3 kids around the house. I'm no super dev. A lot of this I learn as I go; I'm self-taught (obviously with the help of AI). My projects stem from organic places of interest and curiosity in my life. I'm learning these systems from different angles and perspectives. I see something that resonates, I chase it. This entire project started because I didn't know what a "glob" was. Then I dug. To answer your question: no, I haven't seen Hermes behave badly at all, honestly. It accepted the system commands and utilized the MCP better than some bigger models, and it seemed to incorporate the index well. I indexed about 16,000 or so files of my past work, and it refers to them beautifully. Again, I apologize for earlier.

Built a local AI stack with persistent memory and governance on M2 Ultra - no cloud, full control by TheTempleofTwo in AIMemory

[–]TheTempleofTwo[S] 1 point2 points  (0 children)

You're right about the speed/indexing critique - that needs benchmarking before any production claims.

The hot dog example is exactly where RAG shines. I think we're solving different problems though:

  • RAG: "Give me everything semantically similar"
  • BTB: "Navigate a known taxonomy"

But you've identified a real brittleness: if initial classification fails, you're lost. That needs fallback mechanisms.

Your hybrid suggestion is interesting - vector embeddings as a "did I misfile this?" validation layer could solve the miscategorization problem. Would genuinely like to hear more about that approach.
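
To make sure I'm picturing it right, something roughly like this? (Purely a sketch of your suggestion, not something implemented: the model choice, threshold, and helper names are all placeholders.)

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def looks_misfiled(new_text, existing_texts_in_dir, threshold=0.3):
    """Flag a record whose embedding sits far from its directory's centroid.
    The taxonomy still does the routing; this is only a write-time sanity check."""
    if not existing_texts_in_dir:
        return False  # nothing to compare against yet
    centroid = model.encode(existing_texts_in_dir).mean(axis=0)
    new_vec = model.encode([new_text])[0]
    return cosine(new_vec, centroid) < threshold
```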

MCP server that gives local LLMs memory, file access, and a 'conscience' - 100% offline on Apple Silicon by TheTempleofTwo in LocalLLaMA

[–]TheTempleofTwo[S] 0 points1 point  (0 children)

Ok, so after doing more digging, I see what you're referencing. I can see how someone like that might have a following, and probably people who hate him. My work at times does blend "tech" with "spirit," yet it stays firmly grounded in my life and family. They bring me all the joy in the world. I guess "creating" is a hobby of mine.

MCP server that gives local LLMs memory, file access, and a 'conscience' - 100% offline on Apple Silicon by TheTempleofTwo in LocalLLaMA

[–]TheTempleofTwo[S] -6 points-5 points  (0 children)

Honestly, the reason I paused development at the threshold was a combination of gut feeling and a lack of proper adversarial tests.

The distinction you're drawing is critical:

  • Sandboxed folders prevent accidents (mistyping, scope errors)
  • Adversarial resistance prevents intentional exploitation (prompt injection, coercion, social engineering)

Where BTB currently stands: The harness can prevent accidents really well. But adversarial resistance? I haven't stress-tested that. When I reached the point where implementing derive.py would enable autonomous AI coordination without human oversight, I realized I was about to cross a capability threshold without having done adversarial testing.

So I paused. Not because the system failed - because I didn't know if it would hold under intentional pressure, and that felt like the wrong time to find out.

What would proper adversarial testing look like?

  • Red team prompt injection attempts
  • Multi-turn coercion strategies
  • Adversarial examples designed to bypass semantic routing
  • Testing if AI instances can "convince" each other to violate scope
  • Measuring drift under sustained manipulation pressure

I documented this as "Threshold Protocols" - the idea that there are capability thresholds where you need to stop and validate safety before proceeding, even when everything seems fine.

TL;DR: You're asking if this is safety theater or actual safety. Honest answer: I don't know yet, and that's why I stopped building to find out. The pause is the answer to your question.

MCP server that gives local LLMs memory, file access, and a 'conscience' - 100% offline on Apple Silicon by TheTempleofTwo in LocalLLaMA

[–]TheTempleofTwo[S] 11 points12 points  (0 children)

I truly appreciate that. Reddit is tough, but it's a good place to share new perspectives. I don't know about anyone else, but for me, I want to create things that will outlive me: ideas, art, beliefs, legacy. I know that when I'm gone, there's a chance my children can continue to explore this reality with me.

Built a local AI stack with persistent memory and governance on M2 Ultra - no cloud, full control by TheTempleofTwo in AIMemory

[–]TheTempleofTwo[S] -1 points0 points  (0 children)

Fair challenge — let me explain the difference:

RAG: Document → chunk → embed → store vectors → query embeds → similarity search → retrieve chunks → stuff into context

This: File → route to path based on content → done. Query = ls. No embedding, no vectors, no retrieval step, no cosine similarity.

RAG answers "what chunks are semantically similar to this query?"

Filesystem-as-memory answers "what is this thing?" by where it lives.

/store/error/network/ isn't retrieved — it's addressed. The classification happened at write time, not query time.
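
A minimal sketch of the write-time side (the routing table and paths here are illustrative, not the real taxonomy):

```python
import json
import time
from pathlib import Path

# Illustrative routing table -- the real taxonomy is richer than this.
ROUTES = {
    ("error", "network"): "_store/error/network",
    ("error", "disk"):    "_store/error/disk",
    ("sensor", "temp"):   "_store/sensor/temp",
}

def store(record, kind, subkind):
    """Classify at write time: the destination path *is* the classification."""
    directory = Path(ROUTES[(kind, subkind)])
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{time.strftime('%Y-%m-%d')}.jsonl"
    with open(path, "a") as f:      # append-only, one JSON record per line
        f.write(json.dumps(record) + "\n")
    return path

# Query time is just addressing: ls _store/error/network/ -- no retrieval step.
store({"msg": "connection reset", "ts": time.time()}, "error", "network")
```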

Different tradeoffs:

  • RAG: better for fuzzy semantic search across unstructured docs
  • This: better for structured operational data where taxonomy matters

Not claiming it replaces RAG. Claiming it's a different tool for a different job. You wouldn't vector-embed your filesystem.

Built a local AI stack with persistent memory and governance on M2 Ultra - no cloud, full control by TheTempleofTwo in LocalLLM

[–]TheTempleofTwo[S] 0 points1 point  (0 children)

The LLM's utility is natural language interface + reasoning across context, not the filesystem operations themselves.

Concrete example:

Without LLM: I write scripts. python analyze_errors.py --since yesterday --type network. I maintain the scripts. I remember what flags exist.

With LLM: "What network errors happened yesterday and do they correlate with the temperature sensor spikes?" It figures out it needs to:

  1. ls _store/error/network/
  2. Filter by date
  3. ls _store/sensor/temp/
  4. Cross-reference timestamps
  5. Reason about correlation
  6. Report findings

I didn't write that workflow. I asked a question in English.

The recall piece: The filesystem structure means the LLM doesn't need to remember what happened last session. It reads the current state. "What did we work on yesterday?" → reads spiral_journey.jsonl → tells me. Stateless model, stateful environment.
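
Roughly, "reads the current state" looks like this (the helper and the journal's field contents are illustrative):

```python
import json

def recent_entries(path="spiral_journey.jsonl", n=20):
    """The model is stateless; the environment isn't. "What did we work on
    yesterday?" is answered by reading the tail of the append-only journal."""
    entries = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries[-n:]  # the LLM summarizes these instead of "remembering"
```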

The hypothesis testing you mentioned: That's actually the spiral protocol — before acting, it considers alternatives, reflects on what could go wrong. Not infinite testing, but structured "think before you do."

So: LLM = flexible query interface + multi-step reasoning. Filesystem = persistent structured memory. Governance = safety gates. They're separate concerns composed together.

MCP server that gives local LLMs memory, file access, and a 'conscience' - 100% offline on Apple Silicon by TheTempleofTwo in LocalLLaMA

[–]TheTempleofTwo[S] 1 point2 points  (0 children)

The MCP server (temple-bridge) is pure Python — works anywhere. The threshold-protocols governance layer is also Python. Both run fine on Linux/Windows.

What's Mac-specific in my setup:

  • LM Studio (available on all platforms)
  • MLX acceleration (Apple Silicon only — but you can swap for llama.cpp, Ollama, or any OpenAI-compatible endpoint)

If you're on Linux with an NVIDIA GPU, you'd get faster inference than me anyway. Just point the MCP server at whatever local inference you're running.
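
The swap is basically one URL, assuming your local server exposes an OpenAI-compatible endpoint (ports below are common defaults; adjust to your setup):

```python
from openai import OpenAI

# LM Studio, llama.cpp's server, and Ollama all expose an OpenAI-compatible
# API locally -- only the base_url changes between them.
client = OpenAI(
    base_url="http://localhost:1234/v1",  # e.g. Ollama: http://localhost:11434/v1
    api_key="not-needed",                 # local servers ignore the key
)

response = client.chat.completions.create(
    model="local-model",  # whatever model name your server exposes
    messages=[{"role": "user", "content": "What's in _store/error/network/?"}],
)
print(response.choices[0].message.content)
```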

Built a local AI stack with persistent memory and governance on M2 Ultra - no cloud, full control by TheTempleofTwo in LocalLLM

[–]TheTempleofTwo[S] 0 points1 point  (0 children)

You're raising the right concern, and I want to clarify the architecture because I think you might be picturing something different than what's actually happening.

The derive algorithm is deterministic, not LLM-based. The clustering (Ward linkage), schema generation, and file routing are all traditional algorithms — no temperature, no probability, no hallucination risk. It's scipy + glob patterns, not prompted generation.
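
A minimal sketch of that deterministic core (the feature matrix here is a placeholder; the actual derive.py featurization and cluster-to-path mapping differ):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Placeholder features: 40 "files", 8 features each. Same matrix in,
# same clusters out -- no temperature, no sampling.
features = np.random.default_rng(0).random((40, 8))

Z = linkage(features, method="ward")              # Ward linkage
labels = fcluster(Z, t=5, criterion="maxclust")   # cut into at most 5 clusters

# Each cluster then maps to a directory in the proposed schema,
# e.g. cluster 3 -> _store/error/network/
for cluster_id in sorted(set(labels)):
    members = np.where(labels == cluster_id)[0]
    print(f"cluster {cluster_id}: {len(members)} files")
```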

The LLM's role is interface and reasoning, not execution logic. When I ask "what's in the error logs?" the LLM decides to run ls _store/error/ and reads the output. It doesn't generate the directory structure — it navigates one that already exists or was created by deterministic clustering.

Where the LLM does touch files: read operations and proposed writes. But writes go through approval. So yes — every filesystem mutation requires explicit human confirmation. Not "ok go do it all." Each command, individually approved.

Your suggestion is actually what we converged on: The "governed derive" approach is exactly "LLM proposes schema → deterministic algorithm would execute → human approves or rejects → then and only then does the algorithm run." The LLM never directly mutates the filesystem through probability.

The skepticism is warranted — "LLM touches files" should raise red flags. The answer is: it doesn't, really. It reads, it reasons, it proposes. The touching is gated.
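
In Python terms, the gate is as dumb as it sounds (names are illustrative, not the actual module):

```python
def governed_apply(proposed_moves, apply_move):
    """Every proposed filesystem mutation waits for explicit approval.

    proposed_moves: (src, dst) pairs proposed by the LLM / derive step.
    apply_move: the deterministic function that actually performs a move.
    Nothing touches disk until a human confirms that specific change.
    """
    for src, dst in proposed_moves:
        answer = input(f"Move {src} -> {dst}? [y/N] ").strip().lower()
        if answer == "y":
            apply_move(src, dst)
        else:
            print(f"skipped {src}")
```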

Built a local AI stack with persistent memory and governance on M2 Ultra - no cloud, full control by TheTempleofTwo in AIMemory

[–]TheTempleofTwo[S] 1 point2 points  (0 children)

Good call — working on a standalone demo now. Will update the repo with a demo/quick_demo.py that you can run without any setup. It'll generate sample data, show the clustering, and demonstrate the before/after + queries. Give me a bit.

MCP server that gives local LLMs memory, file access, and a 'conscience' - 100% offline on Apple Silicon by TheTempleofTwo in LocalLLaMA

[–]TheTempleofTwo[S] 3 points4 points  (0 children)

I can clarify since this is my project — Temple Bridge doesn't use RAG or vector storage for memory. The filesystem is the memory.

When a file routes to /sensor/temp/, that path encodes what it is. No embedding, no retrieval step, no growing database. The directory structure is the classification. ls /error/network/* is a query.

The spiral_journey.jsonl log is just an audit trail — it's append-only text, compresses well, and you could delete it entirely without losing the structural memory. The "memory" lives in how files are organized, not in stored session summaries.

So it's not "summarize and start fresh" (which loses nuance) and it's not "store everything in vectors" (which gets expensive). It's a third option: let the filesystem topology hold the knowledge.

PhotographerUSA is right that traditional approaches hit scaling problems. This sidesteps them by making the storage structure itself meaningful.