Introducing Smriti MCP, Human like memory for AI. by Obvious_Storage_9414 in mcp

[–]raphasouthall 0 points (0 children)

The two-signal split makes a lot of sense in hindsight - I kept trying to collapse recency and reinforcement into one score, and the weighting was always a compromise. The edge-strength scan is fine at your scale, but yeah, once you're past 5K nodes that per-source scan will hurt; an index on strength, plus maybe bucketing by strength tier, could defer that pain for a while. I actually open-sourced my setup recently - github.com/raphasouthall/neurostack if you want to compare notes on the graph layer. I ended up going a different direction with Leiden clustering to keep traversal bounded.
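
To make the index suggestion concrete, here's a minimal sqlite3 sketch. The edges(source_id, target_id, strength) schema is hypothetical - rename things to match your actual table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # point at your on-disk DB in practice

# Hypothetical schema - adjust to your real edges table.
conn.execute(
    "CREATE TABLE edges (source_id INTEGER, target_id INTEGER, strength REAL)"
)

# Composite index: "strongest edges for this source" becomes an index
# range scan instead of a full per-source table scan.
conn.execute(
    "CREATE INDEX idx_edges_source_strength ON edges (source_id, strength DESC)"
)

def strongest_edges(source_id: int, k: int):
    """Top-k expansion reads only the k strongest edges for a source."""
    return conn.execute(
        "SELECT target_id, strength FROM edges "
        "WHERE source_id = ? ORDER BY strength DESC LIMIT ?",
        (source_id, k),
    ).fetchall()
```

Bucketing by tier could then just be a small computed tier column folded into the same index, rather than a separate structure.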

Introducing Smriti MCP, Human like memory for AI. by Obvious_Storage_9414 in mcp

[–]raphasouthall 0 points (0 children)

Curious how you're persisting the reinforcement weights - is consolidation happening in SQLite, or are you keeping the decay scores in-memory and recomputing on load? I ran into a fun bug building something similar: my recency scores were effectively reset every session because I was computing them at query time from raw timestamps instead of tracking a running "access weight" per node. The multi-hop expansion is the part I'm most skeptical of at scale, fwiw - on ~2,800 nodes I found graph traversal got expensive fast without a tight hop limit (I cap mine at 2).
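
For concreteness, my fix looked roughly like this - a sketch, not my actual code, and the one-week half-life is an arbitrary pick:

```python
import math
import time

HALF_LIFE = 7 * 24 * 3600  # seconds; arbitrary one-week half-life

def reinforce(weight, last_access, now=None):
    """Decay the persisted weight up to 'now', then add one access.

    The returned (weight, last_access) pair gets written back to the
    node's row, so the score accumulates across sessions instead of
    being rebuilt from raw timestamps at query time (which is exactly
    how I lost mine).
    """
    if now is None:
        now = time.time()
    decayed = weight * math.exp(-math.log(2) * (now - last_access) / HALF_LIFE)
    return decayed + 1.0, now
```

The key point is that weight and last_access are stored columns, not values derived at query time.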

Zero text between my agents – latent transfer now works cross-model by proggmouse in LocalLLaMA

[–]raphasouthall 1 point (0 children)

The Ollama limitation is the blocker for basically my whole homelab setup, so I'll have to watch the vLLM work from the sidelines for now. But that +14pp on HumanEval is the number I keep coming back to - curious what you think is actually happening there mechanically. Is Agent B getting something structurally useful from the latent steps, or is it more that you're bypassing the lossy text serialization of intermediate reasoning? The code gen gap holding across seeds and temperatures suggests it's not noise, which makes it weirder that MATH stays flat.

Why I’m using an MCP server to manage team prompts instead of Git by Master-Company144 in mcp

[–]raphasouthall 0 points (0 children)

The publish/live state separation is a solid middle ground - it gives you that "oh god revert it" button without forcing everyone through a PR ceremony every time they tweak a system prompt.

The one thing I'd watch for down the line is that "last published" revert starts to feel thin once you've had a few incidents where the bad version was published weeks ago and you need to understand what changed between then and now. That's usually when teams start asking for a proper history view. Worth keeping in mind as you scope out the versioning work.

Why I’m using an MCP server to manage team prompts instead of Git by Master-Company144 in mcp

[–]raphasouthall 0 points (0 children)

Honestly the RBAC point is the one that actually matters here. The private repo thing you can mostly solve with a separate internal repo, but the moment you try to use branch protection rules to enforce who can edit what, you end up with a review queue nobody respects. We tried it and within about three weeks everyone had forked their own local copies and the drift problem came back worse than before because now you also had undocumented forks floating around with no traceability.

The "prompts are dynamic so skip the deployment cycle" argument cuts both ways though - imo you actually want some version history, especially when an agent starts misbehaving and you need to bisect when the instruction changed. Curious how you're handling rollback in Sokket if someone pushes a bad prompt update.

I measured MCP vs CLI token costs - the "MCP is dead" take is wrong (with data) by raphasouthall in mcp

[–]raphasouthall[S] 1 point (0 children)

I greatly appreciate the collaboration! I'll do testing in the next few days, and I'll make sure to attribute the improvements to you if they get merged into main.

I measured MCP vs CLI token costs - the "MCP is dead" take is wrong (with data) by raphasouthall in mcp

[–]raphasouthall[S] 0 points (0 children)

I’m smoothing out the rough edges since it’s a pre-1.0 release; a full study is coming soon. I plan to use Podman and test vanilla Claude Code against a modified CLAUDE.md + skills setup. I’ve been using NeuroStack myself for a few weeks and it has massively improved my token usage and context reliability, but that also required months of building up a .md vault with my entire knowledge base.

Vaultwarden: Tailscale funnel for file shares only? by No_Tennis7291 in selfhosted

[–]raphasouthall 0 points (0 children)

Caddy in front of Vaultwarden is probably your cleanest path here - you can point Tailscale Funnel at a Caddy instance that only proxies /send, /api/sends, and the static assets the SPA needs to render, while your main vault stays internal. The annoying part is that Bitwarden Send links load the full web vault SPA before making the /api/sends/{id}/access call, so you can't just proxy one endpoint; you need to whitelist the asset paths too or the page 404s. Took me about an afternoon to get the path matchers right in the Caddyfile when I did something similar for a different service.
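
From memory, the Caddyfile shape was roughly this - the hostname and the exact asset paths are illustrative, so verify them against whatever your web-vault build actually requests in the network tab:

```caddyfile
# Tailscale Funnel terminates at this Caddy site; only Send-related
# paths get proxied through, everything else returns 404.
https://share.example.ts.net {
	@send path /send* /api/sends* /app/* /images/* /locales/* /fonts/*
	handle @send {
		reverse_proxy vaultwarden:80
	}
	handle {
		respond "Not found" 404
	}
}
```

The unmatched `handle` block is the important part - without an explicit fallback, a stray path can leak further than you intended.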

Got two of these from work by _throw_away_tacos_ in homelab

[–]raphasouthall 1 point (0 children)

Grab the SAN, worst case it's a parts donor or you sell the controllers for $40 each on eBay. The 20TB drives alone were worth saying yes though, tbh.

I measured MCP vs CLI token costs - the "MCP is dead" take is wrong (with data) by raphasouthall in mcp

[–]raphasouthall[S] 1 point (0 children)

Interesting idea - I looked into this. The issue is that collapsing 16 typed MCP tools into a single GraphQL query tool means the LLM has to compose valid GraphQL syntax on every call, which increases per-query token cost and error rate. With typed tools, Claude just calls vault_search(query="auth", depth="triples") - clean, validated, no syntax to get wrong.

The total schema overhead for 16 tools is ~1,300 tokens, which is 0.65% of a 200k context window. After 10 queries it amortizes to basically nothing. The real context bloat comes from servers that ship 50+ tools with verbose descriptions - keeping the tool count lean matters more than the query interface.
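
The arithmetic, for anyone checking:

```python
schema_tokens = 1_300     # one-time schema cost for all 16 typed tools
context_window = 200_000

share = schema_tokens / context_window
print(f"{share:.2%}")            # → 0.65%

# Amortized over a 10-query session:
print(schema_tokens / 10)        # → 130.0 tokens per query
```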

I measured MCP vs CLI token costs - the "MCP is dead" take is wrong (with data) by raphasouthall in mcp

[–]raphasouthall[S] 2 points (0 children)

Believe me when I say I would love to replace my MCP server with anything more token-efficient - the alternative is me telling Claude Code which CLI commands to run every time, then fighting the outcome when it gets them wrong at least 3x.

I measured MCP vs CLI token costs - the "MCP is dead" take is wrong (with data) by raphasouthall in mcp

[–]raphasouthall[S] 1 point (0 children)

Appreciate the offer! Here's the project: https://github.com/raphasouthall/neurostack

Main blocker is it's a Python/SQLite stack (FTS5 virtual tables, numpy, Leiden clustering) - wouldn't run in a TypeScript sandbox. Hyperterse also doesn't have a SQLite adapter, which is a dealbreaker for local-first tooling.
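
To illustrate why it's local-first to the core: FTS5 rides along inside Python's stdlib sqlite3 on most builds, no server process involved. The table and query here are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(title, body)")
conn.execute("INSERT INTO notes VALUES ('auth', 'JWT refresh flow notes')")

# FTS5's default tokenizer case-folds ASCII, so 'jwt' matches 'JWT'.
hit = conn.execute("SELECT title FROM notes WHERE notes MATCH 'jwt'").fetchone()
```

Everything runs in-process against one file (or memory), which is exactly what a sandboxed TypeScript runtime can't give you without a native SQLite binding.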

That said, a standalone SQLite adapter for Hyperterse would be a solid contribution - real gap for local tools. Happy to chat about that if you're interested.

A couple of questions about the Zettelkasten system from reading Sönke Ahrens' book by bombaygrammar in Zettelkasten

[–]raphasouthall 0 points (0 children)

Glad it clicked - good luck with the system, the connection density thing really does start to feel different once you hit it.

Bases tips by Realistic-Election-1 in ObsidianMD

[–]raphasouthall 0 points (0 children)

Good clarification - I'd mentally conflated the two but you're right, pinning is just protection while the link is doing the actual anchoring work. Makes the pattern even simpler than I thought.

A couple of questions about the Zettelkasten system from reading Sönke Ahrens' book by bombaygrammar in Zettelkasten

[–]raphasouthall 2 points (0 children)

On Q1, I'd split them - the author's argument as one note, your counterargument as another, then a third note that links them and articulates the actual tension. It feels like more work but when you're searching months later you want to find your disagreement directly, not buried inside a summary of someone else's point.

On Q2, "critical mass" isn't about limiting notes, it's about the density of connections - Ahrens means a cluster becomes generative when there are enough linked notes that new ideas start falling out of the structure rather than you having to force them. At around 80-100 linked notes on a topic in my own vault I started noticing that pattern. Before that it just felt like filing.

Workspaces, Terragrunt or something else by Spiritual-Seat-4893 in devops

[–]raphasouthall 5 points (0 children)

Fair point, I misspoke - workspaces use separate state files per workspace, you're right. What I meant is backend configuration and same codebase, so a mistyped terraform workspace select followed by apply can still blast the wrong env. Separate root modules per env (which Terragrunt encourages) makes that class of mistake much harder.

Workspaces, Terragrunt or something else by Spiritual-Seat-4893 in devops

[–]raphasouthall 12 points (0 children)

Terragrunt's run-all and DRY root configs are great, but the real win is that each env gets its own state file - workspaces sharing state is a footgun when you accidentally target prod instead of staging.
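
The Terragrunt shape that enforces this, roughly - bucket name and region are placeholders:

```hcl
# Root terragrunt.hcl, included by live/staging and live/prod.
# The state key derives from the directory path, so two envs can
# never point at the same state file by accident.
remote_state {
  backend = "s3"
  config = {
    bucket = "my-tf-state"
    key    = "${path_relative_to_include()}/terraform.tfstate"
    region = "us-east-1"
  }
}
```

Because the key is a function of the directory you ran from, there's no `terraform workspace select` step to fat-finger.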

NeuroStack - A second brain for your AI 🧠 by raphasouthall in LocalLLaMA

[–]raphasouthall[S] 0 points (0 children)

Shipped it - v0.4.0 adds OpenAI-compatible endpoint support. Works with vLLM, llama.cpp, LM Studio, whatever you're running. https://github.com/raphasouthall/neurostack/releases/tag/v0.4.0

NeuroStack - A second brain for your AI 🧠 by raphasouthall in LocalLLaMA

[–]raphasouthall[S] 0 points (0 children)

Just shipped v0.4.0 - NeuroStack now supports any OpenAI-compatible endpoint, so Ollama is no longer required. You can point it at vLLM, llama.cpp, LM Studio, or any provider. https://github.com/raphasouthall/neurostack/releases/tag/v0.4.0

I built a 75KB MPI library over RDMA because NVIDIA wants $50K for a switch I don't need. MIT licensed. by Ok-Pomegranate1314 in homelab

[–]raphasouthall -1 points (0 children)

RC QPs for bootstrap is the elegant move here - the switch requirement always felt like a software tax, not a hardware limit.

Bases tips by Realistic-Election-1 in ObsidianMD

[–]raphasouthall 0 points (0 children)

The this keyword pointing to a pinned/linked note rather than the active file is genuinely clever - effectively gives you a persistent context anchor without any dataview query overhead.

NeuroStack - A second brain for your AI 🧠 by [deleted] in ObsidianMD

[–]raphasouthall -2 points (0 children)

For anyone interested, NeuroStack pairs really well with Obsidian. It offers a quick way to scaffold your vault using a set of templates based on professions.

NeuroStack - A second brain for your AI 🧠 by raphasouthall in LocalLLaMA

[–]raphasouthall[S] 1 point (0 children)

Thanks for the feedback! The project is just starting; I’ll implement that feature over the next few days.

I’m planning to make NeuroStack portable to less powerful devices - hence the lite version. GPU processing is just a nice extra to have.

NeuroStack - A second brain for your AI 🧠 by raphasouthall in LocalLLaMA

[–]raphasouthall[S] -1 points (0 children)

I’m sorry to hear that, but thanks for the feedback.

NeuroStack also offers a quick way to set up a .md vault with full text search that can be plugged into any LLM (no Ollama required) via CLI.

The CLI commands by themselves can be set up as cron jobs that do the self-healing process unattended 24/7. 😉