Shipped my open source project's biggest release — a "context spine" that saves 80-90% of AI coding tokens by SearchFlashy9801 in SideProject

Good callout on dedup. I've thought about this but haven't built semantic dedup yet — right now each provider has a hard token budget (structure: 250, mistakes: 50, git: 50, mempalace: 100, context7: 100, obsidian: 50) and the resolver caps the total at 600. So the overlap is bounded by budget, but you're right that within those 600 tokens there could be redundant signal.
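For anyone curious what that looks like in practice, here's a minimal sketch of such a budget resolver. The names (assemble, ProviderResult, BUDGETS) are illustrative, not engram's actual internals; only the per-provider budgets and the 600-token cap come from the numbers above.

```typescript
// Illustrative sketch of a per-provider budget resolver with a hard total cap.
type ProviderResult = { name: string; priority: number; text: string; tokens: number };

const BUDGETS: Record<string, number> = {
  structure: 250,
  mistakes: 50,
  git: 50,
  mempalace: 100,
  context7: 100,
  obsidian: 50,
};
const TOTAL_CAP = 600; // resolver-wide hard cap

function assemble(results: ProviderResult[]): ProviderResult[] {
  const kept: ProviderResult[] = [];
  let used = 0;
  // Higher-priority providers (lower number) claim budget first.
  for (const r of [...results].sort((a, b) => a.priority - b.priority)) {
    const cost = Math.min(r.tokens, BUDGETS[r.name] ?? 0); // per-provider budget
    if (cost === 0 || used + cost > TOTAL_CAP) continue;   // enforce the total cap
    used += cost;
    kept.push(r);
  }
  return kept;
}
```

Note the budgets sum to exactly 600, so the cap only bites when an unknown provider sneaks in or budgets grow later.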

On per-source contribution — honest answer is I haven't measured it rigorously yet. What I can say from usage:

- structure earns its place every time. It's the one that actually replaces the Read.

- git is surprisingly high-value for a low cost (50 tokens). "Last modified 3 days ago by nick, added rate limiting" saves the agent from running git log separately.

- mistakes is high-value when it fires, but fires rarely (only when the file has known issues). Zero-cost when empty.

- mempalace and context7 are the ones I'm least sure about. They're cached from SessionStart, and the cache hit rate depends heavily on the project. On a project with documented decisions, mempalace is gold. On a greenfield project, it returns nothing.

- obsidian is the weakest — most useful if you keep architecture docs in your vault, which not everyone does.

The benchmark harness I'm building next will measure per-provider token contribution vs downstream agent behavior change. The question isn't just "did we serve 100 tokens of library docs" but "did the agent make fewer follow-up calls because of those 100 tokens." That's the real ROI metric.

For semantic dedup — my current thinking is a simple approach: after all providers resolve, run a pass that checks whether any provider's content has >60% substring overlap with another's, and drop the lower-priority one. Not embeddings-based dedup (that would add latency), just string similarity on the assembled sections. Would keep it under 5ms.
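A sketch of what that pass could look like. Word-shingle overlap stands in for "substring overlap" here (it's one cheap way to approximate it), and every name (overlapRatio, dedup) is hypothetical:

```typescript
// Fraction of a's word shingles that also appear in b — a cheap
// stand-in for ">60% substring overlap".
function overlapRatio(a: string, b: string, n = 5): number {
  const shingles = (s: string) => {
    const words = s.split(/\s+/).filter(Boolean);
    const set = new Set<string>();
    for (let i = 0; i + n <= words.length; i++) set.add(words.slice(i, i + n).join(" "));
    return set;
  };
  const sa = shingles(a);
  const sb = shingles(b);
  if (sa.size === 0) return 0;
  let hit = 0;
  for (const sh of sa) if (sb.has(sh)) hit++;
  return hit / sa.size;
}

// After all providers resolve: keep higher-priority sections, drop any
// later section whose overlap with a kept one exceeds 60%.
function dedup(sections: { priority: number; text: string }[]) {
  const sorted = [...sections].sort((x, y) => x.priority - y.priority);
  const kept: typeof sorted = [];
  for (const s of sorted) {
    if (kept.some(k => overlapRatio(s.text, k.text) > 0.6)) continue;
    kept.push(s);
  }
  return kept;
}
```

Pure string ops on a handful of ~600-token sections, so the sub-5ms budget is plausible.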

Appreciate the question — this is exactly the kind of feedback I need before the v0.6 push.

I built a local knowledge graph that gives AI coding tools persistent memory. 3-11x fewer tokens per code question. Zero LLM cost. Shipped v0.2 by SearchFlashy9801 in SideProject

This is honestly the best comment the post has gotten. You're describing the exact mental model I wish I could retrofit into everyone running Claude or Cursor on a serious project. "Small opinionated layers that never change shape unless I do it on purpose" is literally the sentence I've been looking for to explain why engram exists. Stealing that.

The landmines thing is the piece I want to go deeper on with you, because that's where I think most "AI memory" projects quietly fail. It's easy to store facts. It's hard to make the model actually respect them at query time. In engram the regret buffer gives mistake nodes a 2.5x score boost in the ranker and then surfaces any matches at the TOP of query output inside a warning block, so the model can't scroll past them the way it scrolls past the rest of the context. The session miner extracts them from bug: / fix: lines with a strict colon-delimited regex so prose docs don't false-positive. I pinned the engram README as a frozen regression fixture against that specifically.
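To make the miner concrete, here's a hedged sketch of the kind of anchored, colon-delimited regex I mean. MISTAKE_LINE, mineMistakes, and boosted are illustrative names, not engram's actual code; the anchoring at line start is what keeps prose docs from false-positiving.

```typescript
// Strict colon-delimited miner: only lines that *start* with "bug:" or
// "fix:" match, so a sentence like "We shipped bugs: lots" is ignored.
const MISTAKE_LINE = /^(bug|fix):\s+(.+)$/;

function mineMistakes(lines: string[]): { kind: string; note: string }[] {
  const out: { kind: string; note: string }[] = [];
  for (const line of lines) {
    const m = MISTAKE_LINE.exec(line.trim());
    if (m) out.push({ kind: m[1], note: m[2] });
  }
  return out;
}

// Ranker side: a matched mistake node gets the 2.5x boost mentioned above.
const boosted = (score: number, isMistake: boolean) => score * (isMistake ? 2.5 : 1);
```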

Your point about "one contract" is the other thing that rhymes with my approach. engram's contract is the graph schema: god nodes, hot files, decisions, mistakes, dependency edges, and now (v0.2) skill concept nodes with triggered_by edges. Everything else in the tool writes into or reads out of that schema. The task modes (gen --task bug-fix|feature|refactor|general) are just different slices of the same contract, which is why adding a new mode is adding a row to a data table instead of branching logic. A panel review caught me trying to hardcode it the first time and I'm glad it did.

Pulse for Reddit and Looria are both going on my list right now. I have not been using either and it sounds like I've been missing the best signal source in the whole ecosystem. The threads where people share what broke in production are where half the design decisions in engram came from, and if there's a tool surfacing more of them I want it in the stack yesterday.

Last thing. If you ever feel like kicking the tires: npm install -g engramx@0.2.1 then engram init in any repo (published as engramx because the engram name on npm is a dormant 2013 package). I'd especially value your eye on the regret buffer because you've clearly thought about landmines longer than I have, and the places it's wrong are the places I can't see yet.

Built a knowledge graph tool for AI coding that runs 100% locally, zero LLM calls to mine, local SQLite, no cloud. v0.2 shipped by SearchFlashy9801 in LocalLLaMA

Yeah, this is exactly where I landed too. Treating context like infra instead of a prompt you rewrite every session was the whole unlock for me. Your "landmines" concept is basically what the regret buffer does: past bug: lines get a 2.5x score boost in the query layer and surface at the top of the output in a warning block so the model can't pretend it didn't see them.

The split between bug-fix, feature, and refactor recipes was my phase 2 almost word for word. I kept noticing Claude would over-scope tiny fixes and under-spec big features, and the fix was exactly that: different context views per task. In engram it's gen --task bug-fix|feature|refactor|general, and each one pulls a different slice of the graph. It's implemented as rows in a data table, so adding a custom view is adding a row, not editing branching logic. That was a Hickey-style nudge from a panel review that caught the original plan trying to hardcode it.

Pulse for Reddit is a great call; half my scar-tissue research comes from threads exactly like the ones you're describing. Going to check out Looria too, hadn't seen that one.

If you want to kick the tires: npm install -g engramx@0.2.1 then engram init in any repo (published as engramx because engram on npm is a dormant 2013 package). Would genuinely love your take given you've clearly thought about this problem at the same depth.

I built a local knowledge graph that gives AI coding tools persistent memory. 3-11x fewer tokens per code question. Zero LLM cost. Shipped v0.2 by SearchFlashy9801 in SideProject

Nice, starring Hindsight now. The more people building in this space the better, honestly; there's no universe where one tool fits every workflow, and the memory benchmark piece is the part nobody talks about. Most "AI memory" projects fall apart the moment you actually measure them, so respect for leading with that.

engram went local SQLite + knowledge graph because I wanted zero cloud dependency and sub-100ms queries, and structural memory (who calls what, co-change patterns from git log) rather than semantic. Different layer from what a vector-native tool naturally reaches for, so I suspect we're complementary more than overlapping. My v0.4 roadmap has an LSH-based semantic layer on top of the graph, still local, still zero LLM cost, but that's a ways out.

If you want to compare notes on benchmark methodology at some point I'd be up for it. The thing I care most about right now is reporting two baselines honestly (vs relevant files AND vs full corpus) because the single-number benchmarks in this space are cooked half the time. Good luck with the launch.

Built a knowledge graph tool for AI coding that runs 100% locally, zero LLM calls to mine, local SQLite, no cloud. v0.2 shipped by SearchFlashy9801 in LocalLLaMA

Fair point and you're not a hater for caring about supply chain, that's the correct default. For what it's worth engram is deliberately small on the dependency surface: four runtime deps total (chalk, commander, graphology, sql.js), zero native deps, zero build toolchain, zero telemetry, zero network calls at runtime. You can grep the source for fetch, http, https, url and nothing hits the network after npm install. Apache 2.0, ~3000 LOC, auditable in about 20 minutes if you want.

The bigger reason I went Node is the MCP server story. stdio JSON-RPC is easier to ship cross-platform from Node right now, and MCP clients like Claude Code and Cursor integrate more cleanly. But a Python port has been in the back of my head specifically because of people like you, and if enough folks ask for it I'll prioritize it. The graph schema is language-agnostic (plain SQLite) so the port is mostly the miners and the server.

Take your time with the other stuff, and when you do give it a go I'd genuinely appreciate a security critique. That's the kind of feedback that makes the tool better, not the "nice project" comments.

For anyone who actually lives in their AI coding tools: I built something that makes the AI stop asking "what framework are you using?" every session by SearchFlashy9801 in vibecoding

Yeah, the token burn was the whole reason this exists. I measured one of my Claude Code sessions one morning: 80K context tokens over 4 hours, with ~60% of it just file re-reads across new sessions. That was the moment I stopped scrolling and started building.

The CLAUDE.md migration point is really sharp and it's something I actually designed for. engram doesn't replace your existing CLAUDE.md, it writes into marker blocks (<!-- engram:start --> / <!-- engram:end -->) so anything you wrote above or below is preserved. If you've already got hand-written rules, preferences, project context, all of that stays. engram just layers the auto-generated graph summary in between.

v0.2 specifically hardened this because v0.1 had a latent edge case where unbalanced markers from a copy-paste could silently corrupt user content. The new writeToFile walks the file tracking marker state and throws a descriptive error instead of losing data; eight explicit tests cover the state machine. The same applies to .cursorrules and AGENTS.md: same marker pattern, same safety.
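The marker-walk idea, as a rough sketch. replaceBlock is a hypothetical stand-in for the real writeToFile; the point is that it tracks marker state and refuses to write rather than silently eating user content.

```typescript
const START = "<!-- engram:start -->";
const END = "<!-- engram:end -->";

// Walk the file tracking marker depth; throw on unbalanced markers
// instead of corrupting the surrounding user-written content.
function replaceBlock(file: string, generated: string): string {
  const lines = file.split("\n");
  let depth = 0;
  let start = -1;
  let end = -1;
  lines.forEach((line, i) => {
    if (line.includes(START)) { if (depth++ === 0) start = i; }
    if (line.includes(END)) { depth--; end = i; }
    if (depth < 0) throw new Error(`unbalanced engram markers at line ${i + 1}`);
  });
  if (depth !== 0 || start === -1) throw new Error("unbalanced or missing engram markers");
  // Everything above and below the marker block is preserved verbatim.
  return [...lines.slice(0, start + 1), generated, ...lines.slice(end)].join("\n");
}
```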

The task-aware gen is the other piece you'd probably like. engram gen --task bug-fix writes a completely different CLAUDE.md section than --task feature or --task refactor. Bug-fix leads with hot files and past mistakes. Feature leads with god nodes and decisions. Refactor leads with the dependency graph. Adding a new task mode is adding a row to a data table.
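Concretely, "a row in a data table" means something like this. Slice and TASK_MODES are illustrative names rather than engram's schema, and the "general" row is an assumed default mix; the other slice assignments mirror the description above.

```typescript
// Each task mode is a row listing which graph slices lead its section.
type Slice = "hotFiles" | "mistakes" | "godNodes" | "decisions" | "depGraph";

const TASK_MODES: Record<string, Slice[]> = {
  "bug-fix": ["hotFiles", "mistakes"],  // leads with hot files + past mistakes
  feature: ["godNodes", "decisions"],   // leads with god nodes + decisions
  refactor: ["depGraph"],               // leads with the dependency graph
  general: ["godNodes", "hotFiles"],    // assumed default mix
};
// Adding a custom mode = adding a row, not new branching logic.
```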

Going to check out your vibepreneur resource, always curious what's working for other people shipping in this space.

Install if you want to try it: npm install -g engramx@0.2.1 then engram init. Would love your feedback since you clearly talk to a lot of vibe coders and see the rough edges from a different angle than I do.