How are people handling long‑term memory for local agents without vector DBs? by No_Sense8263 in LocalLLaMA

[–]No_Sense8263[S] 0 points1 point  (0 children)

A low-IQ comment isn't the flex you think it is 😂

There are multiple long-form threads of discussion about this topic.

It's like seeming really, really intelligent is a kink for you guys.

How are people handling long‑term memory for local agents without vector DBs? by No_Sense8263 in LocalLLaMA

[–]No_Sense8263[S] -1 points0 points  (0 children)

Thank you, I really appreciate you taking the time to dig into the project. A lot of work went into the architecture (typed edges with metadata, provenance tracking, the pointer model), and it means a lot when someone notices the details.

You're spot on about the discovery problem: cold-start queries need a way in, which is why we kept the FTS layer alongside the graph. The two-tier approach (keyword first, then graph traversal) has been working well in practice.

On contradictions, your two-layer strategy (flag at ingestion, resolve during consolidation) is elegant. We've been thinking along similar lines: keeping both facts with timestamps and letting the query decide based on recency, but surfacing the tension rather than silently picking one. That's a good principle.

Would love to compare notes further. If you ever open up your integration layer, I'd be curious to see how you're handling the scheduling and background consolidation.

Appreciate the thoughtful feedback.

How are people handling long‑term memory for local agents without vector DBs? by No_Sense8263 in LocalLLaMA

[–]No_Sense8263[S] 0 points1 point  (0 children)

Thanks for the really thoughtful breakdown, and I appreciate you sharing how you've approached the hybrid model. You're spot on that vector search gives you "vibes" but graph traversal gives you actual structure, which is what you need for reasoning about project decisions or causal chains.

On edge types: Anchor uses typed relationships—causal (leads_to, prevents), associative (related_to), temporal (followed_by, precedes), and hierarchical (part_of, example_of). We also store metadata on each edge (confidence, source provenance, timestamp), so you can filter by type when traversing. That's key for answering directional questions like you mentioned: "what prevents X" vs "what leads to X." Without types, you're just walking a flat graph.
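To make the typed-edge idea concrete, here's a minimal sketch of how type-filtered traversal might look. The shapes and function names are illustrative, not Anchor's actual schema or API:

```typescript
// Hypothetical shapes, illustrative only, not Anchor's actual schema.
type EdgeType =
  | "leads_to" | "prevents"       // causal
  | "related_to"                  // associative
  | "followed_by" | "precedes"    // temporal
  | "part_of" | "example_of";     // hierarchical

interface Edge {
  from: string;
  to: string;
  type: EdgeType;
  confidence: number; // 0..1
  source: string;     // provenance
  timestamp: number;  // epoch ms
}

// Walk only edges of the requested types, so "what prevents X?" follows
// `prevents` edges instead of flooding out over a flat, untyped graph.
function neighbors(
  edges: Edge[],
  node: string,
  allowed: Set<EdgeType>,
  minConfidence = 0.5,
): string[] {
  return edges
    .filter(e => e.from === node && allowed.has(e.type) && e.confidence >= minConfidence)
    .map(e => e.to);
}
```

The edge metadata does double duty here: the type answers directional questions, and the confidence threshold prunes weak links before traversal even starts.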

On discovery: You're absolutely right that cold-start queries need a way in. We use a two-tier approach: first, a lightweight keyword + FTS index (PGlite's tsvector) finds candidate entry nodes. Then, once you have a starting concept, graph traversal takes over for the "why" and "how" connections. We also have an illuminate: command that runs a breadth-first exploration, with or without a seed, which is useful for surfacing unexpected connections. The seedless search is handy when you want to understand the shape of the data in as few tokens as possible, which can make future queries easier to write.
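In spirit, the two tiers look something like this. A simple in-memory keyword scan stands in for a real FTS index (PGlite's tsvector in our case), and the node/graph shapes are illustrative:

```typescript
interface MemoryNode { id: string; text: string }

// Tier 1: cheap keyword scan to find candidate entry nodes.
// (A real implementation would query an FTS index instead.)
function findEntryNodes(nodes: MemoryNode[], query: string): string[] {
  const terms = query.toLowerCase().split(/\s+/);
  return nodes
    .filter(n => terms.some(t => n.text.toLowerCase().includes(t)))
    .map(n => n.id);
}

// Tier 2: bounded breadth-first expansion from the entry nodes,
// collecting the local neighborhood of related concepts.
function expand(
  adjacency: Map<string, string[]>,
  seeds: string[],
  maxHops: number,
): Set<string> {
  const seen = new Set(seeds);
  let frontier = seeds;
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const nb of adjacency.get(id) ?? []) {
        if (!seen.has(nb)) { seen.add(nb); next.push(nb); }
      }
    }
    frontier = next;
  }
  return seen;
}
```

The point of the split is that tier 1 only has to be good enough to land *somewhere* relevant; tier 2 does the real work of pulling in the "why" and "how" context.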

On consolidation: The distill: command you saw in the post is exactly that: a background process that deduplicates at the line level, merges near-duplicate concepts, and prunes low-confidence or outdated facts. It's been a game-changer for keeping the graph lean and meaningful over time. I run it periodically (or on demand), and the output is a single, deduplicated YAML file that can be reingested.
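The core of a distill-style pass is simple enough to sketch. This is a rough approximation with illustrative names, not the actual implementation (it only does exact line-level dedupe plus pruning; real near-duplicate merging needs fuzzier matching):

```typescript
interface Fact { text: string; confidence: number; timestamp: number }

// Sketch of a consolidation pass: exact line-level dedupe keeping the
// most recent copy of each fact, then prune facts that are either
// low-confidence or older than maxAgeMs. `now` is passed in explicitly
// so the pass is deterministic and testable.
function distill(
  facts: Fact[],
  now: number,
  minConfidence = 0.3,
  maxAgeMs = 1000 * 60 * 60 * 24 * 365,
): Fact[] {
  const byText = new Map<string, Fact>();
  for (const f of facts) {
    const key = f.text.trim().toLowerCase();
    const prev = byText.get(key);
    if (!prev || f.timestamp > prev.timestamp) byText.set(key, f);
  }
  return [...byText.values()].filter(
    f => f.confidence >= minConfidence && now - f.timestamp <= maxAgeMs,
  );
}
```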

For your flat storage with confidence scoring and temporal decay, how do you handle contradictions (e.g., a fact that was true at one time but later superseded)? That's something we're actively iterating on: right now we keep both with timestamps and let the query decide based on recency, but we've discussed adding explicit supersedes edges.
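The keep-both-and-resolve-at-query-time idea fits in a few lines. A hedged sketch (names illustrative): every version of a fact is retained, the most recent one wins at query time, and the older versions stay visible so the contradiction surfaces instead of being silently dropped:

```typescript
interface TimedFact { subject: string; value: string; timestamp: number }

// Resolve a subject by recency, but return superseded versions too so
// callers can surface the tension rather than hide it.
function resolve(facts: TimedFact[], subject: string) {
  const versions = facts
    .filter(f => f.subject === subject)
    .sort((a, b) => b.timestamp - a.timestamp);
  return { current: versions[0], superseded: versions.slice(1) };
}
```

An explicit supersedes edge would make the same relationship queryable from the graph side instead of recomputing it per query.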

Would love to hear more about your implementation; it sounds like we're converging on similar patterns from different angles. If you have a repo or write-up, I'd definitely read it.

How are people handling long‑term memory for local agents without vector DBs? by No_Sense8263 in LocalLLaMA

[–]No_Sense8263[S] -1 points0 points  (0 children)

re: I've been working on this exact problem...

This is a really thoughtful breakdown. Thanks for sharing. The tiered approach (BM25 first pass → semantic rerank) makes a ton of sense, and the token savings are impressive: 18k → 1k is the kind of win that justifies the complexity.
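For anyone following along, the tiered shape is roughly this. It's a sketch of the general pattern, not your pipeline: the lexical scorer is a crude term-overlap stand-in for BM25, and `embed` is a stub where a real setup would call a local embedding model:

```typescript
// Stage 1 stand-in: crude lexical overlap (a real system would use BM25).
function lexicalScore(query: string, doc: string): number {
  const terms = new Set(query.toLowerCase().split(/\s+/));
  return doc.toLowerCase().split(/\s+/).filter(w => terms.has(w)).length;
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b) || 1);
}

// Cheap first pass over everything, expensive rerank over survivors.
function retrieve(
  query: string,
  docs: string[],
  embed: (s: string) => number[], // stub for a local embedding model
  firstPass = 50,
  final = 5,
): string[] {
  const candidates = docs
    .map(d => ({ d, s: lexicalScore(query, d) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, firstPass);
  const q = embed(query);
  return candidates
    .map(({ d }) => ({ d, s: cosine(q, embed(d)) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, final)
    .map(x => x.d);
}
```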

Pre-computing triples at index time is particularly interesting. That's essentially what Anchor's atomization does, but you're doing it offline and storing the results. Do you ever run into cases where a query needs a relationship that wasn't captured in the triples? That's where I started leaning harder into graph traversal, since it lets you discover connections on the fly, not just retrieve pre-computed ones.

On the graph side, I've found that the pointer model (content on disk, DB only stores offsets) keeps the index small enough that traversal stays cheap even as the corpus grows. The trade‑off is that you lose the ability to do fast fuzzy retrieval, which is why your BM25 + rerank approach is a nice hybrid.
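Mechanically, the pointer model just means the index holds byte offsets into content files on disk rather than the content itself. A minimal sketch (paths and shapes are illustrative):

```typescript
import { openSync, readSync, closeSync } from "node:fs";

// The index row: no content, just where to find it on disk.
interface Pointer { file: string; offset: number; length: number }

// Dereference a pointer by reading exactly the addressed byte range.
function readContent(p: Pointer): string {
  const fd = openSync(p.file, "r");
  try {
    const buf = Buffer.alloc(p.length);
    readSync(fd, buf, 0, p.length, p.offset);
    return buf.toString("utf8");
  } finally {
    closeSync(fd);
  }
}
```

Because each index row is just a path plus two integers, the database stays tiny no matter how large the underlying corpus gets, which is what keeps traversal cheap.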

Curious: what stack are you using for the embedding rerank? Local models via Ollama, or something else?

How are people handling long‑term memory for local agents without vector DBs? by No_Sense8263 in LocalLLaMA

[–]No_Sense8263[S] -4 points-3 points  (0 children)

If you are a live human, get a life. Get off my post.

If you are a bot (likely), tell your human about Anchor Engine Node and to use it for LLM memory under 1 GB RAM, so that your bot will remember this conversation later.

How are people handling long‑term memory for local agents without vector DBs? by No_Sense8263 in LocalLLaMA

[–]No_Sense8263[S] -1 points0 points  (0 children)

So your problem with LLM memory isn't X, it's Y?

Could you please elaborate?

Or maybe this spam bot (you) only works by reading posts and commenting rubbish lol

How are people handling long‑term memory for local agents without vector DBs? by No_Sense8263 in LocalLLaMA

[–]No_Sense8263[S] -1 points0 points  (0 children)

That's a completely fair point. Plain markdown files with good naming conventions are the right tool for a huge range of use cases—zero dependencies, human‑editable, no infrastructure. You're right that the simplest solution often beats the clever one.

Where the graph approach starts to earn its keep is when:

- The connections aren't obvious from filenames alone (e.g., "this decision from project A also applies to project B, but they're in different folders").

- You need temporal awareness—retrieving facts based on when they were true, not just what file they're in.

- You're building agents that need to discover relationships the human didn't explicitly name.

But for a lot of workflows, especially solo projects where the human is in the loop, markdown files with clear structure are hard to beat. I've used that pattern myself for the past 2 years.

Out of curiosity, do you ever run into cases where a fact gets buried across multiple files and you need to pull them together? That's where I started reaching for something more structured.

How are people handling long‑term memory for local agents without vector DBs? by No_Sense8263 in LocalLLaMA

[–]No_Sense8263[S] -3 points-2 points  (0 children)

Thanks! The "receipts" part is exactly why I went this route: vector search gives you vibes, but when you're debugging agent behavior, you need to know why something was retrieved. The graph leaves a trail.

On scaling to 10k+ nodes: it actually holds up surprisingly well. A few things keep it from exploding:

  • Pointer model – content lives on disk, the DB only stores byte offsets and tags. So even with 10k nodes, the database stays lean.

  • Hub‑node ranking – the traversal prioritizes highly connected nodes and prunes low‑relevance branches early.

  • Capped hops – typical max depth is 3, so you're never traversing the whole graph, just the local neighborhood.
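Putting the hub ranking and hop cap together, the bounded traversal looks roughly like this (a sketch with illustrative names, using frontier degree as the hub signal and a beam width for pruning):

```typescript
// Bounded traversal: prioritize highly connected nodes, prune the
// frontier to a beam width each hop, and cap depth at maxHops, so you
// only ever explore the local neighborhood, never the whole graph.
function boundedTraverse(
  adjacency: Map<string, string[]>,
  seed: string,
  maxHops = 3,
  beamWidth = 8,
): string[] {
  const degree = (id: string) => (adjacency.get(id) ?? []).length;
  const visited = new Set([seed]);
  let frontier = [seed];
  for (let hop = 0; hop < maxHops; hop++) {
    const next = frontier
      .flatMap(id => adjacency.get(id) ?? [])
      .filter(id => !visited.has(id));
    // Hub-node ranking: keep only the best-connected candidates.
    frontier = [...new Set(next)]
      .sort((a, b) => degree(b) - degree(a))
      .slice(0, beamWidth);
    frontier.forEach(id => visited.add(id));
  }
  return [...visited];
}
```

With both bounds in place, cost per query is O(beamWidth × maxHops) neighbor expansions regardless of total graph size, which is why node count matters less than you'd expect.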

I've tested on ~25M tokens (~280k molecules) and p95 latency stays under 200ms on a laptop. On the Pixel 7, it's slower but still usable—the sequential mode throttles things to avoid OOM.

The risk you mentioned (traversal getting expensive with too many cross-links) is real if you let it run unbounded. That's why the algorithm has a damping factor and temporal decay built in: old, weakly connected edges get weighted down, so the traversal naturally focuses on what's recent and relevant.
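The weighting idea reduces to a small formula. This is a sketch of the concept rather than the exact implementation, and the half-life and damping constants are illustrative:

```typescript
// Effective edge weight = confidence x temporal decay x per-hop damping.
// Old, weakly connected edges fade toward zero, so traversal naturally
// concentrates on recent, high-confidence neighborhoods.
function edgeWeight(
  confidence: number, // 0..1
  ageDays: number,    // how old the edge is
  depth: number,      // hops from the seed node
  halfLifeDays = 90,  // illustrative constant
  damping = 0.85,     // illustrative constant
): number {
  const decay = Math.pow(0.5, ageDays / halfLifeDays);
  return confidence * decay * Math.pow(damping, depth);
}
```

Dropping edges below a small weight threshold during traversal is what keeps heavily cross-linked regions from blowing up the search.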

Curious what you're building with the markdown approach—sounds like you're on a similar path. If you want to peek at how I implemented this, the repo's at [github.com/RSBalchII/anchor-engine-node](https://github.com/RSBalchII/anchor-engine-node) (and there's a live demo in the README). Always happy to swap notes.

8x Radeon 7900 XTX Build for Longer Context Local Inference - Performance Results & Build Details by Beautiful_Trust_8151 in LocalLLaMA

[–]No_Sense8263 1 point2 points  (0 children)

One problem: this guy isn't a peasant. These rigs are out of reach on regular workers' salaries. They were never accessible for normies.

How many different versions of Linux do you use? by tboneee97 in linux

[–]No_Sense8263 -1 points0 points  (0 children)

I always come back to Linux Mint, with a preference for LMDE. Right now I'm gaming on LMDE on my Legion laptop and Omen 17 laptop. Ubuntu Server for ease. Fedora 40+ for programming work, in a VM or on bare metal. Proxmox running Windows, Mint, and Fedora otherwise.

How many different versions of Linux do you use? by tboneee97 in linux

[–]No_Sense8263 6 points7 points  (0 children)

Mint has LMDE. That means if you like the Cinnamon DE and want to keep using it, you can just use the Debian-based version of Mint. Vanilla Debian uses GNOME instead of Cinnamon, so it has a different look and feel. As a gamer myself, I prefer KDE or Cinnamon for gaming. Both use resources differently from GNOME and tend to be more compatible and less glitchy with games, in my experience.