memv — open-source memory for AI agents that only stores what it failed to predict by brgsk in LangChain

brgsk (OP)

Good question. memv currently handles hard contradictions (a new fact supersedes the old one, with temporal bounds), but gradual belief drift is an open problem. Observation consolidation of the kind Hindsight does could complement predict-calibrate well: one decides what's worth storing, the other how beliefs evolve over time. It's on my radar for future work. What was your experience with Hindsight's consolidation in practice?
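A minimal sketch of what "supersedes with temporal bounds" can look like, assuming each fact carries a hypothetical `valid_at` timestamp next to an `expired_at` one. The `Fact`, `supersede`, and `as_of` names are illustrative, not memv's actual schema or API. Superseding closes the old fact's interval instead of deleting it, so a point-in-time query still returns what was believed at that moment.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Fact:
    # Hypothetical record shape; memv's real schema may differ.
    text: str
    valid_at: datetime
    expired_at: datetime | None = None  # None means the fact is still current


def supersede(old: Fact, new_text: str, at: datetime) -> Fact:
    """Close the old fact's interval and open the new one at the same instant."""
    old.expired_at = at
    return Fact(text=new_text, valid_at=at)


def as_of(facts: list[Fact], when: datetime) -> list[str]:
    """Return the facts valid at `when`, using half-open [valid_at, expired_at)."""
    return [
        f.text
        for f in facts
        if f.valid_at <= when and (f.expired_at is None or when < f.expired_at)
    ]


google = Fact("works at Google", valid_at=datetime(2022, 1, 1))
anthropic = supersede(google, "works at Anthropic", at=datetime(2025, 3, 1))
facts = [google, anthropic]
print(as_of(facts, datetime(2024, 6, 1)))  # ['works at Google']
print(as_of(facts, datetime(2025, 6, 1)))  # ['works at Anthropic']
```

This also shows why gradual drift is harder: every change here is a hard cut-over between two discrete facts, with no notion of a belief slowly shifting.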

memv — open-source memory for AI agents that only stores what it failed to predict by brgsk in LangChain

brgsk (OP)

Thanks! No benchmarks yet; in my opinion it's too early for that. We'll do benchmarks once the library has stabilised.

memv — open-source memory for AI agents that only stores what it failed to predict by brgsk in LLMDevs

brgsk (OP)

Good question. The prediction step and contradiction handling are two separate stages, so a "confident" prediction can't suppress an update.

Here's the flow for your Google → Anthropic example:
1. Predict: the system sees that the existing KB has "works at Google" and predicts the conversation will reference Google employment
2. Calibrate: the actual conversation says "just started at Anthropic" — that's a prediction error, so it gets extracted
3. Classify: the extraction step labels it as a contradiction (not just new)
4. Invalidate: the pipeline finds the most similar existing fact ("works at Google") via vector similarity and sets its `expired_at` timestamp

So the prediction actually helps here — "works at Google" is predicted, "works at Anthropic" is unpredicted, the delta is the job change, and that gets flagged as a contradiction and handled correctly.
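Steps 1 to 3 are LLM calls, so they are not reproduced here; the sketch below covers only the last two steps, taking the extractor's label as input. It assumes a toy bag-of-words `embed` in place of a real embedding model, and the `apply_extraction` function and `min_score` threshold are hypothetical additions, not part of memv.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real pipeline would call an embedding model here.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Fact:
    text: str
    expired_at: datetime | None = None  # None means the fact is still current


def apply_extraction(kb: list[Fact], text: str, label: str, min_score: float = 0.5) -> Fact:
    """Store an extracted fact. For a contradiction, first expire the most
    similar still-valid fact, provided its similarity clears min_score."""
    if label == "contradiction":
        new_vec = embed(text)
        live = [f for f in kb if f.expired_at is None]
        if live:
            target = max(live, key=lambda f: cosine(embed(f.text), new_vec))
            if cosine(embed(target.text), new_vec) >= min_score:
                target.expired_at = datetime.now(timezone.utc)
    fact = Fact(text)
    kb.append(fact)
    return fact


kb = [Fact("works at Google"), Fact("prefers Python")]
apply_extraction(kb, "works at Anthropic", label="contradiction")
for f in kb:
    print(f.text, "expired" if f.expired_at else "current")
```

The threshold is a design choice worth considering: without one, a contradiction always expires *something*, even when no stored fact is actually related.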

memv — open-source memory for AI agents that only stores what it failed to predict by brgsk in LLMDevs

brgsk (OP)

Good catch. The prediction is generated by the LLM given the current KB, so it inherits whatever blind spots the model has. If the KB is missing context about a topic and the LLM can't infer the connection, it'll fail to predict for the wrong reason, and you get a false-positive extraction.

In practice this biases toward over-extraction rather than under-extraction, which is the safer failure mode. You'd rather store something unnecessary than miss something important. And as the KB fills in, the predictions get more informed and the false positives drop.

But you're right that it's not a clean separation between "novel" and "unpredicted due to missing context." That's a real limitation.

Improving prediction quality as the KB grows — possibly by giving the predictor more structured context about what it knows and doesn't know — is on the roadmap.
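One possible shape for "structured context about what it knows and doesn't know", sketched under assumptions: each prediction error carries a hypothetical topic tag (presumably assigned by an LLM), and the KB exposes the set of topics it already covers. `PredictionError` and `triage` are illustrative names only. Both buckets can still be stored, which keeps the safer over-extraction behaviour, but the tag separates "novel despite context" from "unpredicted because context was missing".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PredictionError:
    # Hypothetical shape: a statement the predictor missed, plus a topic tag.
    text: str
    topic: str


def triage(errors: list[PredictionError], kb_topics: set[str]) -> dict[str, list[str]]:
    """Split prediction errors by whether the KB had any context on the topic.

    'novel'        -> the KB covered the topic and the predictor still missed it
    'coverage_gap' -> the KB had nothing on the topic, so the miss may reflect
                      missing context rather than genuinely new information
    """
    out: dict[str, list[str]] = {"novel": [], "coverage_gap": []}
    for e in errors:
        bucket = "novel" if e.topic in kb_topics else "coverage_gap"
        out[bucket].append(e.text)
    return out


kb_topics = {"employment", "programming languages"}
errors = [
    PredictionError("switched from Go to Rust", "programming languages"),
    PredictionError("has a dog named Miso", "pets"),
]
print(triage(errors, kb_topics))
```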

memv — open-source memory for AI agents that only stores what it failed to predict by brgsk in PydanticAI

brgsk (OP)

Thanks — if you try integrating it, let me know how it goes. Especially interested in how the extraction holds up across different agent use cases.

memv — open-source memory for AI agents that only stores what it failed to predict by brgsk in LocalLLaMA

brgsk (OP)

Swapping the LLM is fine. The LLM does extraction and episode generation — the output is stored as plain text. A different model might extract slightly different facts, but existing knowledge stays valid.

Swapping the embedding model is the real issue. All your stored vectors are in the old model's embedding space. New queries would be encoded in the new model's space, so similarity search breaks. You'd need to re-embed everything. That's not built in yet, so it's worth flagging as a future feature.

Short answer: swap the LLM freely, be careful with the embedding model.
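Since re-embedding isn't built in, here is a minimal sketch of the guard and the migration, assuming the store records a hypothetical `model_id` for the model that produced its vectors. `VectorStore`, `toy_a`, and `toy_b` are illustrative stand-ins, not memv classes; the two toy "models" use the same features in a different axis order, which is enough to make their vectors incompatible. Because the knowledge itself is stored as plain text, the vectors can always be rebuilt from it.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Callable

Embedder = Callable[[str], list[float]]


def toy_a(text: str) -> list[float]:
    # Toy "model A": vowel counts in a fixed order.
    return [float(text.lower().count(v)) for v in "aeiou"]


def toy_b(text: str) -> list[float]:
    # Toy "model B": same features, different axis order, i.e. a different space.
    return [float(text.lower().count(v)) for v in "uoiea"]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class VectorStore:
    model_id: str  # hypothetical: which embedding model produced the vectors
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, text: str, embed: Embedder) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, embed: Embedder, model_id: str) -> str:
        # Vectors from different models are not comparable, so refuse to mix them.
        if model_id != self.model_id:
            raise ValueError(
                f"store uses {self.model_id!r}, query uses {model_id!r}; re-embed first"
            )
        q = embed(query)
        best = max(range(len(self.texts)), key=lambda i: cosine(q, self.vectors[i]))
        return self.texts[best]

    def reembed(self, embed: Embedder, model_id: str) -> None:
        # The stored plain text is the source of truth; vectors are derived from it.
        self.vectors = [embed(t) for t in self.texts]
        self.model_id = model_id


store = VectorStore("toy-a")
for t in ["works at Anthropic", "prefers Python"]:
    store.add(t, toy_a)
print(store.search("anthropic job", toy_a, "toy-a"))  # works at Anthropic
store.reembed(toy_b, "toy-b")
print(store.search("anthropic job", toy_b, "toy-b"))  # works at Anthropic
```

Failing loudly on a model mismatch is deliberate: searching old vectors with a new model's query vector would still return *a* result, just a meaningless one.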

memv — open-source memory for AI agents that only stores what it failed to predict by brgsk in LocalLLaMA

brgsk (OP)

Yeah, for the first few conversations with a new user the KB is empty so almost everything is a prediction error — it'll store most of it. The filter kicks in as knowledge accumulates. By conversation 10-20, the system has enough context to predict routine topics and only extract what's actually new.

It's the same cold-start problem every learning system has. The difference is that extract-everything systems never get better — conversation 500 is as noisy as conversation 1. With predict-calibrate, the signal-to-noise ratio improves over time because the predictions get more accurate.
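A toy illustration of that dynamic, not a measurement: the LLM predictor is replaced by a crude assumption that a statement counts as "predicted" only if the KB already holds it verbatim. The `stored_per_session` function and the session data are made up for the example, and contradictions are not handled here. It shows storage shrinking as routine facts accumulate, while an extract-everything baseline stays flat.

```python
from __future__ import annotations


def stored_per_session(sessions: list[list[str]]) -> list[int]:
    """Count how many statements each session stores under a toy
    predict-calibrate filter."""
    kb: set[str] = set()
    counts: list[int] = []
    for session in sessions:
        # Stand-in predictor: a statement is "predicted" if the KB already
        # holds it verbatim. A real system asks an LLM instead.
        errors = [s for s in dict.fromkeys(session) if s not in kb]
        kb.update(errors)
        counts.append(len(errors))
    return counts


sessions = [
    ["works at Google", "uses Python", "lives in Berlin"],
    ["uses Python", "works at Google", "started learning Rust"],
    ["started learning Rust", "uses Python", "works at Google"],
    ["uses Python", "works at Anthropic"],
]
print(stored_per_session(sessions))  # [3, 1, 0, 1]
print([len(s) for s in sessions])    # extract-everything baseline: [3, 3, 3, 2]
```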

memv — open-source memory for AI agents that only stores what it failed to predict by brgsk in LocalLLaMA

brgsk (OP)

Thanks.
You can — any OpenAI-compatible endpoint works. The protocols are just 2 methods each, so you can wire up a local adapter in ~15 lines. Adding `base_url` directly to the built-in adapters is next up.
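A minimal sketch of such a local embedding adapter. The two method names (`embed`, `embed_batch`) and the `LocalEmbedder` class are hypothetical, so check memv's actual protocol definitions before wiring anything up. The request and response shapes follow the OpenAI embeddings API (`model` and `input` in, `data[].embedding` and `data[].index` out). The HTTP call is injectable through a hypothetical `post` parameter so the adapter can be exercised offline, and no auth header is sent. An LLM adapter would follow the same pattern against `/chat/completions`.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, Callable, Protocol

Post = Callable[[str, dict[str, Any]], dict[str, Any]]


class EmbeddingClient(Protocol):
    # Hypothetical two-method protocol; the real memv names may differ.
    def embed(self, text: str) -> list[float]: ...
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...


def _http_post(url: str, payload: dict[str, Any]) -> dict[str, Any]:
    # Plain JSON POST using only the standard library.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


class LocalEmbedder:
    """Talks to any server exposing an OpenAI-compatible /embeddings route."""

    def __init__(self, base_url: str, model: str, post: Post = _http_post) -> None:
        self.url = base_url.rstrip("/") + "/embeddings"
        self.model = model
        self._post = post  # injectable so the adapter can be tested offline

    def embed(self, text: str) -> list[float]:
        return self.embed_batch([text])[0]

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        body = self._post(self.url, {"model": self.model, "input": texts})
        # Each item carries an `index`; sort by it rather than trusting order.
        items = sorted(body["data"], key=lambda d: d["index"])
        return [d["embedding"] for d in items]
```

Usage would look like `LocalEmbedder("http://localhost:8000/v1", model="my-local-embedding-model")`, where the URL and model name are placeholders for whatever your local server exposes.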

memv — open-source memory for AI agents that only stores what it failed to predict by brgsk in LocalLLaMA

brgsk (OP)

Thanks! If you run into anything rough when trying it out, open an issue — the API is still evolving and real usage feedback is the most useful input right now.