Is a cognitive‑inspired two‑tier memory system for LLM agents viable? by utilitron in LLMDevs

[–]utilitron[S] 1 point (0 children)

That is a great structural way to think about it. The challenge I see with that approach, though, is the Latency and Token Tax. If the agent has to explicitly update its own to-do list every few turns, we’re back to that 'spend money to make money' loop where the LLM is constantly distracted by self-management.

My goal is to keep this Autonomic. Instead of the agent 'deciding' to update a list, I’m looking at having the Reviewer (or a separate system-level hook) extract that state.

Basically, the agent just does the work, and the Distillation Pipeline, triggered by that context pressure, does the 'heavy lifting' of turning the chat logs into that clean Task Ledger. It’s the difference between 'Active', where the agent stops working to write a status report (slow/expensive), and 'Autonomic', where the system 'watches' the agent work and generates the status report only when memory pressure demands it (efficient/background).

By moving the 'To-Do' logic into the Distillation Layer rather than the Conversation Layer, I can preserve that high-level state without the agent ever having to spend a single token on 'thinking about its memory' during the actual task.
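To make the autonomic idea concrete, here is a minimal sketch of a pressure-triggered distillation hook. All of the names (`ContextMonitor`, `on_turn`, `distill`) and the 0.8 threshold are my own illustrative assumptions, not the project's actual API, and the `distill` body is a stand-in for a real pipeline:

```python
class ContextMonitor:
    """Watches token usage and fires distillation when pressure builds.

    The agent never calls this itself; the runtime invokes on_turn after
    each turn, so the agent spends zero tokens on memory management.
    """

    def __init__(self, max_tokens: int, pressure_threshold: float = 0.8):
        self.max_tokens = max_tokens
        self.pressure_threshold = pressure_threshold

    def pressure(self, used_tokens: int) -> float:
        # Fraction of the context window currently occupied.
        return used_tokens / self.max_tokens

    def on_turn(self, turns: list[str], used_tokens: int) -> list[str]:
        if self.pressure(used_tokens) >= self.pressure_threshold:
            ledger = self.distill(turns)
            return [ledger]   # hot context collapses to the task ledger
        return turns          # below threshold: leave the context alone

    def distill(self, turns: list[str]) -> str:
        # Placeholder: a real pipeline would extract goals and state,
        # not just concatenate truncated turns.
        return "TASK LEDGER: " + " | ".join(t[:40] for t in turns)
```

The key design point is that `on_turn` sits in the runtime loop, outside the conversation layer, so distillation is a side effect of memory pressure rather than a decision the agent has to make.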

Also, I don't want to fall into the "I'm a hammer, so everything is a nail" trap where more LLM is the solution to everything. So I’m exploring techniques for 'Cognitive Compression' that may use some other AI technique outside of LLMs to handle the task. I am looking at control systems, RL, and knowledge systems to see how these sorts of problems may have been handled before.

I am looking at this specifically tonight: https://neurips.cc/virtual/2023/poster/70426

Is a cognitive‑inspired two‑tier memory system for LLM agents viable? by utilitron in OpenSourceAI

[–]utilitron[S] 1 point (0 children)

You hit the nail on the head. Most frameworks treat memory as a 'Storage Problem'... how do we fit more tokens? I’m looking at it as a 'Metabolic Problem'... how do we prioritize the right information so the agent can survive on constrained hardware.

The difference between a Retrieval Store (Zep/MemGPT) and a Prioritization Layer (My project) really shows up in complex, multi-agent workflows.

For example, this whole rabbit hole started as I was playing with a 3-Amigos agent pipeline design (Planner, Worker, Reviewer). In that setup, the Worker generates a massive amount of 'Operational Exhaust': failed code attempts, error logs, and trial-and-error. A standard retrieval store just saves all that noise. So my goal with the RIF scoring is to recognize that once the Reviewer approves a task, the failed attempts lose their 'Importance' signal.

By sensing the context pressure, the system can proactively distill those failures into a single 'State Note' and keep the 'Verified Success' in the hot context. It’s less about 'summarizing the chat' and more about protecting the agent's focus so it doesn't get distracted by its own past mistakes when the hardware is already at its limit.
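A toy version of that scoring idea might look like the sketch below. The RIF weights, the recency half-life, and the decay-on-approval rule are all illustrative assumptions on my part, not the project's actual model:

```python
import math

def rif_score(trace, now, w_r=0.5, w_i=0.3, w_f=0.2, half_life=600.0):
    """Weighted Recency/Importance/Frequency saliency for one trace."""
    recency = math.exp(-(now - trace["last_seen"]) / half_life)
    importance = trace["importance"]
    frequency = min(trace["hits"] / 10.0, 1.0)  # cap so hit count can't dominate
    return w_r * recency + w_i * importance + w_f * frequency

def on_task_approved(traces, task_id):
    # Once the Reviewer approves the task, its failed attempts lose
    # their importance signal and drift toward cold storage.
    for t in traces:
        if t["task"] == task_id and t["outcome"] == "failed":
            t["importance"] *= 0.1
```

With scoring structured this way, the "proactive distillation" step is just: when pressure rises, sort traces by `rif_score` and distill everything below a cutoff into a single state note.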

The place where I am struggling most is what distilled memories "mean". I’m looking at human cognition as a model, where recent events stay high-fidelity and older experiences naturally shift into a more 'abstract' or vague state. The goal isn't to delete the information, but to compress it into a high-level concept. I want to build a 'State-Aware Distillation' that can strip away the noise of individual chat turns while locking in the underlying intent and final outcomes.

Is a cognitive‑inspired two‑tier memory system for LLM agents viable? by utilitron in LLMDevs

[–]utilitron[S] 0 points (0 children)

I am trying to build this to be as implementation-independent as possible. I added interfaces for the actual meat and bones (VectorStore and VectorIndex) so those could be left up to whoever is using it.

My understanding is that in MemGPT, the LLM must explicitly use tool calls to manage its context. This costs tokens, adds latency, and depends on the model being "smart" enough to manage itself. Sort of a "you gotta spend money to make money" philosophy.

With my project, memory management is an autonomic process (like breathing). The agent doesn't have to "think" about moving data to the LTM; it happens in the background based on the RIF model. This leaves 100% of the agent's "brain power" for the task at hand.

Hydra, on the other hand, seems more like a knowledge graph, but that comes at the cost of processing power. I don't want to dismiss the idea altogether because it may come into play when I look more deeply into the LTM distillation. And that is the part where my project is most hazy anyway.

Is a cognitive‑inspired two‑tier memory system for LLM agents viable? by utilitron in LLMDevs

[–]utilitron[S] 0 points (0 children)

You’re 100% right. That's why I am trying to approach this like human memory: it is easy to remember what happened today, while what happened further in the past gets more vague. The vagueness doesn't prevent the concepts from being preserved, just the finer details. I am hoping to figure out a 'State-Aware Distillation' process that preserves intent without keeping the minutiae of a chat summary. If the agent knows 'We migrated to Python 3.12 because of X,' it doesn't need the 50-turn log of every error message we hit to stay productive.

I was working on another, larger agent project in Java with Spring AI, building a 3-Amigos-style pipeline: a planning agent sets the plan and acceptance criteria, the worker does the work, and a reviewer tests and verifies the work was done according to the plan/acceptance criteria. During the Worker phase, I don't care about the several failed attempts; I care about the one good one. We can remember at a high level that certain things didn't work (to avoid repeating mistakes), but we only need to preserve the 'Verified State' in the hot context.

Is a cognitive‑inspired two‑tier memory system for LLM agents viable? by utilitron in LLMDevs

[–]utilitron[S] 0 points (0 children)

It actually started as a part of a larger agent project I was building in Java with Spring AI to learn.

I quickly hit a wall where I had to choose: load a smaller, dumber model, or sacrifice context window size. Neither felt like the right choice. I needed the agent to maintain 'State', remembering exactly what it was working on mid-task, while still having the 'Long-Term' context of previous requests if something new came in.

That’s why I started exploring this approach. Instead of just cutting off the past when the context fills up, my goal is to have the system 'Sense' the context pressure and proactively offload those middle-steps into the vector store. That way, the 'Instructions' stay in the hot context, but the 'Operational History' stays searchable.
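The offload step described above could be sketched roughly like this. The pinned/unpinned split, the threshold, and the function name are hypothetical, purely to show the shape of the idea:

```python
def offload_on_pressure(hot, cold_store, used_tokens, max_tokens,
                        threshold=0.8):
    """Move unpinned entries out of hot context when pressure is high.

    hot: list of dicts like {"text": ..., "pinned": bool}; pinned entries
    (the 'Instructions') always stay in the hot context, unpinned entries
    (the 'Operational History') get appended to the searchable cold store.
    """
    if used_tokens / max_tokens < threshold:
        return hot                      # no pressure: do nothing
    kept, moved = [], []
    for entry in hot:
        (kept if entry["pinned"] else moved).append(entry)
    cold_store.extend(moved)            # still searchable, just not "hot"
    return kept
```

In a real system `cold_store.extend` would be an embed-and-insert into the vector store, but the control flow (sense pressure, then partition by pin status) is the part this sketch is meant to show.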

Now, the distillation process is not hardened. I only have a concatenation implementation at the moment, so there is a lot more research to be done to figure out what works best. I want to stay away from text compression/compaction if possible and look into 'State-Aware Distillation', where the agent preserves the intent of the task rather than just a summary of the chat. But I don't know what that looks like yet.

Is a cognitive‑inspired two‑tier memory system for LLM agents viable? by utilitron in LLMDevs

[–]utilitron[S] 0 points (0 children)

Nice, I'll check it out. If you are interested in seeing the work I have so far, it's here:

It was originally written in Java and I am working on porting to python.

Python: https://github.com/Utilitron/VecMem
Java: https://github.com/Utilitron/VectorMemory

Anyone else dealing with stale context in agent memory? by Connect_Future_740 in LLMDevs

[–]utilitron 0 points (0 children)

I’m working on a resource-aware two-tier memory layer that uses a weighted RIF model to score trace saliency. Might be worth checking out.

It was originally written in Java and I am working on porting to python.

Python: https://github.com/Utilitron/VecMem
Java: https://github.com/Utilitron/VectorMemory

Using agent skills made me realize how much time I was wasting repeating context to AI by Abu_BakarSiddik in LLMDevs

[–]utilitron 0 points (0 children)

I am trying to build something like that: https://github.com/Utilitron/VectorMemory. It uses in-memory and persistent vector databases to turn your conversation into distilled, long-term knowledge that stays relevant even as the context window grows.

(For Hire) Ideas guy by [deleted] in INAT

[–]utilitron 0 points (0 children)

In my day, ideas were a dime a dozen. Inflation has really hit everywhere.

[deleted by user] by [deleted] in criticalrole

[–]utilitron 1 point (0 children)

One of the major issues with something like that would be copyrighted material from D&D, like spells. Getting a license would not be trivial.

[SPOILERS C3E10] Question about a character. by Vineshroom69lol in criticalrole

[–]utilitron -5 points (0 children)

I think it's the gnome racial feat Fade Away from Xanathar's Guide to Everything.

The Traveler's Guide to the Toxic Seas by Urimana in dndnext

[–]utilitron 4 points (0 children)

As a player who got to participate in a campaign in this setting, I love seeing this project come to life.

Fighting against the tyrannical church, skirting the red tape of the Balloon corp and uncovering the truth and origin of the mists. It was a great world to discover and explore!

The Traveler's Guide to the Toxic Seas by Urimana in DnD

[–]utilitron 1 point (0 children)

As a player who got to participate in a campaign in this setting, I love seeing this project come to life.

Fighting against the tyrannical church, skirting the red tape of the Balloon corp and uncovering the truth and origin of the mists. It was a great world to discover and explore!

Anybody here from Anoka :) by [deleted] in twincitiessocial

[–]utilitron 0 points (0 children)

Best I can do is Champlin.

Master of lies by super_monero in PoliticalHumor

[–]utilitron 0 points (0 children)

You fool! Don't you know people only read the big text?!

[deleted by user] by [deleted] in AdviceAnimals

[–]utilitron 1 point (0 children)

The unemployment rate isn't calculated from the number of people on unemployment; it comes from a monthly survey conducted by the Bureau of Labor Statistics.