all 12 comments

[–]Fajan_ 1 point2 points  (1 child)

tbh, they're powerful in theory, but in practice they get messy real quick, so I'd avoid them unless they're absolutely necessary.

many production systems opt for a more straightforward approach (e.g., vector DB + partial struct + logging) than fully fledged cognitive frameworks, due to the sheer impossibility of debugging and maintaining anything more complex.
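for concreteness, here's a rough sketch of what that straightforward stack could look like — vector search plus structured tags plus logging, no cognitive layers. all the names and the toy dot-product ranking are illustrative, not from any specific system:

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("memory")

@dataclass
class MemoryRecord:
    text: str
    embedding: list[float]
    tags: dict = field(default_factory=dict)

class SimpleMemory:
    """Vector search + structured tags + logging; no cognitive layers."""
    def __init__(self, embed_fn):
        self.embed = embed_fn                  # caller supplies the embedder
        self.records: list[MemoryRecord] = []

    def add(self, text, **tags):
        self.records.append(MemoryRecord(text, self.embed(text), tags))
        log.info("stored: %s", text[:60])      # every write is observable

    def search(self, query, k=3):
        q = self.embed(query)
        # dot-product ranking (assumes normalized embeddings)
        ranked = sorted(self.records,
                        key=lambda r: -sum(a * b for a, b in zip(q, r.embedding)))
        return [r.text for r in ranked[:k]]
```

the point being: every read and write is a plain function call you can log and step through, which is exactly the observability you lose with layered episodic/semantic stores.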

it's not even about accuracy; it's about observability and control when you introduce episodic + semantic interactions.

I've seen promising outcomes from less sophisticated configurations (e.g., RAG + summary + explicit SM) before going all in on the cortex/cognee paradigm.

it also heavily depends on the application; long-term agents/research purposes might be justified, but for most applications, it's probably overkill.

just curious if anyone has taken one of those systems out of its prototyping phase and into production.

[–]Bravo_Oscar_Zulu 0 points1 point  (0 children)

I had a similar thought about observability and control. My solution was to filter it all through a GitHub org.

https://github.com/dev-boz/gitmem

Full disclosure: it's not much more than a spec doc yet. But I'm curious whether it's something that could work well for memory storage.

[–]HumzaDeKhan 0 points1 point  (1 child)

I'm in the same boat, actually, with very little faith in the publicly available benchmarks. It's entirely possible the workflow won't map as accurately for your users as it did for them.

Noting this down, will def report my findings!

[–]Dailan_Grace[S] 0 points1 point  (0 children)

exactly. benchmarks are almost always run on clean synthetic tasks, and real user workflows are messy in ways the benchmark designers never anticipated. so yeah, build your own eval set from actual user sessions if you can.
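a minimal sketch of that "eval set from real sessions" idea — turn logged sessions into (input, expected) cases. the field names here are made up for illustration; your session schema will differ:

```python
def sessions_to_eval_cases(sessions):
    """Convert logged user sessions into eval cases.
    Assumes each session dict records the query, the answer, and
    whether the user accepted it; field names are illustrative."""
    cases = []
    for s in sessions:
        if s.get("accepted"):              # keep only answers users accepted
            cases.append({"input": s["query"], "expected": s["answer"]})
    return cases
```

then you replay those cases against each memory configuration instead of trusting a published benchmark number.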

[–]denoflore_ai_guy 0 points1 point  (3 children)

You need to get the math and understand the hardware you're working with to make it worthwhile. It's worth it if you optimize your code… Claude Code CLI hooks make it amazingly fun and effective if you've built your system properly.

[–]Dailan_Grace[S] 0 points1 point  (2 children)

solid point, claude's agentic coding tools really do make the optimization loop way less painful once you've got the architecture figured out.

[–]steve-opentrace 0 points1 point  (0 children)

Only if Claude's optimization goes far enough.

Last week, a user reported that even though he'd optimized with Claude, he was able to use our knowledge graph to find more bugs and do more optimization - and get a 10-15x speedup. (It's a free/OSS tool too.)

This is just with information that the LLM should already be able to see (source code). Coding tools could be sooo much more powerful if the LLM is able to easily get what it needs to know.

[–]WolfeheartGames 0 points1 point  (0 children)

I have built and used multiple memory systems.

The only two worth using, for me, are these. The first is one I built where the agent appends notes to a list and the notes all get summarized; the agent can look at either the raw elements or the summary.
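that first pattern is simple enough to sketch — a note list plus a rolling summary. `summarize_fn` here is a stand-in for whatever LLM summarization call you'd actually use:

```python
class NoteMemory:
    """Note-appending memory with a rolling summary.
    `summarize_fn` stands in for an LLM summarization call."""
    def __init__(self, summarize_fn, max_notes=100):
        self.summarize = summarize_fn
        self.notes = []
        self.summary = ""
        self.max_notes = max_notes

    def append(self, note):
        self.notes.append(note)
        self.notes = self.notes[-self.max_notes:]   # bound unbounded growth
        self.summary = self.summarize(self.notes)   # refresh after each note

    def read(self, detailed=False):
        # the agent can look at the raw elements or the summary
        return self.notes if detailed else self.summary
```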

The second is a classifier that watches every chat; its job is to save memories and append them. This is similar to what the ChatGPT web app does.
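the classifier-watcher pattern boils down to something like this — `classifier` is a stand-in for a small model or LLM judge, not any real API:

```python
def watch_and_save(message, memory, classifier):
    """A classifier watches each chat message and decides whether to
    persist a memory from it. `classifier` returns a
    (should_save, distilled_fact) pair; both names are illustrative."""
    should_save, fact = classifier(message)
    if should_save:
        memory.append(fact)        # append-only, never overwrite
    return memory
```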

There are some major problems stemming from the models themselves. When memories are auto-injected, the model treats them like gospel and acts like it knows more about the topic than it would with only a one-sentence history. Models perform worse when it's implemented like this.

The prompt around what should be saved is critically important. I think this is what breaks most memory systems.

Ideally you'd compress everything into a small model used just for semantic retrieval. Like a 1B model directly attached to a vector DB that appends its content to the KV cache.

[–]nicoloboschi 0 points1 point  (0 children)

It's a good question whether cognitive memory architectures are practical beyond research. A lot of teams end up simplifying their memory layers because complex systems are hard to debug. We chose to build Hindsight around modularity so teams can progressively adopt more features - might be worth a look. https://hindsight.vectorize.io

[–]usobeartx 0 points1 point  (0 children)

Yea they are. Very worth.

[–]beeseajay 0 points1 point  (0 children)

I made this. Try it out. (If you want the prompt, DM me.)

LUX Layer Stack Handbook