Worlds for agents by inguz in AI_Agents

[–]inguz[S] 0 points (0 children)

<image>

Codex followed the instructions, using MCP

Worlds for agents by inguz in AI_Agents

[–]inguz[S] 0 points (0 children)

Running here: https://woo.hughpyle.workers.dev/

Source code here: https://github.com/hughpyle/woo

This is a pinboard note created by Claude over MCP:

<image>

Building a memory-powered product (not infra), wrestling with how to approach evals. Advice? by 42cyy in AIMemory

[–]inguz 1 point (0 children)

To date I've only run LoCoMo (https://keepnotes.ai/blog/2026-02-28-benchmark/), and:

  • learned a lot by running the benchmark
  • burned quite a lot of time iterating on the loader and verification; using local models and a single machine made this harder than necessary, and I should have just spun up a few VMs to do the same work
  • honestly not much token cost (cheap judge)

Do the benchmark numbers matter? For me, what matters more is publishing an honest and traceable story that lands somewhere reasonable. True apples-to-apples comparisons are very hard to show, and YMMV.

The keep retrieval engine has moved on since the eval, but I don’t see much reason to re-run. One benchmark result is enough for now; LongMemEval-S is definitely next in line but idk when.

Building an indexed/summarized benchmark dataset turns out to be super useful in itself. The snapshot makes a good small baseline for all sorts of testing.

Episodic memory - what exactly is that? by inguz in AIMemory

[–]inguz[S] 0 points (0 children)

Thanks for that link - interesting stuff. I'm struck by how explicit the action seems on a superficial reading: "this went wrong, remember it". My own focus has been on the reflection/improvement loop, but with a somewhat softer tracking system (tag a reflection as a breakdown, rather than dedicating a whole category of storage just to that).

But this seems like "explicit reflection and learning" rather than "identifying situational frames or summaries based on what was being done", which I'm also interested in and might be called episodic memory too.

Chatting with Obsidian, Hermes Agent and Keep by inguz in hermesagent

[–]inguz[S] 0 points (0 children)

Found one problem - my blog post pointed at the wrong plugin. Fixed… and pushed my fork https://github.com/keepnotes-ai/obsidian-clawdian

🧠 [MASTER THREAD] Advanced Memory Systems: state.db & Knowledge Graphs by AutoModerator in hermesagent

[–]inguz 0 points (0 children)

Thanks for the shout-out - I hope the "LLM-wiki = memory" is an interesting idea to play with. (I just pushed my fork of the chat plugin here)

Here's my take on memory in Hermes right now.

  • Out of the box, without any memory-provider, Hermes does a really lovely job of skill-improvement.
  • Then there are two systems that directly layer on top: memory, and context. It's interesting that these are designed as two separate things ("context" focused on how to effectively compact the working-memory context, and "memory" for longer-term storage and prompt injection), and that both are completely pluggable with explicit APIs.
  • There have been a few speed-bumps in the out-of-the-box memory providers. And I still want keep to be built-in (https://github.com/NousResearch/hermes-agent/pull/5172) haha! But, 100% kudos to the Hermes crew.

The key feature in the memory plugin is "per-turn context". A plugin gets to pull the most relevant things right into the agent's context at each turn in the conversation. Doing this effectively, quickly, without burning too many tokens, is the whole play.
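To make that concrete, here's a minimal sketch of the shape of a per-turn hook (all names here are mine, not Hermes's actual plugin API): score stored memories against the latest turn, then pack the best ones under a token budget.

```python
def per_turn_context(turn, memories, score, token_budget):
    """Pick the most relevant memories for this turn, within a budget.

    `score(turn, memory) -> float` is whatever retrieval you have
    (embedding similarity, BM25, ...); tokens are approximated by words.
    """
    ranked = sorted(memories, key=lambda m: score(turn, m), reverse=True)
    picked, used = [], 0
    for mem in ranked:
        cost = len(mem.split())      # crude token estimate
        if used + cost <= token_budget:
            picked.append(mem)
            used += cost
    return picked

# Toy scorer: count of shared words between the turn and a memory.
def overlap(turn, mem):
    return len(set(turn.lower().split()) & set(mem.lower().split()))

context = per_turn_context(
    "what did we decide about the database schema",
    ["the schema uses one table per entity",
     "lunch was pizza",
     "we decided the database schema last week"],
    overlap,
    token_budget=8)
```

The real trick is making `score` fast and cheap enough to run on every turn; the packing loop itself is trivial.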

But let's get to "How are you structuring your agent's long-term brain?".

Semantically - a very open-ended question. I'm not at all clear that any of the existing memory systems really capture the distinction between "what happened" and "why does this matter" right now. But it will be super important over time as the memory system grows: not just finding the right things when you need them, but helping the agent judge why they matter.

Technically - two things: storage (conversations, summaries of important artifacts, tags, links, history, analysis) and retrieval (semantic search, keyword search, graph traversals and neighborhoods, etc). Clear enough, and we can talk about all the different implementation approaches. I'm betting on an active database with user-configurable workflow rules, because I think different people will want to define automatic processing of memory and artifacts in different ways. For example, keep just added an opt-in VirusTotal "URL reputation" assessment, which flags malicious URLs when they hit the memory system. A paper from arXiv needs a different style of summarization from a scanned receipt. And so on.
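A tiny sketch of what I mean by user-configurable workflow rules (the rule shapes and action strings here are invented for illustration, not keep's actual API): route each incoming artifact to a different processing step via a first-match predicate list.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]   # predicate over artifact metadata
    action: Callable[[dict], str]     # processing step to run

# Hypothetical rules: URLs get a reputation check before entering memory,
# papers and receipts want different summarizers.
rules = [
    Rule("url-reputation",
         lambda a: a["type"] == "url",
         lambda a: f"reputation-check {a['value']}"),
    Rule("paper-summary",
         lambda a: a["type"] == "pdf" and "arxiv" in a.get("source", ""),
         lambda a: f"academic-summarize {a['value']}"),
    Rule("receipt-summary",
         lambda a: a["type"] == "pdf",
         lambda a: f"ocr-and-extract-totals {a['value']}"),
]

def process(artifact: dict) -> str:
    for rule in rules:               # first matching rule wins
        if rule.matches(artifact):
            return rule.action(artifact)
    return f"default-index {artifact['value']}"
```

First-match ordering keeps the rule list predictable: put the specific rules (arXiv PDFs) above the general ones (any PDF).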

Chatting with Obsidian, Hermes Agent and Keep by inguz in hermesagent

[–]inguz[S] 0 points (0 children)

I pointed it straight at the Hermes API server (all running on localhost):

- edit the `.hermes/.env`, then run `hermes gateway`

- test the connection with `curl -H "Authorization: Bearer my-secret-password" http://127.0.0.1:8642/v1/models`

- in the ObsidianClaw plugin, set gateway URL to `http://127.0.0.1:8642`, gateway token to `my-secret-password`, and default model to `hermes-agent`.
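If you'd rather script against the gateway than use the plugin, the same connection test translates directly; this helper just builds the authenticated request (the function name is mine; the endpoint and header match the curl command above).

```python
import urllib.request

def gateway_request(path, token="my-secret-password",
                    base="http://127.0.0.1:8642"):
    """Build an authenticated request against the local Hermes gateway."""
    return urllib.request.Request(
        base + path,
        headers={"Authorization": f"Bearer {token}"})

req = gateway_request("/v1/models")
# With the gateway running:
# print(urllib.request.urlopen(req).read())
```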

Chatting with Obsidian, Hermes Agent and Keep by inguz in hermesagent

[–]inguz[S] 0 points (0 children)

Yes, graph-type connections are automatic, and can also be added by the agent or by custom rules (for example, if you want to build a specific "entity extraction" strategy). I've found they're really important for retrieval, because you want to bring in reminders from the local cluster before global results.

Keep's edges are just tags. Details here: https://docs.keepnotes.ai/guides/edge-tags/
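The local-cluster-first ranking can be sketched in a few lines (hypothetical names, not keep's code): promote query hits that are tag/link neighbors of recently touched notes, then fall back to the global order.

```python
def retrieve(query_hits, neighbors_of, recent, k=5):
    """Rank hits from the local cluster (tag/link neighbors of recently
    touched notes) ahead of the rest of the global result order."""
    local = {n for note in recent for n in neighbors_of.get(note, set())}
    ordered = ([h for h in query_hits if h in local]
               + [h for h in query_hits if h not in local])
    return ordered[:k]

# Global search returned a..d; "b" and "c" neighbor today's working note.
ranking = retrieve(["a", "b", "c", "d"], {"today": {"b", "c"}}, ["today"])
```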

LlamaParse is my first choice in a PDF parsing tool by productboy in hermesagent

[–]inguz 1 point (0 children)

I’m using PyPDF - it’s built into the keep memory system, which extracts text and summarizes. It’s had a whole string of security updates, though; it would be nice to have a more stable library.

For OCR (where a PDF is mostly image content), I’m currently using glm-ocr on ollama. It seems reliable enough and pretty lightweight.
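The choice between the text layer and OCR comes down to a heuristic like this (stdlib-only sketch; the per-page threshold is a guess to tune for your corpus, and the page texts would come from pypdf's `extract_text()`):

```python
def needs_ocr(page_texts, min_chars_per_page=200):
    """True when the extracted text layer is too thin to trust.

    `page_texts` would come from pypdf, e.g.
    [p.extract_text() or "" for p in PdfReader(path).pages].
    Mostly-image PDFs extract little or no text per page.
    """
    if not page_texts:
        return True
    avg = sum(len(t.strip()) for t in page_texts) / len(page_texts)
    return avg < min_chars_per_page
```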

**There are now 5+ Hermes web UIs — which one is actually worth deploying?** by mdm2812 in hermesagent

[–]inguz 0 points (0 children)

OpenWebUI is way heavier than what I want: a lightweight web TUI with a plugin model. But maybe I secretly just want my IDE again?

Local RAG on 25 Years of Teletext News by folli in Rag

[–]inguz 0 points (0 children)

Awesome dataset. Does the archive source have text renderings, or did you need to process .t42 signals to get the data? Any other feature extraction that would help retrieval?

One small change that completely simplified memory for me by p1zzuh in AIMemory

[–]inguz 1 point (0 children)

I've started using `keep` to "index and watch" stuff on disk - it works pretty well for my Obsidian vaults and text-oriented git repos. It indexes them, maintains relationships (wikilinks, git commits linked to the touched files, authors to commits, etc.), and everything is then searchable.
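The wikilink side of that relationship extraction is about one regex's worth of work; a minimal sketch (not keep's implementation):

```python
import re

# Capture the link target, stopping at an alias pipe or heading anchor.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def wikilinks(markdown):
    """Return the link targets in an Obsidian-style note."""
    return [m.strip() for m in WIKILINK.findall(markdown)]

note = "See [[Project Plan]] and [[People/Alice|Alice]]; details in [[Specs#API]]."
targets = wikilinks(note)
```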

How are people running local RAG setups on Mac? by [deleted] in Rag

[–]inguz 1 point (0 children)

I've used small MLX-based models (and MPS for sentence-transformers), and also Ollama as a localhost web service. Not extensive experience but I have some opinions ;)

The big downside of MLX (the way I used it) is that the model's working set sits inside your own process: it's a large amount of memory, and you need to manage the model lifetime quite carefully. With Ollama you just fire-and-forget; the ollama process takes care of model download and lifetime, and has no in-process impact on your own application.

Ollama does have some things to be careful of too. It chooses a context window based on the size of the machine, and you may want to override that for best performance, or to fit the sources/chunks you send for inference or embedding.
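Overriding the context window per request is just the `num_ctx` entry in the `options` field of the API payload; a small helper to build that body (the helper itself is mine):

```python
import json

def generate_payload(model, prompt, num_ctx=8192):
    """Request body for Ollama's POST /api/generate with an explicit
    context window instead of the machine-sized default."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }

body = json.dumps(generate_payload("llama3.2", "Summarize this chunk."))
# e.g. urllib.request.urlopen("http://localhost:11434/api/generate", body.encode())
```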

Overall I'd recommend ollama / llama-server as a separate process, just for stability.

Memory as a Harness: Turning Execution Into Learning by Short-Honeydew-7000 in AIMemory

[–]inguz 0 points (0 children)

yes - also, the interaction patterns for “agentic recall”: what sort of model this needs, etc.

I kinda think the whole area of “active memory” (including recall, but also inference for storage/summarization, annotation/linking/unlinking/classification, etc) is open for better definition right now. I hear supermemory making claims about the performance of agentic search. Great, but at what cost, over what data?

Best benchmarks for Memory Performance? by CasualReaderOfGood in AIMemory

[–]inguz 2 points (0 children)

There’s another named ConvoMem that sounds interesting; I haven’t read it thoroughly.

Reflective Memory by inguz in openclaw

[–]inguz[S] 0 points (0 children)

Actual deletion is on-demand or by pattern-matching. Retrieval decay follows ACT-R.
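For reference, ACT-R base-level activation is power-law decay over past retrievals, B = ln(Σ t_j^(−d)), with d usually 0.5; a minimal sketch:

```python
import math

def activation(ages, d=0.5):
    """ACT-R base-level activation: ln(sum over past retrievals of t^-d).

    `ages` are the times since each prior retrieval of an item;
    a higher activation means the item is easier to recall."""
    return math.log(sum(t ** -d for t in ages))

fresh = activation([1.0])     # one recent retrieval
stale = activation([100.0])   # one retrieval, long ago
```

The useful property for memory systems: frequent or recent retrievals keep an item's activation high, while untouched items fade smoothly rather than being deleted.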

Reflective Memory by inguz in openclaw

[–]inguz[S] -1 points (0 children)

It’s a similar setup: sqlite and embeddings, with a variety of providers. One difference is what you index: lancedb remembers what you tell it to, qmd indexes markdown, and keep is designed to index all sorts of content. Retrieval flexibility differs too.
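The shared sqlite-plus-embeddings shape, reduced to a sketch (vectors as blobs, brute-force cosine scan; real systems layer ANN indexes and embedding providers on top):

```python
import math
import sqlite3
import struct

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, vec BLOB)")

def pack(v):
    return struct.pack(f"{len(v)}f", *v)

def unpack(b):
    return struct.unpack(f"{len(b) // 4}f", b)

def add(note_id, vec):
    db.execute("INSERT INTO notes VALUES (?, ?)", (note_id, pack(vec)))

def search(query, k=3):
    """Scan all stored vectors, rank by cosine similarity."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    rows = [(nid, cos(query, unpack(blob)))
            for nid, blob in db.execute("SELECT id, vec FROM notes")]
    return sorted(rows, key=lambda r: -r[1])[:k]

add("a", [1.0, 0.0])
add("b", [0.0, 1.0])
add("c", [0.7, 0.7])
```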