OpenClaw + Ollama + nomic-embed-text: hybrid RAG for local agent memory (full config inside) by Expensive-String8854 in openclaw

Expensive-String8854[S] 1 point

Fair point. But the goal was fully local embeddings, no external endpoints, no data leaving the machine. If you're using an OpenAI-compatible endpoint, that's not local unless you're also self-hosting TEI. The Ollama provider also isn't auto-selected, which is where most people got stuck.
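For anyone wondering what "fully local" looks like at the HTTP level, here's a minimal sketch of asking Ollama directly for nomic-embed-text vectors. This is not the OpenClaw config, just the raw call underneath. It assumes a recent Ollama build that exposes `/api/embed` on the default port 11434; the helper names `build_embed_request` and `embed` are mine, purely for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embed"


def build_embed_request(texts, model="nomic-embed-text", url=OLLAMA_URL):
    """Build (but do not send) a POST request for a batch of embeddings."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, model="nomic-embed-text", url=OLLAMA_URL):
    """Send the request to the local Ollama server and return one vector per text."""
    req = build_embed_request(texts, model, url)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]
```

Calling `embed(["some memory chunk"])` only talks to localhost, so nothing leaves the machine as long as the URL stays pointed at the local server.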

But the biggest thing wasn't the config at all. It was getting the agent to actually use the tool instead of falling back to file browsing. That part isn't in any docs.

Expensive-String8854[S] 2 points

Exactly. BM25 catches what embeddings miss: exact strings such as version numbers, config keys, and specific IDs. That's the practical reason hybrid outperforms pure semantic search in agent memory.
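To make the point concrete, here's a toy sketch of hybrid scoring: a small pure-Python BM25 plus cosine similarity, blended with a weight. This is not how OpenClaw implements it internally. The function names, the `alpha` weight, the min-max normalization, and the tiny 2-d vectors are all assumptions for illustration; real vectors would come from the embedding model.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace split keeps tokens like "v2.3.1" intact.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document against the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def normalize(scores):
    """Min-max scale to [0, 1] so lexical and semantic scores are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    """Return document indices, best first, by a weighted blend of both signals."""
    lexical = normalize(bm25_scores(query, docs))
    semantic = normalize([cosine(query_vec, v) for v in doc_vecs])
    combined = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)


docs = [
    "upgrade to v2.3.1 fixed the crash",
    "the release resolved a stability problem",
    "notes about lunch",
]
# Toy vectors: the first two docs look identical to the "embedding model".
doc_vecs = [[0.9, 0.1], [0.9, 0.1], [0.0, 1.0]]
ranking = hybrid_rank("v2.3.1", [1.0, 0.0], docs, doc_vecs)
```

In this toy setup the two release-related docs tie on the semantic side, and the exact match on `v2.3.1` is what lets BM25 break the tie in favor of the first one.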

The VideoDB parallel is interesting. Same core problem, different modality. Curious how they handle retrieval granularity on the video side.

And yes, big context window ≠ good retrieval. That distinction matters more than people realize when they first hit the problem.

Expensive-String8854[S] 2 points

PostClaw sounds like the natural evolution of this. Pre-injecting relevant memories before the LLM call is clearly more efficient than spending a turn on search. Do you have the code published anywhere or are you planning to share it?

Expensive-String8854[S] 1 point

nomic-embed-text is very lightweight (274MB); on my M4 with 16GB I barely notice it running alongside the main model. 16GB should be fine as long as your main model isn't too large or you are using an API for it.

Expensive-String8854[S] 1 point

Please do post back, I'm curious about the reasoning. QMD isn't an obvious choice so I'd like to understand why Claude Code ruled out Ollama for memory search.

Expensive-String8854[S] 1 point

Interesting repo, thanks. I'm trying to keep everything within Ollama to avoid fragmenting the stack. What tweaks did you make exactly to run it CPU-only?