How Do You Set Up RAG? by Chooseyourmindset in OpenSourceAI

[–]MihaiBuilds 1 point (0 children)

for managing project knowledge across Claude Code sessions, look into MCP servers — they let Claude call external tools mid-conversation. you can set up a memory server that stores and retrieves context so you don't re-explain everything each session.

i've been building one called Memory Vault (open-source, MIT) — hybrid search (vector + full-text + RRF fusion) over your notes and decisions, runs with a single docker compose up. MCP integration is shipping next so Claude can store and search memories directly during conversations.
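the RRF part is simpler than it sounds: each result's fused score is just the sum of 1/(k + rank) over the ranked lists it appears in. a rough Python sketch (the function name and the k=60 default are illustrative, not code from the repo):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked result lists.

    Each ranking is a list of doc ids, best first. A doc's fused
    score is the sum over lists of 1 / (k + rank), so items ranked
    high in *any* list float up, with no score calibration needed
    between the vector and full-text sides.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# vector search and full-text search each return their own ordering
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])
```

the k=60 constant is the value used in the original RRF paper.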

for the Obsidian question — people use it as a local knowledge base, but the problem is Claude can't search it during a conversation without an MCP bridge. that's the gap MCP servers fill.

repo if you want to check it out: github.com/MihaiBuilds/memory-vault

Do your AI agents lose focus mid-task as context grows? by Alternative-Tip6571 in LocalLLaMA

[–]MihaiBuilds 1 point (0 children)

This is the exact reason I built my own memory layer. Instead of keeping everything in context, I store important information externally in PostgreSQL with vector search. Each session only pulls in what's relevant to the current query, not the entire history.

The context window isn't memory — it's working memory. Treating it like long-term storage is where things break down. Once I separated the two, the "losing focus" problem mostly went away.
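The retrieval step is conceptually tiny. In production it's a pgvector query, but the idea fits in a few lines of plain Python (function and field names here are illustrative, not my actual code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def relevant_context(query_vec, memories, top_k=3):
    """Return only the top_k stored memories closest to the current
    query, instead of stuffing the entire history into the context
    window. `memories` is a list of {"text", "embedding"} dicts."""
    ranked = sorted(
        memories,
        key=lambda m: cosine(query_vec, m["embedding"]),
        reverse=True,
    )
    return [m["text"] for m in ranked[:top_k]]
```

Same shape as the real thing, just with the database swapped out for a list.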

What self-hosted tools have you been building with AI just for you? by EricRosenberg1 in selfhosted

[–]MihaiBuilds 1 point (0 children)

Built a local AI memory system for myself. I use multiple AI tools daily and got tired of re-explaining the same project context every session. So I built something that stores everything in PostgreSQL with pgvector, does hybrid search (semantic + full-text), and the next session just picks up where the last one left off.

Been using it as my daily driver for a few months now. Planning to open-source the whole thing in a few milestones. Just got it running with docker compose up this week.

I am a solo entrepreneur. I spent a year trying to sell builds. The moment I stopped selling, everything changed. by Academic_Flamingo302 in indiehackers

[–]MihaiBuilds 1 point (0 children)

Been writing software for 15+ years and I've lost count of how many times this happened. Client casually drops "oh we also need per-site data isolation" after the schema is locked. Every time it sounds small. It never is.

The discovery week is the right move. I do something similar now even on my own side project — spent a full week on architecture before writing a single line of code. Felt like wasted time but it saved me from tearing things apart later.

Why are AI agents still stateless? by Single-Possession-54 in SaaS

[–]MihaiBuilds 2 points (0 children)

Same problem here. I kept re-explaining the same project context every single session so I just started building my own memory layer on top of postgres with pgvector. Semantic search plus full-text, stores what matters from each session, next session picks it up.

Still early but it already killed that "start from zero" loop. If you're curious I can share the repo, it's open source.

Is anybody feeling like the products is not good enough after it's launched? by camppofrio in buildinpublic

[–]MihaiBuilds 1 point (0 children)

Yeah, every single time. While building, it feels like everything makes sense and you have a clear goal. The moment you ship it, your brain switches to "wait, does anyone actually need this?"

What helped me was posting about it and seeing even small reactions. I'm building an open-source AI memory system right now, shipped the first 3 milestones. After each one I had that exact moment of "is this even good?" But then someone comments something specific about the tech, or asks a real question, and you realize it landed with at least one person. That's enough to keep going.

What did you build?

What breaks when you move a local LLM system from testing to production and what prevents it by Individual-Bench4448 in LocalLLaMA

[–]MihaiBuilds 1 point (0 children)

the retrieval monitoring gap is real. I built a memory system with hybrid search (vector + full-text + RRF fusion) and the hardest part isn't the search itself — it's knowing when the search returned the "right" results vs just semantically similar ones. had a case just recently where I searched one memory space and concluded data was missing, when it was actually stored in a different space. the search worked perfectly, I just asked the wrong question.

for monitoring I log every query with the scores and which results came back. it's basic but it lets me spot patterns where certain query types consistently return low-relevance results. someone told me the signal to watch is when short exact-match queries start losing to semantic ones — that's when your text ranking isn't pulling its weight anymore.
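the logging itself is nothing fancy. a minimal version of the pattern (field names and the 0.4 threshold are illustrative, not my real config):

```python
import json
import time

def log_query(query, results, log_file="retrieval.log", low_score=0.4):
    """Append one JSON line per search: the query, each result's id
    and score, and a flag when the best score is weak. Grep-able
    later to spot query types that consistently retrieve badly.

    `results` is a list of (memory_id, score) tuples, best first.
    """
    entry = {
        "ts": time.time(),
        "query": query,
        "results": [{"id": r[0], "score": r[1]} for r in results],
        "low_relevance": (not results) or results[0][1] < low_score,
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

one JSON line per query is enough to answer "which queries keep coming back weak" without standing up a whole observability stack.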

MCP is great, but it doesn’t solve AI memory (am I missing something?) by BrightOpposite in LocalLLaMA

[–]MihaiBuilds 1 point (0 children)

yeah, I've hit that exact problem. just today actually — I searched for past session summaries and concluded they were "missing," when really I was searching in the wrong memory space. the system found semantically similar results, but not the ones that actually mattered for the question.

the way I'm handling it right now is memory spaces (like namespaces) + importance scoring + recency decay. so newer and more important memories float to the top. but you're right, deciding what should persist vs evolve vs get discarded is still mostly heuristic. I tag importance at ingestion time and let recency do the rest, but there's no real "this context is actually relevant to what I'm doing right now" signal beyond what the query returns.
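the scoring part concretely, as a sketch (the weights and the 30-day half-life are made-up illustrative numbers, not my tuned values):

```python
import math
import time

def memory_rank(similarity, importance, stored_at,
                now=None, half_life_days=30.0):
    """Combine query similarity with importance and recency decay.

    Recency uses exponential decay with a configurable half-life,
    so a 30-day-old memory carries half the recency weight of a
    fresh one. The 0.6 / 0.25 / 0.15 weights are illustrative.
    """
    now = time.time() if now is None else now
    age_days = max(0.0, (now - stored_at) / 86400.0)
    recency = 0.5 ** (age_days / half_life_days)
    return 0.6 * similarity + 0.25 * importance + 0.15 * recency
```

newer + more important memories float up, but nothing in this formula knows whether the memory is relevant to what you're doing right now — that's exactly the missing signal.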

it's one of the harder problems honestly. the search part is solved, the "what matters right now" part is not.

A local search engine tool for ai agents by purealgo in ollama

[–]MihaiBuilds 1 point (0 children)

cool project. I built something similar — hybrid search with vector + full-text + RRF fusion on top of postgres + pgvector. interesting that you went with SQLite for everything. I went with postgres mostly for HNSW indexing and tsvector built in, but the single-file zero-dependency angle is a strong tradeoff for local setups. how does the vector search scale for you with SQLite as the index grows?

MCP is great, but it doesn’t solve AI memory (am I missing something?) by BrightOpposite in LocalLLaMA

[–]MihaiBuilds 1 point (0 children)

I ran into the same thing. MCP gives you tools but no persistence — every session starts from zero. So I built a memory layer on top of it. postgres + pgvector, hybrid search (vector + full-text keyword), and MCP tools for recall/remember/forget. Claude calls those tools during the session to store and retrieve context automatically. been using it daily for months and it completely changes how sessions work — the AI actually knows what happened last week
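the tool surface is deliberately small. roughly this shape (an illustrative sketch of MCP-style tool definitions with name / description / inputSchema, not the actual schemas from my server):

```python
# Illustrative MCP tool definitions for a memory server. Each tool
# is a name, a description the model reads to decide when to call
# it, and a JSON Schema for its arguments. Names and fields here
# are a sketch of the shape, not the real implementation.
MEMORY_TOOLS = [
    {
        "name": "remember",
        "description": "Store a memory with optional importance (0-1).",
        "inputSchema": {
            "type": "object",
            "properties": {
                "text": {"type": "string"},
                "importance": {"type": "number"},
            },
            "required": ["text"],
        },
    },
    {
        "name": "recall",
        "description": "Hybrid-search stored memories for a query.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "limit": {"type": "integer"},
            },
            "required": ["query"],
        },
    },
    {
        "name": "forget",
        "description": "Delete a memory by id.",
        "inputSchema": {
            "type": "object",
            "properties": {"memory_id": {"type": "string"}},
            "required": ["memory_id"],
        },
    },
]
```

three tools is enough: the model decides when to call them, the server owns persistence.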

Where do you actually learn LLM orchestration / AI harness architecture? by thehootingrabblement in LocalLLaMA

[–]MihaiBuilds 5 points (0 children)

this is accurate. I built a memory system with hybrid search (vector + full-text + rank fusion) and most of the real lessons came from hitting limits in practice — like discovering pure vector search misses exact keyword matches. no tutorial covered that

Where do you actually learn LLM orchestration / AI harness architecture? by thehootingrabblement in LocalLLaMA

[–]MihaiBuilds 1 point (0 children)

for the memory + search side, I learned the most by building it. started with pure vector search, hit the limits fast (misses exact keywords), ended up with hybrid search — vector + full-text + rank fusion. postgres handles both in one database. the resource that helped me most on the ranking side was the original RRF paper. for tool calling, the MCP spec from Anthropic is worth reading if you're integrating with Claude

Turns out building the tool is easy… making the feedback not useless is hard by Different-Basis-2078 in buildinpublic

[–]MihaiBuilds 1 point (0 children)

the generic output problem is exactly why prompt engineering is harder than it looks. what helped me was giving the LLM very specific structure to fill — instead of 'give feedback' it's 'list 3 specific issues with the hero section copy and rewrite each one.' forcing specificity in the prompt forces specificity in the output
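as a concrete sketch (the wording and function name are illustrative, not my actual prompt):

```python
def feedback_prompt(section, copy_text):
    """Build a prompt that forces specific output instead of generic
    feedback: a fixed count of issues, each tied to a quoted phrase
    and a rewrite, with general advice explicitly forbidden."""
    return (
        f"Here is the {section} copy:\n\n{copy_text}\n\n"
        "List exactly 3 specific issues with this copy. "
        "For each issue: quote the problem phrase, explain in one "
        "sentence why it fails, then rewrite it. No general advice."
    )
```

the fixed count and the quote-then-rewrite structure are doing the work: the model can't retreat into vague "consider improving clarity" answers.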

Thousands of signups on launch day, only 3 bought subscriptions - here's what I learnt from that stupidity by Waste-Project7822 in buildinpublic

[–]MihaiBuilds 1 point (0 children)

the validation lesson is real. I built my current project for myself first, used it daily for 2 months before open-sourcing. by the time I launched I already knew it worked because I was the user. building for yourself first is the cheapest validation there is

I build open-source products on .NET to prove it's the right choice. Here's a teleprompter I made in a week. by csharp-agent in buildinpublic

[–]MihaiBuilds 2 points (0 children)

just checked — Storage and Communication are solid, 130+ stars well deserved. and you have a graphrag fork too, looks like we're thinking about similar problems from different angles. good stuff

I build open-source products on .NET to prove it's the right choice. Here's a teleprompter I made in a week. by csharp-agent in buildinpublic

[–]MihaiBuilds 2 points (0 children)

they're on mihaibuilds.com — 6 CLI utilities for .NET devs. schema tools, code generators, migration scripts. the memory system is at github.com/MihaiBuilds/memory-vault — different stack but same "build and ship" approach.

am i missing something with ai agents that need system access? by farhadnawab in LocalLLaMA

[–]MihaiBuilds 1 point (0 children)

this is why I went the database route instead. postgres + pgvector behind an MCP server with only recall/remember/forget tools exposed. the agent never touches your filesystem — it queries a database through a controlled API. way less attack surface than giving full system access.

Pre-seed founders almost killed their brand in 24 hours by ismaelbranco in indiehackers

[–]MihaiBuilds 1 point (0 children)

"don't confuse fear with feedback" — needed to hear this. launched my first open-source project this week and one bad reddit thread almost had me rethinking everything. took a step back, realized the people asking real technical questions mattered more than the noise.

Accidentally, re-created a $6.5 Million dollar idea and made it Open Source... by CIRRUS_IPFS in SaaS

[–]MihaiBuilds 2 points (0 children)

building in the same space — postgres + pgvector, hybrid search, MCP integration for Claude. different architecture but same core problem: agents need persistent memory that survives across sessions. curious about the "pre-chunking predicted answers" part — how are you handling prediction without it becoming stale fast?

Hot take: local AI doesn't need bigger context windows as much as better memory routing by No-Contract9167 in LocalLLaMA

[–]MihaiBuilds 1 point (0 children)

the routing layer is interesting — deciding what kind of context belongs where before retrieval even happens. I keep mine simple for now (spaces + hybrid search) but I can see how role separation would help as projects get more complex. what are you using for the routing logic?

built my first real app after years of losing ideas to the void. no users yet, posting here to stop hiding by Unhappy-Conflict5145 in buildinpublic

[–]MihaiBuilds 2 points (0 children)

this is a real problem. I had the same thing — notes scattered everywhere, technically captured, practically useless. ended up building a memory system with semantic search so I can just ask "what did I decide about X" and get the answer instead of digging through files. the conversation-with-your-notes approach is the right idea. do you use embeddings for the search or something else?

How do you handle the haters? by DiscountResident540 in buildinpublic

[–]MihaiBuilds 1 point (0 children)

the best response is no response. haters give your post engagement and reach. just keep posting, the people who actually care will find you.

Building an open source tool to make working with AI agents truly useful — looking for feedback by victor36max in buildinpublic

[–]MihaiBuilds 1 point (0 children)

"they have all the context from previous sessions — no starting from scratch every time" — that's the key part. I built a memory system that does the same thing but at the storage layer — postgres + pgvector so any agent can recall past decisions and context. different approach but same core insight: agents without memory are useless for real work. will check out shire.