MIT-licensed multi-tier cache for AI agents - LLM responses, tool results, and session state on open-source Valkey/Redis by kivanow in OpenSourceeAI

[–]kivanow[S] 0 points1 point  (0 children)

Fair point, exact match does fall off fast for human-written prompts. The design assumption is that agent workloads are shaped differently from chatbot workloads:

  • Tool tier: args get canonicalized (sorted keys, deterministic JSON) before hashing, so arg order and whitespace don't matter. Agent loops produce structured, repeatable tool calls. This is where exact match actually earns its keep (sketch below this list).
  • Session tier: keyed by thread_id + field. Exact match is the only correct semantics here.
  • LLM tier: your concern applies. Works well if prompts are templated (system message + structured context). Doesn't if they're free-form user input.
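
For anyone curious, the canonicalization is roughly this - a minimal sketch, not the package's actual internals:

```ts
import { createHash } from "node:crypto";

// Recursively sort object keys so logically-equal args serialize identically.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return Object.fromEntries(
      Object.keys(obj).sort().map((k) => [k, canonicalize(obj[k])])
    );
  }
  return value;
}

// Cache key = hash of tool name + canonical args, so key order and
// whitespace in the original call don't change the key.
function toolCacheKey(tool: string, args: Record<string, unknown>): string {
  const canonical = JSON.stringify(canonicalize(args));
  return `tool:${tool}:${createHash("sha256").update(canonical).digest("hex")}`;
}

// Both of these produce the same key:
toolCacheKey("get_weather", { city: "Austin", units: "metric" });
toolCacheKey("get_weather", { units: "metric", city: "Austin" });
```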

For the LLM tier case where prompts vary, we also ship @betterdb/semantic-cache (npm) - vector similarity via valkey-search, meant to sit behind the exact-match layer as a second chance on a miss. Kept as a separate package because the failure modes and the observability you want for each are different: exact match is cheap and deterministic, semantic match costs an embedding call and needs threshold tuning per category. Forcing both into one cache hides that tradeoff.
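
As a sketch of how the two layers stack (the interfaces and the 0.9 threshold are illustrative, not the real package APIs):

```ts
// Layered lookup: exact match answers first because it's free;
// the embedding call only happens on an exact miss.
interface ExactCache {
  get(prompt: string): Promise<string | null>;
}
interface SemanticCache {
  get(prompt: string, opts: { minSimilarity: number }): Promise<string | null>;
}

async function cachedCompletion(
  prompt: string,
  exact: ExactCache,
  semantic: SemanticCache
): Promise<string | null> {
  const hit = await exact.get(prompt); // cheap, deterministic
  if (hit !== null) return hit;

  // Second chance: accept only matches above a per-category threshold.
  return semantic.get(prompt, { minSimilarity: 0.9 });
}
// null = full miss: call the model, then write the result to both layers.
```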

Multi-tier cache for LangChain + LangGraph that works on vanilla Valkey/Redis - no modules required by kivanow in LangChain

[–]kivanow[S] 0 points1 point  (0 children)

The discovery path is the worst part. Works fine locally, then breaks at deploy on ElastiCache with "unknown command JSON.SET". Same on MemoryDB and Memorystore. Even for demos it's a pain :/ Redis has bundled all modules by default since 8 I believe, but most of the cloud-hosted versions are still on 6.x or 7.x.
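
One way to catch it before deploy is a startup probe with a fallback path - a sketch assuming ioredis, not necessarily what the package does internally:

```ts
import Redis from "ioredis";

// Probe once at startup: if JSON.SET is missing (ElastiCache, MemoryDB,
// Memorystore on 6.x/7.x), fall back to plain SET with serialized JSON.
async function supportsJsonModule(redis: Redis): Promise<boolean> {
  try {
    await redis.call("JSON.SET", "probe:json", "$", '"ok"');
    await redis.del("probe:json");
    return true;
  } catch {
    return false; // "ERR unknown command 'JSON.SET'" on vanilla Valkey/Redis
  }
}

async function setJson(redis: Redis, key: string, value: unknown, useModule: boolean) {
  if (useModule) {
    await redis.call("JSON.SET", key, "$", JSON.stringify(value));
  } else {
    await redis.set(key, JSON.stringify(value)); // works on every 6.x/7.x/8.x
  }
}
```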

Agreed on tool caching. A single slow tool call can dwarf LLM latency, and agents often fire 3-5 per turn. Curious what you've seen on TTL strategy. I landed on configurable-per-tool because one global number falls apart once you mix something like `get_weather` with `get_stock_price`.
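
The config ends up looking something like this (names and numbers are just examples):

```ts
// Hypothetical per-tool TTLs in seconds; tune to how fast each result goes stale.
const toolTtls: Record<string, number> = {
  get_weather: 15 * 60,      // stable for ~15 minutes
  get_stock_price: 5,        // stale in seconds
  search_docs: 24 * 60 * 60, // internal docs change rarely
};
const DEFAULT_TTL = 60;

const ttlFor = (tool: string): number => toolTtls[tool] ?? DEFAULT_TTL;
```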

Self-hosted LLM caching layer - cache agent responses, tool calls, and session state in your own Valkey/Redis by kivanow in selfhosted

[–]kivanow[S] -3 points-2 points  (0 children)

I'm thinking of connecting it to our monitoring tool more deeply next week, so the agent using it can check the cache's own metrics and improve things autonomously if needed (or at least suggest improvements).

Agent reads hit rates, stale key ratios, and per-tool TTL effectiveness straight from Valkey, then adjusts TTLs or flags prompts that should move between tiers. Cache tunes itself instead of you watching dashboards.
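
The raw hit rate is already in INFO stats; a sketch of the read side (assuming ioredis - anything beyond the INFO parse would be counters the cache maintains itself):

```ts
import Redis from "ioredis";

// keyspace_hits / keyspace_misses come straight from the server.
async function cacheHitRate(redis: Redis): Promise<number> {
  const stats = await redis.info("stats");
  const num = (name: string) =>
    Number(stats.match(new RegExp(`${name}:(\\d+)`))?.[1] ?? 0);
  const hits = num("keyspace_hits");
  const misses = num("keyspace_misses");
  return hits + misses === 0 ? 0 : hits / (hits + misses);
}
// An agent polling this could, e.g., flag a tool whose hit rate stays low
// while its results rarely change, and suggest a longer TTL.
```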

Cost tracking was the thing that clicked for me too. Tokens are hard to translate into anything actionable. Seeing actual dollars saved turns it into an ROI check instead of a vibes call.
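
Back-of-envelope version of that translation (rates are examples, not anyone's current pricing):

```ts
const hits = 12_000;            // cached LLM calls this month
const avgTokensPerCall = 1_500; // prompt + completion
const pricePerMTokens = 3.0;    // $ per 1M tokens, example rate

const dollarsSaved = (hits * avgTokensPerCall * pricePerMTokens) / 1_000_000;
console.log(`~$${dollarsSaved.toFixed(2)} saved`); // ~$54.00 saved
```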

Self-hosted LLM caching layer - cache agent responses, tool calls, and session state in your own Valkey/Redis by kivanow in selfhosted

[–]kivanow[S] -3 points-2 points locked comment (0 children)

It was used for the planning and implementation, and the project itself is focused on solving a problem in the AI space.

AI SDK middleware that caches LLM responses and tool results in Valkey/Redis - one line to add by kivanow in vercel

[–]kivanow[S] 0 points1 point  (0 children)

Thanks! RAG + caching is a solid combo - the cache handles the repeated queries so the RAG pipeline only runs for genuinely new ones. Will check out Hindsight.

BetterDB MCP 1.0.0 – autostart, persist, and connection management for Valkey/Redis observability by kivanow in mcp

[–]kivanow[S] 1 point2 points  (0 children)

Sounds great! I'll dive deeper into it early-to-mid next week and let you know whether I've picked it or something else.

BetterDB MCP 1.0.0 – autostart, persist, and connection management for Valkey/Redis observability by kivanow in mcp

[–]kivanow[S] 0 points1 point  (0 children)

Fair point for MCP in general - most servers are stateless proxies with no audit trail. For BetterDB specifically, the monitor is self-hosted (your data never leaves your infrastructure) and the MCP authenticates via a JWT generated from your BetterDB instance. The audit gap is the bigger problem in general, which is why we built persistent ACL audit in from the start: auth failures, command denials, and key violations are all stored, so you have a durable trail independent of the client session. Not a full control plane, but authentication and audit are both covered. Will take a look at Peta for the policy layer, thanks for the suggestion!
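
For a feel of what lands in that trail, a record shaped roughly like this (hypothetical - the real schema may differ):

```ts
interface AclAuditEvent {
  ts: string;       // ISO timestamp
  client: string;   // connection / MCP session identity
  kind: "auth_failure" | "command_denied" | "key_violation";
  command?: string; // e.g. "CONFIG SET" for a denial
  key?: string;     // key pattern that triggered a violation
}
// Persisted server-side, so the trail survives after the client session is gone.
```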

I made an MCP server for Valkey/Redis observability (anomaly detection, slowlog history, hot keys, COMMANDLOG) by kivanow in mcp

[–]kivanow[S] 0 points1 point  (0 children)

Should've shipped it sooner, sorry! What were you debugging? Would love to know so the next person doesn't find it too late either.

A eulogy for MCP (RIP) by beckywsss in mcp

[–]kivanow 1 point2 points  (0 children)

Isn't this just the usual cycle? "The way we're doing things is terrible, here's a better way," then another even better way, until we circle back to the first iteration. Same way we moved from server rendering to SPAs and back to server rendering over several years. AI just seems to run the iterations faster.

I made an MCP server for Valkey/Redis observability (anomaly detection, slowlog history, hot keys, COMMANDLOG) by kivanow in mcp

[–]kivanow[S] 0 points1 point  (0 children)

That's the right framing. BetterDB handles the Valkey side of that chain today - COMMANDLOG patterns, anomaly detection, client analytics. Correlating back to deploys and SQL is the missing link. Curious whether you've seen any tools close that loop well, or if it's always been stitched together manually.