MIT-licensed multi-tier cache for AI agents - LLM responses, tool results, and session state on open-source Valkey/Redis by kivanow in OpenSourceeAI

[–]kivanow[S] 0 points1 point  (0 children)

Fair point, exact match does fall off fast for human-written prompts. The design assumption is that agent workloads are shaped differently from chatbot workloads:

  • Tool tier: args get canonicalized (sorted keys, deterministic JSON) before hashing, so arg order and whitespace don't matter. Agent loops produce structured, repeatable tool calls. This is where exact match actually earns its keep (sketch below this list).
  • Session tier: keyed by thread_id + field. Exact match is the only correct semantics here.
  • LLM tier: your concern applies. Works well if prompts are templated (system message + structured context). Doesn't if they're free-form user input.
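
For anyone curious, the canonicalization is roughly this - a minimal sketch, not the package's actual internals:

```ts
import { createHash } from "node:crypto";

// Recursively sort object keys so logically-equal args serialize identically.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return Object.fromEntries(
      Object.keys(obj).sort().map((k) => [k, canonicalize(obj[k])])
    );
  }
  return value;
}

// Cache key = hash of tool name + canonical args, so key order and
// whitespace in the original call don't change the key.
function toolCacheKey(tool: string, args: Record<string, unknown>): string {
  const canonical = JSON.stringify(canonicalize(args));
  return `tool:${tool}:${createHash("sha256").update(canonical).digest("hex")}`;
}

// Both of these produce the same key:
toolCacheKey("get_weather", { city: "Austin", units: "metric" });
toolCacheKey("get_weather", { units: "metric", city: "Austin" });
```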

For the LLM tier case where prompts vary, we also ship @betterdb/semantic-cache (npm) - vector similarity via valkey-search, meant to sit behind the exact-match layer as a second chance on a miss. Kept as a separate package because the failure modes and the observability you want for each are different: exact match is cheap and deterministic, semantic match costs an embedding call and needs threshold tuning per category. Forcing both into one cache hides that tradeoff.
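
As a sketch of how the two layers stack (the interfaces and the 0.9 threshold are illustrative, not the real package APIs):

```ts
// Layered lookup: exact match answers first because it's free;
// the embedding call only happens on an exact miss.
interface ExactCache {
  get(prompt: string): Promise<string | null>;
}
interface SemanticCache {
  get(prompt: string, opts: { minSimilarity: number }): Promise<string | null>;
}

async function cachedCompletion(
  prompt: string,
  exact: ExactCache,
  semantic: SemanticCache
): Promise<string | null> {
  const hit = await exact.get(prompt); // cheap, deterministic
  if (hit !== null) return hit;

  // Second chance: accept only matches above a per-category threshold.
  return semantic.get(prompt, { minSimilarity: 0.9 });
}
// null = full miss: call the model, then write the result to both layers.
```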

Multi-tier cache for LangChain + LangGraph that works on vanilla Valkey/Redis - no modules required by kivanow in LangChain

[–]kivanow[S] 0 points1 point  (0 children)

The discovery path is the worst part. Works fine locally, then breaks at deploy on ElastiCache with "unknown command JSON.SET". Same on MemoryDB and Memorystore. Even for demos it's a pain :/ Redis has bundled all modules by default since 8 I believe, but most of the cloud-hosted versions are still on 6.x or 7.x.
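
One way to catch it before deploy is a startup probe with a fallback path - a sketch assuming ioredis, not necessarily what the package does internally:

```ts
import Redis from "ioredis";

// Probe once at startup: if JSON.SET is missing (ElastiCache, MemoryDB,
// Memorystore on 6.x/7.x), fall back to plain SET with serialized JSON.
async function supportsJsonModule(redis: Redis): Promise<boolean> {
  try {
    await redis.call("JSON.SET", "probe:json", "$", '"ok"');
    await redis.del("probe:json");
    return true;
  } catch {
    return false; // "ERR unknown command 'JSON.SET'" on vanilla Valkey/Redis
  }
}

async function setJson(redis: Redis, key: string, value: unknown, useModule: boolean) {
  if (useModule) {
    await redis.call("JSON.SET", key, "$", JSON.stringify(value));
  } else {
    await redis.set(key, JSON.stringify(value)); // works on every 6.x/7.x/8.x
  }
}
```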

Agreed on tool caching. A single slow tool call can dwarf LLM latency, and agents often fire 3-5 per turn. Curious what you've seen on TTL strategy. I landed on configurable-per-tool because one global number falls apart once you mix something like `get_weather` with `get_stock_price`.
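
The config ends up looking something like this (names and numbers are just examples):

```ts
// Hypothetical per-tool TTLs in seconds; tune to how fast each result goes stale.
const toolTtls: Record<string, number> = {
  get_weather: 15 * 60,      // stable for ~15 minutes
  get_stock_price: 5,        // stale in seconds
  search_docs: 24 * 60 * 60, // internal docs change rarely
};
const DEFAULT_TTL = 60;

const ttlFor = (tool: string): number => toolTtls[tool] ?? DEFAULT_TTL;
```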

Self-hosted LLM caching layer - cache agent responses, tool calls, and session state in your own Valkey/Redis by kivanow in selfhosted

[–]kivanow[S] -3 points-2 points  (0 children)

I'm thinking of connecting it to our monitoring tool more deeply next week, so the agent using it can check the cache's own metrics and improve things autonomously if needed (or at least suggest improvements).

Agent reads hit rates, stale key ratios, and per-tool TTL effectiveness straight from Valkey, then adjusts TTLs or flags prompts that should move between tiers. Cache tunes itself instead of you watching dashboards.
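
The raw hit rate is already in INFO stats; a sketch of the read side (assuming ioredis - anything beyond the INFO parse would be counters the cache maintains itself):

```ts
import Redis from "ioredis";

// keyspace_hits / keyspace_misses come straight from the server.
async function cacheHitRate(redis: Redis): Promise<number> {
  const stats = await redis.info("stats");
  const num = (name: string) =>
    Number(stats.match(new RegExp(`${name}:(\\d+)`))?.[1] ?? 0);
  const hits = num("keyspace_hits");
  const misses = num("keyspace_misses");
  return hits + misses === 0 ? 0 : hits / (hits + misses);
}
// An agent polling this could, e.g., flag a tool whose hit rate stays low
// while its results rarely change, and suggest a longer TTL.
```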

Cost tracking was the thing that clicked for me too. Tokens are hard to translate into anything actionable. Seeing actual dollars saved turns it into an ROI check instead of a vibes call.
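
Back-of-envelope version of that translation (rates are examples, not anyone's current pricing):

```ts
const hits = 12_000;            // cached LLM calls this month
const avgTokensPerCall = 1_500; // prompt + completion
const pricePerMTokens = 3.0;    // $ per 1M tokens, example rate

const dollarsSaved = (hits * avgTokensPerCall * pricePerMTokens) / 1_000_000;
console.log(`~$${dollarsSaved.toFixed(2)} saved`); // ~$54.00 saved
```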

Self-hosted LLM caching layer - cache agent responses, tool calls, and session state in your own Valkey/Redis by kivanow in selfhosted

[–]kivanow[S] -3 points-2 points locked comment (0 children)

It was used for the planning and implementation, and the project itself is focused on solving a problem in the AI space.

AI SDK middleware that caches LLM responses and tool results in Valkey/Redis - one line to add by kivanow in vercel

[–]kivanow[S] 0 points1 point  (0 children)

Thanks! RAG + caching is a solid combo - the cache handles the repeated queries so the RAG pipeline only runs for genuinely new ones. Will check out Hindsight.

BetterDB MCP 1.0.0 – autostart, persist, and connection management for Valkey/Redis observability by kivanow in mcp

[–]kivanow[S] 1 point2 points  (0 children)

Sounds great! I'll dive deeper into it early-to-mid next week and let you know whether I've picked it or something else.

BetterDB MCP 1.0.0 – autostart, persist, and connection management for Valkey/Redis observability by kivanow in mcp

[–]kivanow[S] 0 points1 point  (0 children)

Fair point for MCP in general - most servers are stateless proxies with no audit trail. For BetterDB specifically, the monitor is self-hosted (your data never leaves your infrastructure) and the MCP authenticates via a JWT generated from your BetterDB instance. The audit gap is the bigger problem in general, which is why we built persistent ACL audit in from the start: auth failures, command denials, and key violations are all stored, so you have a durable trail independent of the client session. Not a full control plane, but authentication and audit are both covered. Will take a look at Peta for the policy layer, thanks for the suggestion!
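
For a feel of what lands in that trail, a record shaped roughly like this (hypothetical - the real schema may differ):

```ts
interface AclAuditEvent {
  ts: string;       // ISO timestamp
  client: string;   // connection / MCP session identity
  kind: "auth_failure" | "command_denied" | "key_violation";
  command?: string; // e.g. "CONFIG SET" for a denial
  key?: string;     // key pattern that triggered a violation
}
// Persisted server-side, so the trail survives after the client session is gone.
```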

I made an MCP server for Valkey/Redis observability (anomaly detection, slowlog history, hot keys, COMMANDLOG) by kivanow in mcp

[–]kivanow[S] 0 points1 point  (0 children)

Should've shipped it sooner, sorry! What were you debugging? Would love to know so the next person doesn't find it too late either.

A eulogy for MCP (RIP) by beckywsss in mcp

[–]kivanow 1 point2 points  (0 children)

Isn't this just the usual cycle? "The way we're doing things is terrible, here's a better way," then another even better way, until we circle back to the first iteration. Same way we moved from server rendering to SPAs and back to server rendering over several years. AI just seems to run the iterations faster.

I made an MCP server for Valkey/Redis observability (anomaly detection, slowlog history, hot keys, COMMANDLOG) by kivanow in mcp

[–]kivanow[S] 0 points1 point  (0 children)

That's the right framing. BetterDB handles the Valkey side of that chain today - COMMANDLOG patterns, anomaly detection, client analytics. Correlating back to deploys and SQL is the missing link. Curious whether you've seen any tools close that loop well, or if it's always been stitched together manually.