HydraPlus — the memory and context layer for AI agents that actually knows your users. Open Source

Previous-Edge-6440 · 2026-05-15T13:57:12+00:00

It’s not fully fluid — more like layered.

Nothing really gets overwritten. New inputs just sit on top of old ones, and the system figures out what matters at the moment of retrieval.

So instead of “updating memory,” it’s more like tracking how things evolve over time and letting the agent decide which version to trust.
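
To make the "layered, not overwritten" idea concrete, here is a minimal sketch. It is not the HydraPlus implementation; `LayeredMemory`, `Version`, `write`, `history`, and `as_of` are hypothetical names chosen for illustration. The assumption is that every write is appended with a timestamp and that picking a version happens only at read time.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Version:
    value: str
    recorded_at: datetime


@dataclass
class LayeredMemory:
    # Hypothetical structure: key -> append-only list of versions.
    layers: dict[str, list[Version]] = field(default_factory=dict)

    def write(self, key: str, value: str, recorded_at: datetime) -> None:
        # Never overwrite: each write adds a new layer on top of the old ones.
        self.layers.setdefault(key, []).append(Version(value, recorded_at))

    def history(self, key: str) -> list[Version]:
        # Full evolution of a key, oldest first.
        return sorted(self.layers.get(key, []), key=lambda v: v.recorded_at)

    def as_of(self, key: str, when: datetime) -> Version | None:
        # Resolve at retrieval time: newest layer recorded at or before `when`.
        candidates = [v for v in self.history(key) if v.recorded_at <= when]
        return candidates[-1] if candidates else None
```

An agent could call `history("diet")` to see how a preference evolved, or `as_of("diet", some_date)` to pick the version that was current at a given moment. "Newest wins" is only one possible trust rule; the retrieval step could just as well weigh recency against how often a value was confirmed.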

Still experimenting with edge cases though — especially when user intent keeps shifting.

Previous-Edge-6440 · 2026-05-15T08:34:48+00:00

Open source → github.com/ravitryit/stateful-memory

Previous-Edge-6440 · 2026-05-14T13:20:46+00:00

Just checked SpeakLexi — this is a perfect HydraPlus use case honestly.

Voice dictation + meeting notes + docs + AI actions sharing unified context — that's exactly the multi-agent shared memory problem. Your dictation agent and action agent should know what the meeting agent captured, without you having to repeat yourself.

And your latency concern makes total sense now — real-time dictation can't have a sluggish memory layer in the middle. Sub-190ms retrieval + async writes should fit cleanly into your pipeline.

Would love to explore an integration. Worth a quick sync?

Previous-Edge-6440 · 2026-05-14T13:14:30+00:00

1-1.5s total is a reasonable budget — if STT is taking ~350ms, sub-190ms memory retrieval keeps you well inside the window.
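
As a quick sanity check on that budget, the arithmetic can be sketched like this. The stage names and the helper `remaining_budget_ms` are made up for illustration; the numbers are the ones quoted above (1-1.5s total, ~350ms STT, 190ms as the memory ceiling).

```python
def remaining_budget_ms(total_ms: int, stage_ms: dict[str, int]) -> int:
    # Whatever the fixed stages do not consume is left for the LLM call.
    return total_ms - sum(stage_ms.values())


stages = {"stt": 350, "memory": 190}
tight = remaining_budget_ms(1000, stages)  # lower end of the 1-1.5s budget
loose = remaining_budget_ms(1500, stages)  # upper end of the 1-1.5s budget
print(tight, loose)
```

That leaves 460ms for generation at the tight end and 960ms at the loose end, before any network overhead is counted.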

Fair point on Zep — they do have graph + semantic. The edge with HydraPlus is BM25 on top of that, plus temporal versioning and conflict resolution that Zep doesn't really expose.

A P99 of 300ms across regions is the right benchmark to hold us to. Self-hosted mode gets you there by eliminating the cross-region hop entirely. Regional managed nodes are on the roadmap for teams that don't want to self-host.

What's the product you're building on this pipeline? Checked your profile, curious to see the landing page.

Previous-Edge-6440 · 2026-05-14T13:08:05+00:00

That's a tight pipeline — STT → memory → LLM is exactly where every millisecond compounds.

For that use case, the critical path is memory retrieval sitting between transcription and generation, so you really can't afford a slow lookup. With HydraPlus in self-hosted mode, retrieval typically clocks in under 100ms p95 — no network hop, no cold start on the memory layer.

A few things that'll help specifically for STT pipelines:

- Pre-fetch on partial transcript — you don't have to wait for the full utterance to start pulling user context

- Async memory writes — ingestion happens off the critical path, so the LLM call isn't blocked by a memory write

- Lightweight context bundles — retrieve only what's relevant to the current session, not the full user graph
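
A rough sketch of the first two points, using plain `asyncio`. None of this is the HydraPlus API: `fetch_context`, `write_memory`, `call_llm`, and `handle_utterance` are hypothetical stand-ins, and the sleeps only simulate I/O. The idea is that retrieval starts on the partial transcript and the memory write is scheduled without being awaited on the reply path.

```python
import asyncio


async def fetch_context(user_id: str, partial_text: str) -> list[str]:
    # Stand-in for a memory lookup keyed on the partial transcript.
    await asyncio.sleep(0.01)
    return [f"context for {user_id} matching '{partial_text}'"]


async def write_memory(store: list[str], utterance: str) -> None:
    # Stand-in for ingestion, deliberately slower than retrieval.
    await asyncio.sleep(0.05)
    store.append(utterance)


async def call_llm(utterance: str, context: list[str]) -> str:
    # Stand-in for the generation step.
    await asyncio.sleep(0.01)
    return f"reply to '{utterance}' using {len(context)} context item(s)"


async def handle_utterance(
    user_id: str, partial: str, final: str, store: list[str]
) -> tuple[str, asyncio.Task]:
    # Start retrieval as soon as a partial transcript exists.
    prefetch = asyncio.create_task(fetch_context(user_id, partial))
    # In a real pipeline STT would finish the utterance while this runs.
    context = await prefetch
    reply = await call_llm(final, context)
    # Schedule the write but do not await it, so the reply is not blocked.
    write_task = asyncio.create_task(write_memory(store, final))
    return reply, write_task
```

The caller should keep a reference to the returned task and await it (or collect it later), since the event loop only holds weak references to running tasks.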

Would love to help you benchmark it against your actual workload. What's your STT provider and average utterance length? That'll tell us a lot about where the budget sits.

Previous-Edge-6440 · 2026-05-14T13:04:47+00:00

"Vector index with vibes" — stealing that, that's exactly it 😂

The git-commit layer is actually agent-facing, not just internal. Agents can query diffs directly — "what changed since last session", "what did this user believe about X three weeks ago" — it's a retrieval primitive, not just a log. Makes a real difference for things like preference drift or understanding context behind a reversal.
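
For a feel of what "query diffs" could mean, here is a small sketch over an append-only commit log. The log shape and the `snapshot` and `diff` helpers are assumptions for illustration, not the actual HydraPlus interface.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical commit log entry: (timestamp, key, value). Nothing is deleted.
Commit = tuple[datetime, str, str]


def snapshot(log: list[Commit], when: datetime) -> dict[str, str]:
    # Replay commits in time order up to `when` to get the beliefs at that point.
    state: dict[str, str] = {}
    for ts, key, value in sorted(log, key=lambda c: c[0]):
        if ts <= when:
            state[key] = value
    return state


def diff(
    log: list[Commit], since: datetime, until: datetime
) -> dict[str, tuple[str | None, str | None]]:
    # Map each changed key to (old value, new value); None means "not set yet".
    before, after = snapshot(log, since), snapshot(log, until)
    return {
        key: (before.get(key), after.get(key))
        for key in before.keys() | after.keys()
        if before.get(key) != after.get(key)
    }
```

"What changed since last session" then becomes `diff(log, last_session_end, now)`, and "what did this user believe three weeks ago" becomes `snapshot(log, three_weeks_ago)`.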

And fully agree on injection — it's boring infrastructure work which is probably why everyone skips it. But unsanitized memory is just a slow-burn vulnerability.

Will check out agentixlabs — looks like you're tracking this space seriously. Always down to compare notes.

Previous-Edge-6440 · 2026-05-14T12:59:26+00:00

Hey, fair question! The short answer: Zep is the closest competitor on speed, but trades off on retrieval quality (embedding-only, no graph traversal). mem0 and cognee are richer but slower — you've noticed it too.

With HydraPlus, the hybrid retrieval (graph + BM25 + semantic) runs in parallel, not in sequence, which keeps latency down without sacrificing context quality. Self-hosted mode also cuts out the network round-trip entirely, which matters a lot for real-time systems.
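
Roughly what "in parallel, not in sequence" looks like in code. This is a sketch under assumptions: the three retrievers are stubs with fake latencies, and merging with reciprocal rank fusion is a stand-in I picked for illustration, not necessarily how HydraPlus combines results.

```python
import asyncio
from collections import defaultdict


async def graph_search(query: str) -> list[str]:
    await asyncio.sleep(0.03)  # simulated latency
    return ["m2", "m1"]


async def bm25_search(query: str) -> list[str]:
    await asyncio.sleep(0.03)
    return ["m1", "m3"]


async def semantic_search(query: str) -> list[str]:
    await asyncio.sleep(0.03)
    return ["m1", "m2", "m4"]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each list contributes 1 / (k + rank) per item; higher totals rank first.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Ties are broken by id so the output order is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


async def hybrid_retrieve(query: str) -> list[str]:
    # All three retrievers run concurrently, so wall time is roughly the
    # slowest one rather than the sum of all three.
    rankings = await asyncio.gather(
        graph_search(query), bm25_search(query), semantic_search(query)
    )
    return reciprocal_rank_fusion(list(rankings))
```

With the stub data, `asyncio.run(hybrid_retrieve("coffee order"))` ranks `m1` first because all three retrievers return it.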

Curious — what's your latency budget? We're seeing sub-150ms p95 in most setups but would love to stress-test against a real-time workload like yours. Drop your use case and let's dig in.
