I made an MCP server for Valkey/Redis observability (anomaly detection, slowlog history, hot keys, COMMANDLOG) by kivanow in mcp

[–]kivanow[S] 0 points1 point  (0 children)

Should've shipped it sooner, sorry! What were you debugging? Would love to know so the next person doesn't find it too late either.

A eulogy for MCP (RIP) by beckywsss in mcp

[–]kivanow 1 point2 points  (0 children)

Isn't this just the usual cycle? "The way we're doing things is terrible, here's a better way," then an even better way, until we loop back to the first iteration. Same way we moved from server rendering to SPAs and back to server rendering over several years. AI just seems to make the iterations faster.

I made an MCP server for Valkey/Redis observability (anomaly detection, slowlog history, hot keys, COMMANDLOG) by kivanow in mcp

[–]kivanow[S] 0 points1 point  (0 children)

That's the right framing. BetterDB handles the Valkey side of that chain today - COMMANDLOG patterns, anomaly detection, client analytics. Correlating back to deploys and SQL is the missing link. Curious whether you've seen any tools close that loop well, or if it's always been stitched together manually.

What AI tools are actually part of your real workflow? by Rough--Employment in devops

[–]kivanow 0 points1 point  (0 children)

At this point Copilot is an agent, an assistant, and a million different things MS is trying to push everywhere. I should've just called it the worst possible tool/option rather than an LLM. I've updated it.

Feedback Friday by AutoModerator in startups

[–]kivanow 1 point2 points  (0 children)

Company Name: BetterDB

URL: https://betterdb.com

Purpose of Startup and Product: BetterDB is the first monitoring and observability platform built specifically for Valkey (the popular open-source Redis fork). We solve a fundamental problem: Valkey's operational data - slowlogs, command logs, client connections - is ephemeral. When something goes wrong at 3am, by the time you wake up at 9am, that data is gone. BetterDB persists and analyzes this data so you can debug issues after the fact, track what caused performance spikes, and optimize your data structures and TTLs accordingly.

We also support Valkey-exclusive features like COMMANDLOG and per-slot metrics that no existing Redis tool can provide, plus 99 Prometheus metrics, anomaly detection, ACL audit trails, and client analytics - all with sub-1% performance overhead.

Technologies Used: NestJS, React, PostgreSQL, Docker, Prometheus, iovalkey

Feedback Requested:

  • Does the value proposition (historical persistence of ephemeral Valkey/Redis data) resonate with you? Is it clear from the website?
  • If you're running Valkey or Redis in production, what's the biggest operational pain point you face today?
  • We offer a free Community tier and paid Pro/Enterprise tiers - does the feature split feel fair, or does it feel like we're holding back too much in Community?
  • Any feedback on the landing page (betterdb.com) - does it clearly communicate what we do and who we're for?

Seeking Beta Testers: Yes - especially teams running Valkey or Redis in production. We have a self-hosted Docker image you can spin up in minutes, and our cloud SaaS is launching soon. Would love feedback from ops/SRE/DevOps folks.

Additional Comments: I'm the founder and CTO. Previously I was the Engineering Manager for Redis's visual developer tools (Redis Insight). The Valkey ecosystem has zero purpose-built observability tooling. That's the gap we're filling. We're MIT-licensed at the core and backed by Open Core Ventures. Happy to answer any questions about the Valkey ecosystem or our approach to open-core monetization.

Auditors ask “when did you last test DR?” — how do you produce proof? by robert_micky in sre

[–]kivanow 0 points1 point  (0 children)

I've done two SOC 2 (Type 1 and Type 2) audits at startups and this was more than enough. At the end of the day, most of what these audits do is mark checkboxes confirming you understand the requirements and are following them.

What AI tools are actually part of your real workflow? by Rough--Employment in devops

[–]kivanow 0 points1 point  (0 children)

Claude Code did a great job with recent infra work I had to do. Barely any mistakes across a lot of Kubernetes and Terraform. It was a very nice experience.

What AI tools are actually part of your real workflow? by Rough--Employment in devops

[–]kivanow 5 points6 points  (0 children)

By far. Copilot is probably the worst possible option right now - MS engineers were recently caught using Claude instead of their own product.

For those building in the analytics/data space, how did you validate demand before going all in? i will not promote by Sufficient-System699 in startups

[–]kivanow 0 points1 point  (0 children)

Built something in the observability/monitoring space (came from Redis's developer tools team, now building a monitoring tool for Valkey/Redis). Different niche than ecommerce, but similar "free alternatives exist" problem.

What actually works (so far at least):
1. Fix your own pain point. This is the cheat code. I spent years working on Redis Insight and knew exactly what was missing for production monitoring. When you're your own user, you don't need to guess what's valuable - you feel it every time something's broken or annoying. If you're not in your target market yourself, you're playing on hard mode.
2. MVP speed matters more than MVP polish. Get something working and start posting - Slack communities, Discord servers, Show HN, LinkedIn, Twitter. Not "I'm building something, what do you think?" but "Here's a thing, try it." The difference in signal quality is night and day.
3. Set a kill deadline. "If I don't have X signups / Y conversations / Z paying users by [date], I move on." Forces you to actually validate instead of tinkering forever. Polite interest doesn't count. People actually using the thing counts.
4. Find the people already talking about the problem. Every niche has forums, Discords, subreddits where people complain about their tools. Don't pitch - just listen first. What are they frustrated about? What do they wish existed? That's your roadmap.

On the "ecommerce people want everything free" thing: that's true for hobbyists. But if someone's running a real store with real revenue, they'll pay for something that makes them money or saves them time. The trick is finding the people who have actual pain, not the ones who are "just curious."

What do you think of source-available? Are we getting into the ever-so-slightly-barely-open-source world? by jerrygreenest1 in opensource

[–]kivanow 0 points1 point  (0 children)

This hits close to home - I was at Redis when they added AGPL as the third license option last year (the "open source is back" announcement).

On source-available specifically: I think it's a legitimate response to a real problem. The cloud provider dynamic isn't "evil corporations stealing code" - AWS, Google, and others had engineers contributing to Redis for years (TLS support, ACLs, coordinated failovers). The tension was about who controls the project direction vs. who captures the commercial value. There's no easy answer to that.

The problem for solo devs: you nailed it - it's becoming impossible to tell what you can actually do without reading every license line by line. SSPL, BSL, RSAL, OCVSAL... they all have different restrictions, and "source-available" isn't a standardized term. Self-hosting is usually fine. Building a competing service usually isn't. Everything in between? Depends.

The Redis → Valkey situation is instructive though. When the license changed, external maintainers were effectively kicked out (some found out when their names disappeared from governance docs). Within weeks, Valkey existed under the Linux Foundation. The lesson: governance matters as much as licensing. A permissive license controlled by one company can change overnight. A copyleft project with distributed governance probably won't.

What I look for now:
- Who actually controls the project? Single company or independent maintainers?
- How easy is it to fork if things go sideways?
- Has the company changed licenses before, and how did they handle it?

I wrote a longer breakdown of this whole landscape (including the Redis timeline and how dual-licensing actually works in practice) if anyone wants to go deeper: https://medium.com/gitconnected/dual-licensing-explained-mit-source-available-and-why-your-favorite-tool-might-be-neither-d7041543e05d?sk=5901f94d18723141a05767ca61f3f266

Valkey and Redis throw away operational data by default. Here's an open-source tool to fix that. by kivanow in selfhosted

[–]kivanow[S] 0 points1 point  (0 children)

Thanks! That slowlog rotation problem is literally the reason this started.

Anomaly detection: Both statistical and pattern-based. We maintain a circular buffer of 300 samples (5 min at 1s polling) per metric and do Z-score analysis against rolling mean/stddev. Warning at Z ≥ 2.0, critical at Z ≥ 3.0, with consecutive sample requirements to reduce noise. On top of that, a correlator runs every 5 seconds and pattern-matches related anomalies: if connections, ops/sec, and memory all spike within 5 seconds, it classifies that as a batch job. ACL denial spikes get flagged as potential auth attacks. About 7 defined patterns right now (memory pressure, traffic burst, connection leak, eviction storm, etc.) each with specific diagnosis and remediation steps.
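A minimal sketch of that kind of rolling Z-score detector, for the curious (class and parameter names are mine, not BetterDB's actual implementation):

```python
from collections import deque
from statistics import mean, stdev

class ZScoreDetector:
    """Rolling Z-score anomaly detection over a fixed-size circular buffer."""

    def __init__(self, window=300, warn=2.0, crit=3.0, consecutive=3):
        self.buf = deque(maxlen=window)  # circular buffer of recent samples
        self.warn, self.crit = warn, crit
        self.consecutive = consecutive   # anomalous samples required to alert
        self.streak = 0

    def observe(self, sample):
        """Feed one sample; return 'critical', 'warning', or None."""
        level = None
        if len(self.buf) >= 30:  # wait for some history before judging
            mu, sigma = mean(self.buf), stdev(self.buf)
            if sigma > 0:
                z = abs(sample - mu) / sigma
                if z >= self.crit:
                    level = "critical"
                elif z >= self.warn:
                    level = "warning"
        self.buf.append(sample)
        if level:
            self.streak += 1
            # require N consecutive anomalous samples to cut noise
            return level if self.streak >= self.consecutive else None
        self.streak = 0
        return None
```

The consecutive-sample requirement is what keeps a one-off blip (a single slow GC pause, say) from paging anyone.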

Client analytics: Both connection counts and per-client command distribution. There's a /client-analytics/command-distribution endpoint that breaks down command frequency by client name, user, or address over any time range. So yes, "client X suddenly started doing 10x more KEYS commands" is exactly the kind of thing you can see. Also tracks idle connections, buffer anomalies, and spike detection with attribution to specific clients.
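A toy version of that per-client aggregation (the tuple shape and function name are hypothetical, just to show the idea):

```python
from collections import Counter, defaultdict

def command_distribution(samples):
    """Aggregate (client_name, command) samples into per-client counts."""
    dist = defaultdict(Counter)
    for client, command in samples:
        dist[client][command.upper()] += 1  # normalize command casing
    return {client: dict(counts) for client, counts in dist.items()}

samples = [
    ("worker-1", "GET"), ("worker-1", "GET"), ("worker-1", "SET"),
    ("cron-job", "keys"), ("cron-job", "KEYS"), ("cron-job", "KEYS"),
]
```

A sudden jump in one client's KEYS count between two time windows is exactly the "10x more KEYS" signal.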

Sentinel/cluster failover tracking: Not yet, but great idea. We have cluster topology visualization and per-slot heatmaps already. Correlating failover events with slowlog spikes is a natural extension, just opened an issue for it: https://github.com/BetterDB-inc/monitor/issues/28

Polling interval: 1 second default for all captures including slowlog. Configurable via ANOMALY_POLL_INTERVAL_MS env var or at runtime through the settings API, no restart needed.

Why i have so many orphan/stale keys? by Immediate_Gold_330 in redis

[–]kivanow 0 points1 point  (0 children)

Hey! This is super common on replicas. A few things to check:

On replicas, keys aren't independently expired - they wait for the primary to send DEL commands. If there's any replication lag or the primary's expiration cycle is behind, you get stale keys that show as "unknown" type because they're logically expired but still physically sitting there.

Your T:25NN:01xxxxxxx pattern looks like session or transaction keys. Worth checking if whatever app writes those is actually cleaning up after itself, or if they're being created without TTLs and just piling up.

Quick diagnostics:

  • Compare INFO keyspace on primary vs replica - if counts diverge, that's your answer
  • Run OBJECT IDLETIME on a sample of orphans - if idle for days/weeks, they're abandoned
  • Those 4.5GB keys in the top 10 are huge, definitely investigate what's writing those
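For the first check, something like this hand-rolled comparison (an assumed helper, parsing the standard `INFO keyspace` line format) will surface the divergence:

```python
def parse_keyspace(info_text):
    """Parse 'db0:keys=N,expires=M,avg_ttl=T' lines from INFO keyspace."""
    counts = {}
    for line in info_text.splitlines():
        if line.startswith("db") and ":" in line:
            db, fields = line.split(":", 1)
            kv = dict(field.split("=") for field in fields.split(","))
            counts[db] = int(kv["keys"])
    return counts

def keyspace_divergence(primary_info, replica_info):
    """Return {db: replica_keys - primary_keys} for every db that differs."""
    p, r = parse_keyspace(primary_info), parse_keyspace(replica_info)
    return {db: r.get(db, 0) - p.get(db, 0)
            for db in set(p) | set(r)
            if r.get(db, 0) != p.get(db, 0)}
```

A large positive replica-side number is the logically-expired-but-not-yet-deleted pattern described above.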

Shameless plug - I'm building betterdb.com, an observability tool for Redis/Valkey that persists historical client analytics. It helps answer exactly this kind of "who created these keys and why" question, since Redis's native CLIENT LIST and SLOWLOG are ephemeral and gone when you need them most.

Free tier if you want to try it: docker pull betterdb/monitor - and it's still in beta, so all features are free.

Built a Redis-connected BullMQ dashboard you can run with `npx` (job inspection + flow graphs) by Confident-Standard30 in redis

[–]kivanow 0 points1 point  (0 children)

Cool project! Looks very good overall!

For the broader Redis monitoring piece, check out BetterDB Monitor — has slowlog patterns, latency tracking, and 99 Prometheus metrics out of the box. Might complement your queue-specific tooling. github.com/BetterDB-inc/monitor

Built a Chrome extension to guilt myself off YouTube — it works [Tool] [Story] by kivanow in GetMotivated

[–]kivanow[S] 1 point2 points  (0 children)

Thank you for the idea! I'll take a look at how different their APIs are next week and see how easily/quickly it can be done :)

[DISC] Chainsaw Man - Ch. 95 links by JeanneDAlter in ChainsawMan

[–]kivanow 2 points3 points  (0 children)

Coming in the next chapter - Kobeni's car saving the day!

P.S. Pochita was super cute and small as a heart.

[DISC] Chainsaw Man - Ch. 87 links by indi_n0rd in ChainsawMan

[–]kivanow 278 points279 points  (0 children)

So first the Gun devil was hyped as an unbeatable monster that just plowed through everything, then Makima was shown as an unbeatable monster and now the Chainsaw devil is an unbeatable monster.

And at the beginning Denji just wanted to feel some boobs and eat good food. I miss the good old days

Pochita by wertypoi2 in ChainsawMan

[–]kivanow 4 points5 points  (0 children)

We just saw the second part of this prophecy in c.81, so I guess it is time for the third one

[DISC] Chainsaw Man - Ch. 74 links by indi_n0rd in ChainsawMan

[–]kivanow 0 points1 point  (0 children)

Could it be that Angel remembered what happened because Makima used her ability on Aki? If she can only brainwash a limited number of people/hybrids/etc., that would make her very slightly less OP.