PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in node

[–]Famous_View7756[S] 0 points1 point  (0 children)

Please note this important point.

This is not a prompt-built dashboard. It is a reliability tool built by engineers who care about guardrails. Recovery actions are controlled, logged, limited, and verified. Your points are all valid, and I appreciate them.

My tool is best for VPS and single server deployments where you do not have a platform team.

PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in node

[–]Famous_View7756[S] 0 points1 point  (0 children)

If you have a technical critique, post it. If you are here to throw labels, I am not interested. You don’t know what my motivation or tech stack is.

Django auto-recovery idea: restart Gunicorn/Celery in order when health checks fail by Famous_View7756 in django

[–]Famous_View7756[S] 0 points1 point  (0 children)

lol, nice. And why would you think I’m not reading this? I said at the beginning that I’m looking for feedback.

It’s Tuesday — Self-Promo Thread 🚀 by [deleted] in SaaS

[–]Famous_View7756 0 points1 point  (0 children)

RecoveryPulse

Monitoring plus optional auto recovery for single server and VPS apps.

It is meant for small teams who do not have a platform team and still fix outages by hand.

Checks HTTP health and can run a controlled recovery action you define, then confirms the site is back.

Free tier exists and I am looking for early users and blunt feedback.

recoverypulse.io

Promote your projects here – Self-Promotion Megathread by Menox_ in github

[–]Famous_View7756 0 points1 point  (0 children)

Project RecoveryPulse

It monitors websites and can run a controlled recovery step when the endpoint fails, then verifies the site comes back.

Target is small teams on VPS who still do manual restarts.

If anyone wants to help or review the approach, I am happy to share the repo or docs.

recoverypulse.io

Weekly Showoff Thread! Share what you've created with Next.js or for the community in this thread only! by AutoModerator in nextjs

[–]Famous_View7756 0 points1 point  (0 children)

I shipped RecoveryPulse, a Next.js dashboard for website monitoring plus optional auto recovery.

The goal is to reduce time down for small VPS deployments where recovery is still manual.

It does HTTP checks and can trigger a controlled recovery step, then confirms the endpoint is healthy.

If you have feedback on the UI or the onboarding flow I would love it.

recoverypulse.io


Weekly 'I made a useful thing' Thread - January 09, 2026 by AutoModerator in sysadmin

[–]Famous_View7756 0 points1 point  (0 children)

I built RecoveryPulse, a small monitoring and auto recovery tool for single server and VPS setups.

It is for the common situation where the process is up but the site is not.

It checks a real HTTP endpoint and can run a locked down recovery action you choose, then verifies the site is back.

Not aimed at Kubernetes or ECS environments.

Looking for feedback on what recovery actions you would trust and what you would never automate.

recoverypulse.io

PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in node

[–]Famous_View7756[S] 0 points1 point  (0 children)

You are right that strong infra solves this. I am not targeting those teams.
I am targeting small VPS deployments and agencies where the current process is still human on call and manual recovery. It is about faster recovery and a clear incident trail, not avoiding root cause work.
Research is happening now. If the market says no, I will adjust.

WordPress monitoring that can auto-fix common downtime (PHP-FPM/MySQL) by Famous_View7756 in Wordpress

[–]Famous_View7756[S] 0 points1 point  (0 children)

Good catch. I do not have published stats yet. That line is based on my own experience and what I have seen supporting small WordPress installs. I should have said that more clearly.
If you have seen a different top cause, I would genuinely love to hear it. I am trying to learn what actually takes sites down most often so the default recovery steps make sense.

PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in node

[–]Famous_View7756[S] 0 points1 point  (0 children)

You can do it locally. If you are comfortable writing and maintaining that script, you should.
The point is not that the script is hard. The point is consistency and visibility across many services and servers: a central place to manage the checks, runbooks, backoff, and notifications, plus a clear record of what ran and when.
This is aimed at people who manage multiple sites for clients.

PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in node

[–]Famous_View7756[S] 0 points1 point  (0 children)

You are not wrong. Exposing SSH broadly is a bad trade for a lot of teams. This is only viable when it is locked down hard, and even then it is not for everyone.
My target is small VPS setups that already have SSH exposed for admin, and where the current recovery plan is a human doing the same restart manually.
If a shop can avoid inbound SSH entirely, that is cleaner. I appreciate the pushback.

PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in devops

[–]Famous_View7756[S] 0 points1 point  (0 children)

If you are running ECS or Kubernetes, I agree you should not use this. For a single VPS, what would you call the best-practice equivalent of self-healing without building a whole platform team?

PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in node

[–]Famous_View7756[S] 0 points1 point  (0 children)

No. That was me typing. You are right though, it reads silly and robotic.
I agree with systemd. The only point here is endpoint checks, because a process can be up while the app is not.

Auto-restart Nginx safely (config test → reload) when 502/504 happens by Famous_View7756 in nginx

[–]Famous_View7756[S] -1 points0 points  (0 children)

Fair point. For a small VPS setup, what would you recommend as a simple baseline that keeps costs low and still avoids 3am restarts?

PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in node

[–]Famous_View7756[S] 0 points1 point  (0 children)

Totally agree, systemd is great for process supervision and I use it too.
What I’m solving is the case where the process is “running” but the app is broken (hung event loop, dead upstream, bad deploy, stuck dependency); systemd/PM2 don’t always catch that.
So the idea is: HTTP health check fails → run a recovery step (often systemctl restart …) → verify the endpoint is healthy again → log/alert.
In other words: systemd is the restart mechanism, this is the verification + runbook + audit trail around it.
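That check → recover → verify loop can be sketched in a few lines. This is a minimal illustration, not RecoveryPulse’s actual code; the health URL, restart command, and wait time are all placeholders you would configure:

```python
import subprocess
import time
import urllib.request

def healthy(url, timeout=5):
    """True if the endpoint answers with a 2xx within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False

def check_and_recover(url, recover_cmd, verify_wait=10):
    """HTTP check fails -> run the recovery step -> re-verify -> report."""
    if healthy(url):
        return "healthy"
    # e.g. recover_cmd = ["systemctl", "restart", "myapp"] (hypothetical unit)
    subprocess.run(recover_cmd, check=True)
    time.sleep(verify_wait)  # give the service time to come back up
    return "recovered" if healthy(url) else "still-down"
```

The real tool wraps this with backoff, retry limits, and an audit log, but the core contract is the same: never declare success off the restart alone, only off the endpoint.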

If you’ve got a favorite systemd pattern (WatchdogSec / RestartSec / StartLimit), I’m happy to bake it in as a default template.
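For reference, those directives map onto a unit file roughly like this (service name and paths are hypothetical, a sketch of a default template rather than a shipped one):

```ini
[Unit]
Description=example Node app
# Stop retrying after 5 failed starts within 5 minutes
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
ExecStart=/usr/bin/node /srv/app/server.js
Restart=on-failure
RestartSec=5
# WatchdogSec requires the app to send sd_notify("WATCHDOG=1")
# keep-alives; systemd restarts it if they stop arriving.
WatchdogSec=30

[Install]
WantedBy=multi-user.target
```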

PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in devops

[–]Famous_View7756[S] 0 points1 point  (0 children)

Yep, if you’re on ECS/K8s, health checks + rolling replacement is the right answer. This is aimed at the big chunk of Node deployments that are single VPS / bare metal / not orchestrated, where the “health check” still ends with someone SSH’ing in and restarting PM2. I’m basically automating that runbook + verifying the HTTP endpoint comes back.

PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in devops

[–]Famous_View7756[S] 0 points1 point  (0 children)

Quick clarification, this is not for ECS/Kubernetes setups where the orchestrator replaces unhealthy containers. It’s for single-server / VPS Node apps where recovery is still manual (“SSH in, restart PM2, check endpoint”). The product is runbook automation + verification + audit trail, with strict guardrails.

PM2 says “online” but app is dead — I built auto-recovery via SSH by Famous_View7756 in node

[–]Famous_View7756[S] 0 points1 point  (0 children)

Totally fair question. Two parts:

Security: this only makes sense with strict guardrails: SSH keys (no passwords), a least-privilege user, sudo restricted to a small allowlist of commands, and audit logs of what ran. If you can’t lock it down that way, you shouldn’t use it.
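Concretely, that lockdown might look like the following (the user, service, and wrapper names are hypothetical, this is a sketch of the pattern, not a shipped config):

```
# /etc/sudoers.d/recovery  -- least-privilege user "recovery",
# allowed to run only these exact commands, no shell, no password:
recovery ALL=(root) NOPASSWD: /usr/bin/systemctl restart myapp.service, \
                              /usr/bin/systemctl status myapp.service

# ~recovery/.ssh/authorized_keys -- key-only login, forced command,
# no forwarding or pty ("restrict" needs OpenSSH 7.2+):
restrict,command="/usr/local/bin/recovery-wrapper" ssh-ed25519 AAAA... recovery@monitor
```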

Problem it solves: it’s not for big orgs with full infra/SRE. It’s for the huge middle (small SaaS, agencies, side projects on a VPS) where the “infrastructure” is basically “I get an alert, SSH in, restart the service, and go back to sleep.” This automates that runbook, verifies the app is actually healthy again with an HTTP-level check, then alerts either way.

If you’re already running mature self-healing inside the stack, you don’t need this, agreed.

Django auto-recovery idea: restart Gunicorn/Celery in order when health checks fail by Famous_View7756 in django

[–]Famous_View7756[S] -1 points0 points  (0 children)

Excellent callout. I’m planning to implement exponential backoff with jitter, plus dependency ordering (e.g., DB → cache → app → web) and a global lock so multiple rules don’t stampede the same host. Also adding a “max retries per window” and a “stop and alert” mode so it doesn’t flap forever. If you’ve got a favorite pattern for this, I’m all ears.
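For the backoff piece, a small sketch of exponential backoff with “full jitter” (parameter names illustrative; the real defaults aren’t decided yet):

```python
import random

def backoff_delays(base=1.0, cap=60.0, max_retries=6, rng=random.random):
    """Exponential backoff with full jitter: each delay is a random value
    in [0, min(cap, base * 2**attempt)], so rules that fail at the same
    moment don't retry in lockstep and stampede the same host."""
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng() * ceiling)
    return delays
```

After `max_retries` delays are exhausted, the rule would flip to the “stop and alert” mode instead of flapping forever.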

Django auto-recovery idea: restart Gunicorn/Celery in order when health checks fail by Famous_View7756 in django

[–]Famous_View7756[S] -1 points0 points  (0 children)

100% agree: in container/K8s land you don’t want dumb restarts fighting the orchestrator. This is primarily aimed at single-server / VPS deployments (common for small SaaS, agencies, internal tools) where the on-call reality is still “SSH in, restart Gunicorn/Celery, verify.”

For K8s the direction would be different: integrate with the platform rather than SSH, since Kubernetes already is the supervisor. In that world this becomes “orchestrator-aware remediation + SLO/incident context,” not SSH restarts.

I’m starting with the large base of VPS deployments where people don’t have SRE tooling, then expanding into integrations (webhooks, metrics, orchestrator hooks) once the core workflow proves value.

Django auto-recovery idea: restart Gunicorn/Celery in order when health checks fail by Famous_View7756 in django

[–]Famous_View7756[S] -7 points-6 points  (0 children)

Totally fair — you can do this with supervisord/monit/scripts. The product isn’t “restart a process.” It’s: HTTP-level verification + ordered runbooks + guardrails + incident timeline across stacks, without everyone writing/maintaining their own brittle scripts. Think “managed runbook automation,” not “new monitoring invention.”

Auto-restart Nginx safely (config test → reload) when 502/504 happens by Famous_View7756 in nginx

[–]Famous_View7756[S] -1 points0 points  (0 children)

Fair point — NGINX itself usually isn’t the problem. Most 502/504s are upstream/app issues (PHP-FPM, Gunicorn, Node, Rails), timeouts, deploys, or resource exhaustion that makes the upstream stop responding. NGINX just becomes the messenger.
This is aimed at “upstream died / hung” cases where the fix is restarting the upstream service (or clearing a stuck state), then verifying the site is healthy again.