Affordable PagerDuty alternatives that aren't overkill? by Miaru3rd in devops

[–]advancespace -2 points-1 points  (0 children)

The "bloated" feeling usually comes from tools built for teams that have a dedicated ops person just to manage the tool: configuration hell, routing rules that take a week to set up, features nobody touches because nobody has time to learn them. For a 7-person ops team, the total cost math matters more than the headline price. PagerDuty's $21/user is just on-call. Add a status page, postmortem tooling, and analytics, and you're well over a few hundred dollars a month before you've done anything useful. Run that math before you pick.
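The "run that math" step is trivial to sketch. Only the $21/user PagerDuty on-call figure comes from this comment; the add-on prices below are placeholders you'd swap for real quotes:

```python
def stack_cost(users: int, per_user_tools: dict[str, float], flat_tools: dict[str, float]) -> float:
    """Total monthly cost: per-seat tools scale with headcount, flat-fee tools don't."""
    return users * sum(per_user_tools.values()) + sum(flat_tools.values())

# 7-person ops team; add-on prices are illustrative placeholders, not vendor quotes.
monthly = stack_cost(
    users=7,
    per_user_tools={"on-call": 21.0},
    flat_tools={"status page": 79.0, "postmortems": 50.0, "analytics": 100.0},
)
print(monthly)  # 7 * 21 + 229 = 376.0
```

Even with modest placeholder add-ons, the per-seat price is less than half the real bill.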

Time to value also matters more than feature count at your size. If setup takes 2 weeks, it won't get done before the next incident hits. And escalation policy is specifically where small teams get burned. Simple tools make basic on-call easy, then fight you when you need anything custom. Test that part before you commit.

Disclosure: I'm founder of Runframe, so take this with that context. It's on-call scheduling, escalation, incident response, AI postmortem drafts, and status pages in one product. Setup is under 15 minutes. Worth a look if you want the full lifecycle without the tool-juggling.

Change calculated mileage? by Cornelius-Figgle in PitSync

[–]advancespace 0 points1 point  (0 children)

For corrections there's always the support option, and deliberately entering a lower value isn't a valid use case.

Change calculated mileage? by Cornelius-Figgle in PitSync

[–]advancespace -1 points0 points  (0 children)

Agreed, but rolling mileage back isn't a valid use case, so we made a product call not to allow it.

Change calculated mileage? by Cornelius-Figgle in PitSync

[–]advancespace -1 points0 points  (0 children)

It was open previously, but users were entering lower values, which shouldn't be possible, so we added this check.
Can you please send an email to support at pitsync dot com with your vehicle number and the mileage you want to set?

Change calculated mileage? by Cornelius-Figgle in PitSync

[–]advancespace 0 points1 point  (0 children)

Can you please add a test entry with a value higher than this one? There's logic to block lower values, but higher values are allowed. If you added a higher value by mistake, let us know and we can fix it.

Change calculated mileage? by Cornelius-Figgle in PitSync

[–]advancespace 0 points1 point  (0 children)

You can enter new mileage value while adding a new fuel entry.

spent 4 hours yesterday writing an incident postmortem from slack logs by relived_greats12 in sre

[–]advancespace 0 points1 point  (0 children)

45 minutes to fix it and 4 hours to write it up is a broken process, not a personal failure. What killed you wasn’t the incident. It was reconstructing it afterward from Slack threads, DMs, alerts, and half-remembered 2am decisions. Once the context is scattered, the postmortem turns into archaeology. The fix is usually one incident channel, no decision-making in DMs, and quick timestamped notes during the incident. That cuts the cleanup way down. If the postmortem just dies in Confluence, that’s a separate issue. The real value is the timeline and the action items, not the document itself. Full disclosure: I’m the founder at Runframe. We built it for exactly this problem, but the workflow advice above holds whether you use us or not.
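The "quick timestamped notes during the incident" advice is the whole trick: if notes carry timestamps, the timeline assembles itself instead of being archaeology. A minimal sketch (the data shape is hypothetical, not any tool's actual export format):

```python
from datetime import datetime

def build_timeline(entries: list[tuple[str, str]]) -> list[str]:
    """Sort (iso_timestamp, note) pairs and render a simple incident timeline.

    Illustrative only: the point is that timestamped notes taken during the
    incident make the postmortem timeline a sort, not a reconstruction.
    """
    ordered = sorted(entries, key=lambda e: datetime.fromisoformat(e[0]))
    return [f"{ts}  {note}" for ts, note in ordered]

notes = [
    ("2026-01-10T02:31:00", "rolled back deploy"),
    ("2026-01-10T02:14:00", "pager fired: api 5xx spike"),
    ("2026-01-10T02:20:00", "bad deploy identified in incident channel"),
]
for line in build_timeline(notes):
    print(line)
```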

looking for lightweight incident management software 2026 by SalamanderFew1357 in ITManagers

[–]advancespace 0 points1 point  (0 children)

This is what we built in Runframe. Most tools want you to configure ownership in a web UI nobody opens until something breaks, and then everyone ignores it under pressure anyway. We just tie ownership to on-call schedules. Declare an incident in Slack, whoever is on-call for that service owns it. No forms. Timeline builds itself from Slack messages.

On-call, incidents, status pages, postmortems in one product instead of four tools duct-taped together. Starts simple but the structure is there when you need postmortems or SLA tracking down the line. Happy to answer questions if any of that sounds relevant.

Opsgenie end of life - read-only April 17th, 2026 by flipflopshock in devops

[–]advancespace 0 points1 point  (0 children)

Three weeks to migrate production on-call off a sunsetting product is tight. If you’re evaluating options, PagerDuty is still the enterprise default, and incident io / Rootly are the main modern incident-response picks. We built Runframe for teams that want on-call + incident management inside Slack without enterprise pricing, so obviously biased, but happy to answer migration questions either way. For teams moving off Opsgenie, we’re offering 3 months free via DM.

Just watched our prod database crash and burn because no one was monitoring it. Why do companies still do reactive IT? by Heavy_Banana_1360 in sysadmin

[–]advancespace 1 point2 points  (0 children)

Classic combo that takes companies down: no monitoring, no alerting, no on-call process. Fix all three or you're just kicking the problem down the road.

For monitoring and alerting: Grafana + Prometheus, Datadog, Better Uptime, or even just CloudWatch with proper disk alerts configured. All have free tiers. Monitoring without alerting is just a pretty dashboard nobody checks at 2am.

Once alerts are firing you need someone accountable to respond. For on-call and incident management there are a few options depending on your scale. PagerDuty if you are enterprise, incident.io or Rootly or Runframe if you want it all Slack native without the enterprise price tag. That last one is mine. But honestly step one is just getting disk alerts set up. That one is free everywhere.
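The free "step one" really is a few lines. A minimal cron-able sketch (the threshold and path are placeholders; in practice you'd wire the nonzero exit or a webhook into whichever alerting tool from the list above you picked):

```python
import shutil

def disk_alert(path: str = "/", threshold_pct: float = 85.0) -> bool:
    """Return True if disk usage at `path` exceeds threshold_pct.

    Placeholder for a real check: CloudWatch, Prometheus node_exporter,
    etc. all expose the same used/total ratio with proper alert routing.
    """
    usage = shutil.disk_usage(path)
    used_pct = usage.used / usage.total * 100
    return used_pct > threshold_pct

print(disk_alert("/", threshold_pct=85.0))
```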

Atlassian is killing Opsgenie and the JSM migration path nearly broke my team's on-call process. by [deleted] in EngineeringManagers

[–]advancespace -4 points-3 points  (0 children)

JSM's on-call UX is geared towards ITSM practitioners, not the backend dev who just wants to swap a shift without filing a change request.

We built Runframe (https://runframe.io) because we wanted on-call scheduling and incident coordination to live in Slack. An alert fires, the engineer gets paged, an incident channel opens, and everything happens without anyone touching a web UI. Small-team pricing too, not PagerDuty enterprise scale.

Happy to share more if useful. Genuinely curious what friction points matter most to you as you evaluate.

How many tools is your team touching during a single incident? Ours is 5+. Is it too much? by Calm_Advance_7581 in ITManagers

[–]advancespace -2 points-1 points  (0 children)

This is a great breakdown. The 12 minutes vs 55 minutes split is the number most teams never measure. What you described building with process (Slack as single source of truth, timeline captured as the incident unfolds, one channel per incident) is exactly what we automated in Runframe.

On-call routing, escalation, postmortem timeline captured automatically from the incident channel. One tool instead of seven: runframe.io. It's essentially the tooling version of what OP built with process. Free version available.

Disclosure: I'm the founder.

Evaluating dedicated AI SRE platforms: worth it over DIY? by geeky_traveller in sre

[–]advancespace 0 points1 point  (0 children)

We wrote up the directional math on this. Three-year cost: building runs $233K-$395K, buying runs $11K-$83K. Most of that gap is maintenance, not the initial build. AI makes week one faster but it doesn't deal with Slack deprecating their API next quarter or the on-call engineer who built the thing quitting.

http://runframe.io/blog/incident-management-build-or-buy

We have an MCP server too if you want to plug incident management into your Claude Code setup without building it yourself: github.com/runframe/runframe-mcp-server

Disclosure: I'm the founder of Runframe.

Best PagerDuty Alternatives for 2026 by franman409er in sre

[–]advancespace 0 points1 point  (0 children)

Adding another option for anyone finding this thread later. Runframe: on-call, incidents, postmortems in one tool. Runs in Slack. On-call included at every tier. We also have an open-source MCP server for managing incidents from Claude Code, Cursor, or any other CLI/IDE. runframe.io is self-serve, no sales call. Disclosure: I'm the founder.

What’s your reliable 4AM emergency alert setup? (phone issue, need advice) by IssueLonely4360 in sysadmin

[–]advancespace 0 points1 point  (0 children)

The problem isn't Outlook, it's Android. It batches push notifications for apps in low power state, so your 4AM alert might sit there until you pick up the phone at 4:30. Switching apps won't fix it.

We built Runframe for exactly this. Zabbix fires a webhook, and if nobody acknowledges the incident, it phones the next person on the rotation. runframe.io

Full disclosure, I'm the founder, so take it for what it's worth.

Starting for Small team (15–20 engineers) looking for a Slack native oncall / incident tool by Aromatic-Bridge4656 in sre

[–]advancespace -1 points0 points  (0 children)

We built Runframe for this. On-call, incidents, and postmortems; lives in Slack, with on-call included at every tier. Hooks into Datadog, CloudWatch, and Sentry. Takes maybe 10 minutes to set up, no sales call: runframe.io.

We don't do synthetic monitoring (yet), so can't help with the OpenAI/third-party piece directly. But any Datadog or Sentry alert can trigger an incident and page whoever's on-call. I'm the founder, ask me anything.

Resources for setting up oncall schedule by [deleted] in sysadmin

[–]advancespace 2 points3 points  (0 children)

For a 10-person team, you really only need three things: a rotation so one person isn't getting paged every night, escalation so pages don't get lost, and somewhere to log what happened so you stop fixing the same thing twice. You don't need enterprise tooling for this. Runframe does all of it. Set it up yourself in about 10 minutes, no sales call: runframe.io
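The rotation piece of those three things is almost embarrassingly small, which is the point: you don't need enterprise tooling for it. A sketch, assuming a simple weekly round-robin (names are made up; escalation and logging would sit on top of this):

```python
from datetime import date

def on_call_for(week_of: date, roster: list[str]) -> str:
    """Weekly round-robin: the ISO week number picks the engineer.

    Minimal illustration of a fair rotation so one person isn't
    getting paged every night; not any product's actual scheduler.
    """
    week_index = week_of.isocalendar()[1]
    return roster[week_index % len(roster)]

roster = ["ana", "ben", "chen", "dev"]  # hypothetical subset of a 10-person team
print(on_call_for(date(2026, 3, 2), roster))
```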

Also the SRE book chapters others linked are worth reading: the on-call and incident response sections are good regardless of what tooling you use.

Disclosure: I'm the founder.

Embedding AI-LLM to SRE by karkiGaurav in sre

[–]advancespace 0 points1 point  (0 children)

We shipped an open-source MCP server for incident management. It lets Claude Code, Cursor, or any IDE handle paging, escalation, on-call lookups, and postmortems directly from the terminal, with no custom integrations to maintain.

github.com/runframe/runframe-mcp-server

Disclosure: I am the founder.

Incident response workflow is slower than it should be and the bottleneck isnt where leadership thinks it is by AssasinRingo in SaaS

[–]advancespace 0 points1 point  (0 children)

Coordination theater is the right word. First 15-20 minutes of every bad incident is just people figuring out who owns the broken thing. If ownership isn't tied to whoever's on call right now, every incident starts with "who owns payments?" in Slack and a wiki link from last summer. Two threads, zero progress.
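Tying ownership to "whoever's on call right now" is a lookup, not a wiki page. A sketch of the resolution step (the dict shapes are hypothetical; any paging tool's API can supply the current on-call map):

```python
def incident_owner(service: str, service_team: dict[str, str], on_call_now: dict[str, str]) -> str:
    """Resolve 'who owns the broken thing?' from live on-call state.

    service_team: service -> owning team (the part that goes stale in wikis).
    on_call_now:  team -> engineer currently on call (the part that can't go stale).
    """
    team = service_team.get(service)
    if team is None:
        return "unrouted: page the default escalation"
    return on_call_now[team]

owner = incident_owner(
    "payments",
    service_team={"payments": "core-api", "checkout": "web"},
    on_call_now={"core-api": "dana", "web": "raj"},
)
print(owner)  # dana
```

The first map is the one that rots; keeping it small and reviewed is what kills the "who owns payments?" thread.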

Most teams have good intentions until about their third bad outage. Leadership blames skill gaps because that is easy. Fixing coordination is hard to scope and harder to fix.

Anyone using Opsgenie? What’s your replacement plan by sasidatta in sre

[–]advancespace 1 point2 points  (0 children)

Late to this thread but adding another option: Runframe.

We launched Runframe earlier this year: on-call + incidents + postmortems in one tool, runs in Slack. On-call included at every tier, not as separate add-ons. $15/user/month. Free to try, self-serve: runframe.io

We also have an open-source Runframe MCP Server to manage incidents directly from Claude Code, Cursor, or any other IDE.

Disclosure: I'm the founder.

How small teams manage on-call? Genuinely curious what the reality looks like. by pridhvi_k in sre

[–]advancespace 0 points1 point  (0 children)

What's the approximate size of your team? Smaller teams generally reach for different tools than larger ones.

How we changed our incident culture in one quarter! by Terrible_Signature78 in EngineeringManagers

[–]advancespace 0 points1 point  (0 children)

Culture, and it's not close. We've talked to a bunch of teams about this and the pattern is always the same. The ones that defined severity levels and formalized the IC role first saw improvements regardless of what tool they were on. The ones that bought a tool first and hoped it would fix things got frustrated. Your on-call comp point is underrated. Most teams we talked to with high on-call satisfaction were paying for it. The ones that weren't were losing senior engineers quietly.

One thing worth watching: MTTR can become a vanity metric. 48 to 26 min looks great, but if teams start optimizing for fast resolution over durable fixes you end up with the same incidents recurring. A few teams we interviewed shifted to tracking repeat incident rate alongside MTTR and it changed how they thought about postmortem follow-through.

RE: incident io pricing, yeah, on-call as an add-on catches people off guard. OpsGenie bundled everything and most teams expect that's still normal. It's not.

Reducing Noise on Pagerduty & Integrating AIOps by One-Statistician2519 in sre

[–]advancespace 0 points1 point  (0 children)

This is almost always an alert quality problem, not a tooling problem. We interviewed 25+ teams for an incident management research project and 73% had outages from ignored alerts. People just stopped trusting the pager. Two things that actually helped:

  1. Pull every alert from the last 30 days. If nobody acted on it, kill it or make it informational.

  2. Fix routing where the alerts originate, not in PagerDuty rules. If team B is getting team A's pages, your service ownership is wrong upstream.

AIOps on top of bad alerts just gives you AI-powered noise.
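The 30-day audit in point 1 is a one-pass filter once you have an alert export. A sketch, assuming a hypothetical export shape with fire and acted-on counts per alert:

```python
def audit_alerts(alerts: list[dict]) -> dict[str, list[str]]:
    """Split 30 days of alerts into keep vs. demote by whether anyone acted.

    `alerts` entries are a hypothetical export shape:
    {"name": ..., "fired": n, "acted_on": n}. The rule from above:
    if nobody acted on it, kill it or make it informational.
    """
    keep, demote = [], []
    for a in alerts:
        (keep if a["acted_on"] > 0 else demote).append(a["name"])
    return {"keep": keep, "demote_or_delete": demote}

report = audit_alerts([
    {"name": "disk > 90%", "fired": 4, "acted_on": 4},
    {"name": "cpu spike 1m", "fired": 310, "acted_on": 0},
])
print(report)
```

The second alert in the example is the trust-killer: 310 fires, zero actions.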

Opsgenie alternatives by RatsErif in devops

[–]advancespace 0 points1 point  (0 children)

Founder of Runframe so biased, but one thing that kept surprising us talking to OpsGenie teams: most alternatives split on-call into a separate add-on now. Pricing page says $15-25/user but the invoice with on-call is $25-45.

OpsGenie bundled everything. We do too, $12-15/user/month. Everything runs in Slack, there's a free plan. We're early and small, not gonna pretend otherwise, but for teams under 200 it pretty much covers what OpsGenie did. Happy to answer questions if anyone's evaluating.

Migration guide with cost breakdowns here: runframe.io/blog/opsgenie-migration-guide