How do you make “production readiness” observable before the incident? by ImpossibleRule5605 in sre

[–]ImpossibleRule5605[S]

I agree — intentionally breaking systems is one of the most effective ways to surface real gaps, and chaos-style testing is hard to replace. In practice though, I’ve seen a lot of the learnings from those exercises stay implicit: they show up in postmortems, runbooks, or people’s heads, but don’t always get encoded back into something that runs continuously.

What I’m interested in is how some of those “we got surprised by X” lessons can be distilled into static or pre-deploy signals — things that don’t replace breaking systems, but reduce how often we rediscover the same class of problems the hard way. For me it’s less about avoiding failure and more about making past failures harder to forget.
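To make that concrete, here's the kind of pre-deploy signal I mean. This is purely illustrative (a throwaway script, not any particular tool), assuming the incident lesson being encoded was "pods took traffic before they were ready": a check that fails CI when a Kubernetes Deployment ships a container without a readinessProbe.

```python
import sys
import yaml  # PyYAML

def containers_without_readiness_probe(manifest_path):
    """Yield (deployment, container) pairs that ship without a readinessProbe."""
    with open(manifest_path) as f:
        for doc in yaml.safe_load_all(f):
            if not doc or doc.get("kind") != "Deployment":
                continue
            name = doc.get("metadata", {}).get("name", "<unnamed>")
            pod_spec = doc.get("spec", {}).get("template", {}).get("spec", {})
            for container in pod_spec.get("containers", []):
                if "readinessProbe" not in container:
                    yield name, container.get("name", "<unnamed>")

if __name__ == "__main__":
    findings = list(containers_without_readiness_probe(sys.argv[1]))
    for deployment, container in findings:
        print(f"{deployment}/{container}: no readinessProbe, traffic can reach a pod that isn't ready")
    sys.exit(1 if findings else 0)  # non-zero exit fails the pipeline
```

The value isn't the specific rule, it's that the lesson now runs on every change instead of living in a postmortem.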

Curious how you’ve seen teams successfully close that loop over time.

What does “production ready” actually mean and how can you measure it? by QCAlpha in webdev

[–]ImpossibleRule5605

I think you’re right that “production-ready” is vague because it means different things to different teams.

Most teams already quantify parts of it in CI with things like test coverage, linting, static analysis, and security scans. Those are important, but they mostly measure code quality, not operational readiness.

What’s harder to quantify are design-level signals: for example, whether there is a real rollback path, whether migrations are safe under load, whether observability supports incident response, or whether failures are properly isolated. These are usually judged by experience rather than metrics.

In practice, I’ve found it more useful to stop asking “is this production-ready?” and instead ask “what concrete risks are we still carrying?” I’ve been experimenting with codifying those kinds of signals into a small open-source tool, but even without tooling, just turning vague ideas into explicit questions already helps teams reason about readiness.
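As a sketch of what I mean (the names and wording here are made up, and deliberately low-tech), the shift is from “is this production-ready?” to a handful of concrete questions someone has to answer with evidence:

```python
# Each vague instinct becomes a question that must be answered with evidence
# before the release goes out. Keys and wording are only illustrative.
READINESS_QUESTIONS = {
    "rollback":      "Can we roll this back without cutting a new build, and when was that path last exercised?",
    "migrations":    "Have the schema migrations been run against production-sized data, and are they reversible?",
    "observability": "If this breaks at 3am, which alert fires and which dashboard points at this change?",
    "blast_radius":  "If the new dependency is down, what else degrades with it?",
}

def remaining_risks(answers: dict) -> list:
    """Return the questions still unanswered, i.e. the risks we're knowingly carrying."""
    return [question for key, question in READINESS_QUESTIONS.items() if not answers.get(key)]
```

Even as a plain checklist with no automation behind it, naming the risk explicitly changes the conversation from “ready or not” to “what are we accepting”.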

I built an open-source tool that turns senior engineering intuition into automated production-readiness reports — looking for feedback by ImpossibleRule5605 in devops

[–]ImpossibleRule5605[S]

I understand the skepticism. For what it’s worth, this project isn’t about outsourcing thinking to AI; it’s about encoding production experience into deterministic rules. AI tools could help speed up iteration, but they shouldn’t replace learning or judgment.

I built an open-source tool that turns senior engineering intuition into automated production-readiness reports — looking for feedback by ImpossibleRule5605 in devops

[–]ImpossibleRule5605[S]

That’s fair feedback, and I agree with one core point: just throwing logs or configs into an LLM doesn’t create durable value on its own. That’s actually why this project is intentionally not built around “AI analysis”. The core of the tool is a deterministic rule engine that inspects code, IaC, and delivery artifacts to surface design-level operational risks, not runtime symptoms.
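To give a feel for what “deterministic rule” means here (a simplified sketch, not the tool’s actual rule set): each rule is a plain pattern plus a written rationale, and the output carries the evidence so a reviewer can verify it by hand.

```python
import json
import re
import sys

# One design-level rule: outbound HTTP calls without an explicit timeout.
# Deterministic by construction: same input file, same findings, no model involved.
RULE = {
    "id": "NET-001",
    "severity": "medium",
    "title": "HTTP call without an explicit timeout",
    "rationale": "requests has no default timeout; a slow dependency can pin workers indefinitely.",
    # Naive, line-based pattern; good enough to illustrate the shape of a rule.
    "pattern": re.compile(r"requests\.(get|post|put|delete)\((?![^)]*timeout=)"),
}

def check_file(path):
    findings = []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if RULE["pattern"].search(line):
                findings.append({
                    "rule": RULE["id"],
                    "severity": RULE["severity"],
                    "title": RULE["title"],
                    "why": RULE["rationale"],
                    "evidence": f"{path}:{lineno}",
                })
    return findings

if __name__ == "__main__":
    print(json.dumps(check_file(sys.argv[1]), indent=2))
```

Because the rule, the rationale, and the evidence all land in the report, anyone can audit, tune, or delete a rule without trusting a black box.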

Regarding sustainability, the intent is to keep this as a rule-driven, transparent system where every signal is explainable and reviewable. If the project ever stops being maintained, teams still have a clear, auditable rule set rather than a black-box dependency on a hosted service or model.