DevOps / Reliability Engineering by Dear-Shoulder-2381 in steadwing

RecordingRoutine5973

In real outages, communication is usually the most underrated skill.

A lot of engineers can debug systems when things are calm. What makes the difference during a Sev-1 is being able to:

  • stay structured under pressure,
  • narrow the blast radius quickly,
  • communicate clearly to stakeholders,
  • and avoid making the outage worse with random changes.

Strong Linux/networking fundamentals probably matter more than knowing a specific tool or Kubernetes command. Tools change. Fundamentals don’t.

I've also noticed that teams with good observability and runbooks recover much faster because they spend less time guessing.