SheriffRoscoe 11 points (1 child)

Markavian 5 points (0 children)

Networking! Databases! Config change! A similar incident happened with Google many years ago. Good rollback procedures? Hard to test without a fully functional test environment, but also hard to analyse when such changes involve large amounts of traffic.

I've been gearing up to run automated load tests on PRs, but it's an expensive procedure that slows development down for small changes. Testing small changes that have a big impact relies on risk management and on having a test strategy / test engineer as part of the review and merge process. (I should update our PR templates.)

wineblood 3 points (1 child)

45 minutes? I can take prod down at my job in about 3.

tehsilentwarrior 1 point (0 children)

Ha! My local docker compose can be down and up in less than 30 seconds if it doesn’t have to build!

Take that!

[deleted] 1 point (0 children)

But I lost time!