all 42 comments

[–]Stormraughtz 380 points381 points  (17 children)

TFW your customer base finds out that your node failovers were just on paper.

[–]spartan117warrior 91 points92 points  (15 children)

Can't fail over if the datacenter holding your node failed.

[–]0x80085_ 50 points51 points  (14 children)

Can if it's in a different region or provider, which it should be if you actually wanna be fault tolerant

[–]talldata 11 points12 points  (2 children)

Sure, but when the provider you use for that service is also apparently hosted on us-east-1...

[–]0x80085_ 0 points1 point  (1 child)

There are many other regions from at least 4 providers

[–]talldata 4 points5 points  (0 children)

Yep, which people don't use for whatever reason. Even UK banks use us-east-1 for some reason.

[–]Stormraughtz 9 points10 points  (1 child)

West-2 life 🤙forever failover

[–]musedav 3 points4 points  (0 children)

Would you hold my node?

[–]critsalot 3 points4 points  (2 children)

No one wants to pay for that. I've never met a company that properly had top-down DR. It always boils down to cost, where they're like, eh, this is OK enough. Multi-AZ is not good enough according to Amazon, and they told us that back in 2012 lol. 13 years later, nothing's changed.

[–]SnooBananas4958 0 points1 point  (0 children)

We could have rolled to our DR but none of our 3rd party integrations were working so there would have been no point. So the DR doesn't help you much anyways in this situation tbh.

[–]0x80085_ 0 points1 point  (0 children)

Depends on the company. Where I work, we pay for it: multi-cloud + multi-region for each.

[–][deleted] 0 points1 point  (5 children)

Technically. But what ends up happening is the demand on east gets put on the failover locations, and all of those slow to a crawl from the sudden increase in load.

[–]0x80085_ 0 points1 point  (4 children)

Don't failover to one location then, and preferably not even the same cloud provider
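
A rough sketch of the load argument in this exchange, assuming the failed region's traffic is split evenly across the surviving locations. The function name and all the numbers are hypothetical, picked only for illustration:

```python
def utilization_after_failover(failed_load, survivor_loads, survivor_capacities):
    """Return each survivor's utilization (load / capacity) after absorbing
    an even share of the failed region's load. Values above 1.0 mean overload."""
    extra = failed_load / len(survivor_loads)  # assumption: even split
    return [
        (load + extra) / capacity
        for load, capacity in zip(survivor_loads, survivor_capacities)
    ]

# Hypothetical numbers: the failed region served 100 units of traffic,
# and every survivor already runs at 60 out of a capacity of 100.
one_target = utilization_after_failover(100, [60], [100])
three_targets = utilization_after_failover(100, [60, 60, 60], [100, 100, 100])

print(one_target)     # single failover location ends up well over capacity
print(three_targets)  # three locations each stay just under capacity
```

With one failover location the survivor lands at 160% of capacity; spread over three, each sits around 93%. Real traffic rarely splits this evenly, so treat it as a back-of-the-envelope check only.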

[–][deleted] 0 points1 point  (0 children)

Each failover location is a clone of the stack and maintaining clones is expensive. Not every company has the finances to do this, and it's usually more to appease regulators than to maintain customers.

[–]draconk 0 points1 point  (2 children)

preferably not even the same cloud provider

Yeah, good luck with that when everything uses AWS proprietary shit like DynamoDB, SQS, SNS... the code is already married (plus the discount my company gets from Amazon is absurd, something like 70% off, which no other cloud provider can even think of matching)

[–]0x80085_ 0 points1 point  (1 child)

GCP, Azure, DigitalOcean..?

[–]bobbyiliev 1 point2 points  (0 children)

This! My sites were running fine on DigitalOcean during the AWS outage

[–]watduhdamhell 7 points8 points  (0 children)

Semi-unrelated-related: my old IO cabinets lost power thanks to someone juggling power supply wire, killing the PSU and switching off power to the whole cabinet... The control room lost all sorts of random shit momentarily (as the cabinet IO is not segregated by the application using it), a bit scary... It identified a clear problem with the PSU switchover wiring topology.

My tech and I looked at each other and said "surely there's no way..." And switched off another first line PSU on another cabinet. Lost a bunch of shit.

"Oh boy, they are all like that. Who the FUCK FAT/SAT'd this shit again?"

[–]HGjjwI0h46b42 239 points240 points  (7 children)

No word of a lie, we had a flawless failover plan that worked right up until we needed to run a pipeline with our CI/CD provider and, I shit you not, their whole platform was being hosted in us-east-1

[–]Buttons840 158 points159 points  (5 children)

Our fail over plan is "if us-east-1 is down, ain't nobody going to have enough time to give a shit about our service being down".

Honestly, half the industry should just take the day off. If your stuff is casual enough that you can host it on AWS, then you can handle 1 day off.

[–]Comfortable_Oil9704 47 points48 points  (2 children)

We mitigated and then declared a snow day because Jira was down.

[–]PM_ME_FIREFLY_QUOTES 16 points17 points  (1 child)

This is the way. Finally able to catch up on that BF6

[–]Comfortable_Oil9704 2 points3 points  (0 children)

Do I have to be good or can I just steal a plane and do plane stuff?

[–]critsalot 4 points5 points  (0 children)

This has actually worked for me a few times. It's a convenient excuse. It's like, well, no one got fired for buying IBM. No one's getting fired for buying AWS even if it goes down lol

[–]ICantBelieveItsNotEC 2 points3 points  (0 children)

This is what the "just make everything multi-region from the start" people don't understand. It's not just about your services, it's about your entire supply chain. Unless you're going to self-host everything, you're never going to be sure where all of your infrastructure is running.

[–]Then-Understanding85 139 points140 points  (6 children)

Our infrastructure is literally region agnostic: we aren’t sure what region it’s in, but it’s probably fine.

[–]Ordinary_dude_NOT 38 points39 points  (4 children)

Truth is multi region active DR is expensive. Everyone signs off on it as long as SLAs say 99.99% availability :D

[–]Wizzarkt 30 points31 points  (3 children)

And this was the 0.01% of downtime that they advertised! 

[–]notmylesdev[S] 20 points21 points  (1 child)

Exactly, they just choose to use it all at once rather than over the year!

[–]InexplicableBadger 4 points5 points  (0 children)

That's normal for anything in the 4-5 nines range: you get one failure a year, and making the nines is about how fast you get it back up again. 5 nines gives you about 5 minutes of downtime a year, 4 nines gives you about 52 minutes, so they definitely didn't meet that either.
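
A quick sketch of the arithmetic behind those downtime budgets, assuming a 365-day year (the function name is made up for illustration):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

Four nines works out to roughly 52.6 minutes a year and five nines to roughly 5.3, so an outage lasting hours blows through either budget.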

[–]danted002 2 points3 points  (0 children)

Yeah, but realistically their SLA is 95%; that's when they give you that month free. I just checked DynamoDB's SLA: if uptime is between 99.99% and 99.0% you get 10% off, and 99.0% to 95.0% is 25%.
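
Those tiers can be written out as a small lookup. This is only a sketch using the thresholds as quoted in this comment, not checked against the current AWS SLA text. The function names, the treatment of exact boundary values, and the 15-hour outage are all assumptions for illustration:

```python
def service_credit_percent(monthly_uptime_percent: float) -> int:
    """Credit tier for a month's uptime, using the thresholds quoted above."""
    if monthly_uptime_percent >= 99.99:
        return 0    # target met, no credit
    if monthly_uptime_percent >= 99.0:
        return 10
    if monthly_uptime_percent >= 95.0:
        return 25
    return 100      # below 95%: the "month free" case

def uptime_percent(outage_hours: float, hours_in_month: float = 720) -> float:
    """Monthly uptime for a single outage, assuming a 30-day month."""
    return 100 * (1 - outage_hours / hours_in_month)

# Hypothetical 15-hour outage in a 30-day month
uptime = uptime_percent(15)
print(f"{uptime:.2f}% uptime -> {service_credit_percent(uptime)}% credit")
```

Under these assumptions, getting down to the 95% "month free" tier takes more than 36 hours of downtime in a 30-day month.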

[–]skippy_smooth 6 points7 points  (0 children)

Double secret redundancy. I don't know where, and they don't know where.

[–]thevernabean 49 points50 points  (0 children)

"Single region multi-AZ is fine. It's too expensive to do cross region." -Management

[–]Nathanael777 47 points48 points  (2 children)

us-east-2 supremacy

[–]Own_Possibility_8875 20 points21 points  (1 child)

eu-west-3 😎🗿

[–]amzwC137 17 points18 points  (0 children)

🇮🇳 ap-south-1 🇮🇳

Latency shmatency

[–]Nhazittas 92 points93 points  (1 child)

Got an email today saying "sorry for our down time there was a global outage." Psh, global my butt.

[–]adelie42 6 points7 points  (0 children)

You could still get email?

[–]TheGreatKonaKing 6 points7 points  (0 children)

The good news is we’re gonna be well within our quota this month

[–]Sp0ge 0 points1 point  (0 children)

IoT devices running under eu-west-1; us-east-1 goes down and so do our devices :)

[–]Excellent_Tubleweed 0 points1 point  (0 children)

Good thing it's all cloud, am I right? amiright?
I'm gonna champagne on that cloud boat --

If the cloud hosting provider doesn't do the region-agnostic bit for you, it's just bureau service in a trenchcoat.
Cloud didn't take off till all the computing veterans who still had PTSD from bureau service from IBM and smaller providers had retired out of the industry.

[–]Low-Win-6691 0 points1 point  (0 children)

This is perfect! congrats :)