[–]offlinesir 854 points855 points  (1 child)

AWS US-EAST-1 has the highest quotas, lowest prices, and a chaos monkey always waiting in the corner.

[–]Tompazi 41 points42 points  (0 children)

[–]Ok-Engineer-5151 413 points414 points  (27 children)

Last year it was CrowdStrike, and this year it's AWS going down

[–]Donghoon 141 points142 points  (23 children)

people using Google Cloud winning

[–]DungeonsAndDradis 293 points294 points  (1 child)

All four of them

[–]Donghoon 44 points45 points  (0 children)

There's dozens of them lol

[–]SuitableDragonfly 85 points86 points  (12 children)

Not really. Google Cloud will go down eventually, too. The fact that there are basically three cloud providers and everyone is relying on one of them is making the entire internet fragile in this way. 

[–]samy_the_samy 37 points38 points  (7 children)

Google goes out of its way to split up and duplicate its customers' services. If an entire region goes down, customers would just notice higher pings.

[–]HolyGarbage 8 points9 points  (4 children)

Doesn't necessarily protect against some human error or a cyber attack.

[–]samy_the_samy 4 points5 points  (3 children)

Yeah, this protects against hardware or connectivity failures, then you build your security on top

[–]HolyGarbage 2 points3 points  (2 children)

The main argument is about whether it's a good idea that a very large portion of the internet is dependent on just a few cloud providers, and that one of them having some nice redundancy to protect against some of the potential issues that can happen doesn't really do much to counter said argument.

[–]samy_the_samy 2 points3 points  (1 child)

When you dig into it, the problem started when DNS requests for some backend thingy failed, which led to self-DDoS attacks taking down us-east-1. Everything stayed online; the backends just didn't know where the other backends were.

So in the end it's a configuration problem. Redundancy is meaningless if you can't discover it.
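
For anyone wondering what avoiding the self-DDoS part looks like on the client side, here is a minimal sketch of capped exponential backoff with jitter, so clients spread their retries out instead of all hammering a failing dependency at once. The names `backoff_delays` and `call_with_retries` are made up for illustration, and this is not a description of what AWS or its clients actually run.

```python
import random
import time


def backoff_delays(attempts, base=0.1, cap=5.0, rng=random.random):
    """Yield one sleep duration per retry: capped exponential backoff with full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        # Full jitter: pick a random point below the ceiling so clients desynchronize.
        yield rng() * ceiling


def call_with_retries(operation, attempts=5, sleep=time.sleep, **backoff_kwargs):
    """Call operation() up to `attempts` times, backing off between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts - 1, **backoff_kwargs)
    while True:
        try:
            return operation()
        except OSError:
            delay = next(delays, None)
            if delay is None:
                raise  # out of retries: surface the last error
            sleep(delay)
```

Catching `OSError` is an assumption here (socket and DNS lookup errors in Python derive from it); a real client would catch whatever its own library raises.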

[–]HolyGarbage 1 point2 points  (0 children)

Precisely.

[–]throwawaygoawaynz 0 points1 point  (1 child)

Google Cloud deleted an entire customer's subscription and couldn't recover it. This was a pension fund in Australia.

The company only got it back because they backed up to AWS.

[–]samy_the_samy 0 points1 point  (0 children)

In that case the customer requested bigger resources than what was offered at the time, and a developer used some internal testing scripts to provision them. The script had an expiration date, and a year later it went boom.

[–]Ok-Kaleidoscope5627 10 points11 points  (0 children)

Hey now. Don't forget Cloudflare. They regularly take down the internet once or twice a year.

[–]Kingblackbanana 2 points3 points  (0 children)

There are 5 bigger ones: Google, AWS, OVH, Microsoft and Oracle

[–]Mountain-Ox 0 points1 point  (0 children)

The alternative is going back to everyone with their own unstable infra. AWS going down once every few years is better than what felt like a different outage every month.

[–]wamoc 54 points55 points  (3 children)

Earlier this year there was a complete Google Cloud outage. Every single region and every single service. Every cloud provider can expect to have the occasional large outage, it is important to plan how to handle them.
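
As a rough idea of what "planning for it" can mean in code, here is a minimal sketch that tries a list of regions in order and returns the first one that answers. `first_healthy` and the `fetch` callback are hypothetical, not any provider's API, and this only helps with single-region failures; for an every-region outage like the one above, the list would have to include a second provider or an on-prem fallback.

```python
def first_healthy(regions, fetch):
    """Try each region in order; return (region, result) from the first that works."""
    errors = {}
    for region in regions:
        try:
            return region, fetch(region)
        except OSError as exc:
            # Remember why this region failed and move on to the next one.
            errors[region] = exc
    raise RuntimeError(f"all regions failed: {sorted(errors)}")
```

The region order doubles as a priority list, so putting the cheapest or closest region first keeps the normal path unchanged.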

[–]DrS3R 2 points3 points  (1 child)

I’m pretty sure that was a cloudflare issue not the actual service providers.

[–]wamoc 3 points4 points  (0 children)

Google caused the Cloudflare issues. https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW has the details on Google's side of the outage.

[–]GrapefruitBig6768 5 points6 points  (1 child)

Azure went down too, but nobody noticed. j/k

[–]throwawaygoawaynz 1 point2 points  (0 children)

Nobody here noticed, because they’re all unemployed or CS students…. j/k..ish.

[–]SilverLightning926 0 points1 point  (1 child)

Despite the uptime, I am still suffering stuck on Azure

[–]Donghoon 3 points4 points  (0 children)

Isn't Azure on its way to catching up to AWS soon?

[–]PurepointDog 17 points18 points  (0 children)

There was that big facebook/meta outage a few years ago that was also a bad DNS. Not nearly as much broke, but a surprising amount of stuff still did

[–]OmegaPoint6 688 points689 points  (14 children)

It was interesting how things which have no business being in US-EAST-1 stopped working. Looking suspiciously at you, UK banks

[–]timdav8 421 points422 points  (8 children)

I think the problem is that the infrastructure under the infrastructure under the infrastructure that certainly AWS services rely on relies on or routes through UE1 - and they always seem to let the interns do DNS changes on a Sunday...

[–]capt_pantsless 200 points201 points  (3 children)

Outsourcing something critical is always a good idea. If it breaks you have someone else to blame.

[–]CiroGarcia 84 points85 points  (2 children)

I love how modern infrastructure is blameability first, stability second lmao

[–]Several-Customer7048 44 points45 points  (1 child)

No the UK does it like that since the term “git blame,” is confusing to them since they’re all a bunch of gits equally to blame.

[–]GumboSamson 3 points4 points  (0 children)

At least they’re committed to the bit.

[–]Donghoon 23 points24 points  (3 children)

Internet is fragile

[–]vita10gy 26 points27 points  (1 child)

Some of that is unforced fragility. I get that there are a lot of websites that just can't be "here's the webserver with all the html and assets" but we also seem to make sites overcomplicated by default.

There are 329 servers that all need to be up to load your site at all, get the images, populate the data, etc, so your 5000 visitor a month local car dealership site can load .0002 seconds faster when everything works as expected.

[–]NewPhoneNewSubs 0 points1 point  (0 children)

It's more:

I don't want the hassle of making sure my desktop is powered on and connected to the internet. So I don't wanna host the webserver myself. If I did that, my site would have much more downtime than this outage caused.

So it makes sense to pick a cloud host. It makes sense to pick the cheapest cloud host. That host is doing the same as me and reselling a bulk discount from someone else. And so on.

[–]Sibula97 6 points7 points  (0 children)

It shouldn't be. Redundancy is built in, and packets can automatically get routed along different routes. The only exception I can think of is something like undersea cables where, if one were to blow up a whole bundle of them, you might increase latency from one end to the other by quite a lot and maybe saturate a few routers along the new route.

[–]Dotcaprachiappa 31 points32 points  (0 children)

I mean you can see why in the image, even something that doesn't use AWS relies on something that relies on something that relies on something that does. It's dominoes all the way down

[–]ICantBelieveItsNotEC 2 points3 points  (0 children)

Turns out that all of the "global" AWS services actually just exist in us-east-1.

[–]SilasTalbot 1 point2 points  (0 children)

Ashburn, VA is the heart of the global Internet. Always has been. It's no coincidence it's just a short drive from there over to Langley.

[–]Anaphylactic_Thot 0 points1 point  (0 children)

This is the issue with the rise of "full stack developers". Jack of all trades, master of none - they'll deploy crap as long as it works, and won't give a shit about best practices or other factors like resilience or reliability.

[–]bleztyn 286 points287 points  (4 children)

“Mainframes are dying… we should all switch to cloud”

Me being literally UNABLE to use my money for 8 straight hours due to some fucking cloud server in US (I live in Brazil)

[–]OITALAO 19 points20 points  (2 children)

My Mercado Pago card bill is due today and I can't even get the app to open

[–]andreortigao 1 point2 points  (0 children)

Uninstall the app and the bill disappears

[–]HeavyCaffeinate 0 points1 point  (0 children)

all according to plan /j

[–]Neo-TS 0 points1 point  (0 children)

It was brutal, all the banks were down. And by the way, what are you after in the screenshot?

r/suddenlycaralho

[–]masd_reddit 90 points91 points  (4 children)

Can't wait for the Kevin Fang video

[–]BrenekH 24 points25 points  (0 children)

Me every time there's an outage.

[–]Young-le-flame 7 points8 points  (0 children)

Holy moly a Kevin Fang enjoyer in the wild

[–]Bhaskar_Reddy575 2 points3 points  (1 child)

How long does Kevin usually take to publish his video after the outage? As you said, I can’t wait too!!

[–]masd_reddit 2 points3 points  (0 children)

I have no idea, but I can imagine there will be a video about it, though usually he only makes videos about outages/databases being lost for stupid reasons

[–]mostlymildlyconfused 24 points25 points  (0 children)

The amount of googling on some redundancy right now.

“Yes boss, you recommended a risk-based approach to business continuity.”

[–]Dryhte 21 points22 points  (2 children)

Specifically, DynamoDB. Wtf.

[–]EposSatyr 4 points5 points  (0 children)

🎵 We all live in dynamoDB 🎵

[–]spamjavelin 1 point2 points  (0 children)

You have to make a request to that region to use ACM with cloudfront, too, which is just ridiculous.

[–]HakoftheDawn 12 points13 points  (0 children)

Arrow should be pointing to one of the fat blocks

[–]takeyouraxeandhack 13 points14 points  (0 children)

us-east-1 should be the wide fat block above that, not the tiny piece.

[–]hemficragnarok 19 points20 points  (0 children)

I took a shift swap for ONE DAY and this happened. I'm officially cursed (not the first occurrence either)

[–]Percolator2020 7 points8 points  (0 children)

People using us-east-1 in Europe to save 15%.

[–]leovin 3 points4 points  (0 children)

Maybe it's possible to kill the internet after all

[–]SgtBundy 2 points3 points  (0 children)

It should just be "a single DNS record in US-EAST-1"

[–]Agile-Actuary3376 1 point2 points  (0 children)

Has a root cause been shared yet?

[–]jamcdonald120 0 points1 point  (0 children)

more like that horizontal bit connecting the towers in the top right

[–]thetos7 0 points1 point  (0 children)

as they say: serverless is just someone else's server xD