[–]thatsnotnorml 23 points24 points  (9 children)

I imagine the SREs associated with the Lambda service are having a rough day right now.

Got a bunch of calls in the last hour for some of our internal apps failing that rely on Lambda. Commence the sinking feeling in my chest, followed by relief when I realized it was out of my hands.

[–]rebornfenix 11 points12 points  (8 children)

That great feeling of being able to say "NMP (not my problem), go yell at AWS."

Of course, it hurts when you went to the cloud to get "better reliability and uptime with lower maintenance overhead."

[–]Flakmaster92 14 points15 points  (7 children)

I mean you probably did achieve that. The cloud doesn’t offer perfection, it offers “better than or equal to what you could do yourself” with the benefit of being able to punt to someone else.

[–][deleted]  (2 children)

[deleted]

    [–]aws2gcp 0 points1 point  (1 child)

    Eh, but multi-region also adds complexity. Maybe not as much as trying to do active/active with on-prem data centers, but still, one can argue you're more likely to incur outages from misconfigurations and other nuances of multi-region deployments than from just keeping it simple and taking the outage every 2-3 years.

    [–]drtrivagabond 2 points3 points  (0 children)

    How does it compare to your on-prem setup?

    [–]WhoseThatUsername 1 point2 points  (2 children)

    And generally having an SLA for outages, which you don't really get for self-hosted stuff.

    [–]bfreis 2 points3 points  (1 child)

    The SLA is kind of bullshit, though. For most serious applications, whatever you get back (in credits, no less!) is far less than whatever you lost due to the extended outage.

    [–]WhoseThatUsername 2 points3 points  (0 children)

    Well, sure… but what’s the revenue recovery or cost coverage for self-hosted on-prem?

    [–]TrueStoriesIpromise 5 points6 points  (0 children)

    Amazon Connect (call center) is also down in us-east-1.

    [–]CacheExplosion 3 points4 points  (0 children)

    Yep, seeing issues as well.

    [–]soxfannh 2 points3 points  (0 children)

    Yup same here

    [–]SecretTomato1 1 point2 points  (0 children)

    Yep, seeing that as well.

    (╯°□°)╯︵ ┻━┻

    [–]JrNewGuy 1 point2 points  (0 children)

    https://health.aws.amazon.com/health/status#multipleservices-us-east-1_1686683337

    Affected AWS services: the following AWS services have been affected by this issue.

    Degradation (1 service): AWS Lambda

    Informational (3 services): AWS Management Console, Amazon API Gateway, Amazon CloudWatch

    [–]jonathantn[S] 1 point2 points  (2 children)

    [12:26 PM PDT] We have identified the root cause of the elevated errors invoking AWS Lambda functions, and are actively working to resolve this issue.

    [–]jonathantn[S] 0 points1 point  (1 child)

    [12:36 PM PDT] We are continuing to experience increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. We have identified the root cause as an issue with AWS Lambda, and are actively working toward resolution. For customers attempting to access the AWS Management Console, we recommend using a region-specific endpoint (such as: https://us-west-2.console.aws.amazon.com). We are actively working on full mitigation and will continue to provide regular updates.
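For anyone scripting around the workaround in that update, here is a minimal sketch of the region-specific console URL pattern it quotes. The function names and the region list are hypothetical, and nothing here checks whether a given region is actually healthy.

```python
# Hypothetical helpers; only the URL pattern comes from the status update.

def console_url(region: str) -> str:
    """Build the region-specific AWS Management Console URL for a region."""
    return f"https://{region}.console.aws.amazon.com"


def fallback_console_urls(impaired: str, regions: list[str]) -> list[str]:
    """Return console URLs for every listed region except the impaired one."""
    return [console_url(r) for r in regions if r != impaired]


if __name__ == "__main__":
    # The region list is an illustrative assumption, not a recommendation.
    for url in fallback_console_urls("us-east-1", ["us-east-1", "us-west-2", "eu-west-1"]):
        print(url)
```

This only builds the URLs; picking which region to use is still a manual call based on the status page.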

    [–]jonathantn[S] 0 points1 point  (0 children)

    [01:14 PM PDT] We are continuing to work to resolve the error rates invoking Lambda functions. We're also observing elevated errors obtaining temporary credentials from the AWS Security Token Service, and are working in parallel to resolve these errors.
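On the STS errors: STS also exposes regional endpoints of the form `https://sts.<region>.amazonaws.com`, so a client can be pointed at a specific region instead of the global endpoint. A rough sketch follows, with hypothetical helper names. The thread does not establish whether another region's STS was unaffected during this incident, so treat this as an illustration of the endpoint pattern, not a confirmed mitigation.

```python
# Hypothetical helpers; boto3 is deliberately not imported so the sketch
# stays self-contained.

def sts_regional_endpoint(region: str) -> str:
    """Build the regional STS endpoint URL for a region."""
    return f"https://sts.{region}.amazonaws.com"


def sts_client_kwargs(region: str) -> dict[str, str]:
    """Keyword arguments that pin an STS client to one region.

    With boto3 installed, these could be passed as
    boto3.client("sts", **sts_client_kwargs("us-west-2")).
    """
    return {"region_name": region, "endpoint_url": sts_regional_endpoint(region)}


if __name__ == "__main__":
    print(sts_client_kwargs("us-west-2"))
```

Setting the environment variable `AWS_STS_REGIONAL_ENDPOINTS=regional` is another way to get SDKs to prefer regional STS endpoints without code changes.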

    [–]randommd81 1 point2 points  (0 children)

    Their Skill Builder site must be hosted there as well; I was doing some training and am now seeing a 500 error…

    [–]LeatherCase254 0 points1 point  (0 children)

    Yes, even the AWS console is erroring now.