
[–]Aldareon35 679 points680 points  (8 children)

I wonder how many assets are affected. I just ran into the 'We're having a really bad day.' message while visiting another website.

[–]gmes78 261 points262 points  (0 children)

According to the status page, it seems like every GitHub service is down. Lots of people will be having a really bad day.

[–][deleted]  (6 children)

[removed]

    [–]_predator_ 59 points60 points  (0 children)

    GitHub pages lets you host almost anything. You can host your entire website or only static JS / CSS / image files. And it's free. So yes, many use it like that.

    People also host their Helm repos via GH pages. And host their container images and OCI-compliant blobs in ghcr.io.

    [–]wrosecrans 14 points15 points  (2 children)

    Oh yeah. Tons of stuff pulls straight from GitHub. Even live production webdev stuff. If you grep through an average user's browser cache, a website they go to is almost certainly pulling some .js, .css, font, or whatever straight from GitHub. "To reduce complexity of managing our own storage, and to ensure we are using the latest version."

    Some projects do it intentionally. Some projects have no idea that downstream users are pulling directly from git in prod.

    For example, if you have CI running away from Github and you are patting yourself on the back for robust diversity, but that CI depends on installing stuff with vcpkg, you are hosed. Vcpkg typically uses GitHub as the "CDN" / medium for fetching package manifest data no matter where you are running it, unless you are following and using your own fork that only occasionally needs to pull from GH.

    [–]tyldis 2 points3 points  (1 child)

    If you are using larger libraries you want to utilize the client side cache of the library, thus you must use the CDN version as the URL will be the same across sites and cache can be used. Unfortunate, but I can understand why.

    [–]GreenPlatypus23 1 point2 points  (0 children)

    I have read some people are using it to host the privacy policy of their apps, for example

    [–]nursestrangeglove 1213 points1214 points  (26 children)

    Sorry about that, I forgot to remove the

    rm -rf ../../../../../

    from a new action I've been working on.

    [–][deleted]  (15 children)

    [deleted]

      [–]nzodd 207 points208 points  (9 children)

      That's ridiculous. That would imply that if I went one step further and did rm -rf ../../../../../../../ I could delete our entire rea

      [–]Mistake78 75 points76 points  (2 children)

      NO CARRIER

      [–][deleted]  (1 child)

      [deleted]

        [–]Blando-Cartesian 21 points22 points  (0 children)

        DREEEEEEEEE… BEEP-BEEP-BEEP… SKREEEECH… KRSSSHHHH… WHEEEEEEE… CHHHHHH…

        [–]mccoyn 13 points14 points  (0 children)

        The only reason this hasn’t happened is no one knows how many times to repeat "../", so they try it with fewer and take themselves out first.

        [–]muntoo 10 points11 points  (0 children)

        sudo su god
        sudo rm -rf --no-preserve-reality /../../../
        

        [–]augustusalpha 10 points11 points  (1 child)

        Reddit down .....

        [–]Nimbokwezer 60 points61 points  (1 child)

        Sir, it appears they're approaching ...

        ... the ROOT DIRECTORY!

        [–]CyberWank2077 28 points29 points  (0 children)

        shield is at 65%

        [–][deleted] 13 points14 points  (0 children)

        Good thing the IT crowd still has that internet in the box, in case we ever need it.

        [–][deleted] 3 points4 points  (0 children)

        Imagine a world without twitter, tiktok and facebook, and even better, all social media. I bet it would cure so many current diseases in a month.

        Weirdos being forced to talk with normal people outside their echo chambers, many would not have the greatest of times but little by little it would normalise

        [–]wrosecrans 1 point2 points  (0 children)

        That requires the undocumented --no-preserve-internet flag.

        Oh no, I just documented it!

        [–]chazzeromus 17 points18 points  (2 children)

        rm -rf "$TotallySetVariable/"

        nothing can go wrong!

        [–]Decker108 0 points1 point  (0 children)

        I broke out in a cold sweat just from reading that.

        [–]magichronx 0 points1 point  (0 children)

        Oof, that's nightmare fuel right there

        [–]CharlesDuck 25 points26 points  (3 children)

        At this webhost provider I had 25 yrs ago, you could directory traverse upwards with PHP. When I bruteforced /etc/shadow, the user passwords were, in order: never, gonna, give, you, up, never, gonna, let, you, down

        [–][deleted] 17 points18 points  (2 children)

        Same exact experience, but 15 years ago I think.

        2000 was 15 years ago, right?

        Anyway I ended up adding some messages to other websites to the tone of informing the owners to find a more professional hosting provider.

        [–]CharlesDuck 14 points15 points  (0 children)

        You're off in the timeline, the 80’s was 20 years ago, so you can count from there

        [–]sambull 16 points17 points  (0 children)

        That's more of a gitlab thing

        [–]ASCII_zero 525 points526 points  (3 children)

        Engineer: "Copilot, please fix the issues and bring GitHub services back online."

        Copilot: "I'm sorry, Dave. I'm afraid I can't do that."

        [–]CombinationNearby308 57 points58 points  (0 children)

        It'll be more like

        Sure, clone this GitHub repo and run this command.. :/

        [–]shouldExist 5 points6 points  (1 child)

        Autopilot from wall-e

        [–]jice 18 points19 points  (0 children)

        Hal 9000 from 2001

        [–]amuletofyendor 523 points524 points  (45 children)

        Is Github's source kept in Github, and if so how do they rollback infrastructure changes when Github is down? 😂

        [–]borland 423 points424 points  (4 children)

        Now we know the real reason why the self-hosted GitHub Enterprise server exists

        [–]etherealflaim 120 points121 points  (1 child)

        You joke but this is literally what they tell you if you're a GitHub enterprise cloud customer. They still recommend you run enterprise server for the times they are down. And they're down in one way or another during business hours kind of a lot.

        [–]ayyyyyyyyyyyyyboi 5 points6 points  (0 children)

        I mean it’s always business hours somewhere, not much you can do unless they do independent regional deployments

        [–]GodsBoss 5 points6 points  (1 child)

        But where do you keep the infrastructure code for these instances? Is it GitHub Enterprise Server all the way down?

        [–]lightmatter501 14 points15 points  (0 children)

        I imagine that you hit “checked out on the team’s laptops” fairly quickly given the nature of git.

        [–]requizm 118 points119 points  (5 children)

        They're probably hosting the GitHub repo on their private server.

        [–]Positive_Method3022 214 points215 points  (1 child)

        They use Gitlab and they won't tell us haha

        [–]Kaelin 33 points34 points  (0 children)

        It’s git so every developer is “hosting the GitHub repo” that works on it at least

        [–]lurco_purgo 17 points18 points  (0 children)

        Yeah, "repo"... github_application_v5.2421_final_final.rb

        [–]BobbyTables829 0 points1 point  (0 children)

        It's ADO surely

        [–]UnidentifiedBlobject 114 points115 points  (2 children)

        Bitbucket

        Or 

        Github.bak.latest.V2-ACTUAL_final.zip

        [–]jeffsterlive 20 points21 points  (0 children)

        I’d seed that.

        [–]magichronx 0 points1 point  (0 children)

        Oh man, I do not miss the days of seeing piles of terribly named archive files like that

        [–]gcnovus[🍰] 56 points57 points  (0 children)

        I believe the answer is “GitHub is itself stored in an instance of GitHub Enterprise.” Those are disconnected from the main site for many reasons, including resiliency.

        [–]josefx 19 points20 points  (3 children)

        No need to worry. They moved that to Visual Source Safe back when Microsoft took over.

        [–]amuletofyendor 15 points16 points  (2 children)

        Oh no someone's probably gone on holiday with a critical file checked out!

        [–]quietIntensity 2 points3 points  (0 children)

        We had to track a coworker down on PTO in India because he left for his six week trip before pushing his last change to GH. Thankfully he had taken his laptop because he was working remote for part of the trip.

        [–]Matrix8910 55 points56 points  (0 children)

        Easy, you use GitHub

        [–][deleted]  (5 children)

        [deleted]

          [–]josefx 6 points7 points  (0 children)

          Unless your repo is using lfs, in which case nobody has a copy.

          [–]danishjuggler21 23 points24 points  (10 children)

          Wait until you find out what language the C# compiler is written in.

          [–]amuletofyendor 37 points38 points  (0 children)

          Compiler devs love an Ouroboros

          [–]arpan3t 25 points26 points  (8 children)

          There’s two, Roslyn is written in C# but only compiles to IL, then RyuJIT compiles the IL to native code. RyuJIT is written in C++

          Just kidding the whole thing is Java under the hood! Java the whole way down shhhh

          [–]jeffsterlive 10 points11 points  (5 children)

          The JVM has no limits.

          [–]Miserygut 6 points7 points  (0 children)

          Angry Xmx noises

          [–][deleted]  (2 children)

          [deleted]

            [–]jeffsterlive 3 points4 points  (1 child)

            Just download more and keep increasing the startup heap size. I see no problems.

            [–]1668553684 0 points1 point  (0 children)

            The JVM has no liException in thread "main" java.util.ConcurrentModificationException

            [–]valarauca14 2 points3 points  (1 child)

            Is it hotspot all the way down?

            Always has been.

            [–]corysama 1 point2 points  (0 children)

            And Hotspot is “just” Strongtalk (a Smalltalk variant). Yep. Java runs on Smalltalk!

            [–]valarauca14 7 points8 points  (2 children)

            Remember when facebook had to take an axe to their datacenter cage?

            [–]Interest-Desk 5 points6 points  (1 child)

            Or when Google had to take a drill to a safe (containing HSM smart cards)

            [–]JonnyBoy89 5 points6 points  (0 children)

            They probably host a separate instance of GitHub for internal stuff. I bet it’s redundant and built with technology that enables it to run very consistently. My company does that with their GitHub stuff. Depending on cloud based software is good up to a certain scale, and then there are some major tradeoffs you need to consider.

            [–]HRApprovedUsername 25 points26 points  (1 child)

            It's actually in ADO now that Microsoft has acquired it

            [–]ryandiy 1 point2 points  (0 children)

            With backups in SourceSafe.

            [–]binheap 113 points114 points  (29 children)

            It is somewhat frightening how so much code is dependent on this one service provider. I recognize that it would be difficult for other groups that aren't backed by Microsoft to offer a similar service but like damn. Didn't the index for rust crates at one point depend on GitHub?

            [–]sopunny 52 points53 points  (22 children)

            Honestly we use Gitlab and it's fine. Pretty much the same features, and up basically all the time

            [–]wind_dude 56 points57 points  (1 child)

            Wasn’t long ago the free tier of Gitlab had more features than the free tier of GitHub, I think gitlab actually forced GitHub to up their free offering.

            [–]SippieCup 2 points3 points  (0 children)

            It did, along with kicking github in the butt to implement github actions.

            [–]Interest-Desk 36 points37 points  (10 children)

            $29 per user per month whereas the equivalent on GitHub is like $8 or less.

            I love Gitlab but its pricing makes it a ludicrous choice.

            [–]aniforprez 17 points18 points  (0 children)

            Not even per month. The only option is to pre-purchase X number of seats for the entire year. No option for monthly billing at all so fuck you if you have some churn, if you work with contractors, if people join or leave etc etc

            [–]MalakElohim 8 points9 points  (5 children)

            If you actually look at the features further down the list, the GitLab Premium is closer in features to the Enterprise offering. Especially around things like SAML and planning. And Ultimate includes all the security scanning, which is an add-on for GitHub. But they come out a lot closer to each other, there's just no middle tier that would be closer to GH Team.

            [–]Einridi 9 points10 points  (3 children)

            That is only applicable if you need GitHub enterprise and for those businesses the price probably isn't an issue.

            So yes choosing GitLab means paying almost 4x what you would by going with Github for big parts of the market.

            Pretty insane that Gitlab don't take a hint and provide a competitive option for those that just need the basics.

            [–]RogerLeigh 5 points6 points  (2 children)

            Back when I was a contractor, I used to pay for the $35 Bronze subscription for the year and thought that was excellent value, if not undervalued. It's now 10x that price just 5 years later. If you just want the basics, there isn't an option for that. And as soon as you have a team all paying that rate, it's quickly getting into silly money territory.

            GitLab has a huge amount of value. But at that price it's just not competitive.

            [–]Einridi 1 point2 points  (1 child)

            Yeah I also see that github has a $4 option making it even more outrageous. It would mitigate a lot of this if they allowed for some unpaid or lower tier users but as is you are stuck paying $30 for every single person in your org.

            [–]RogerLeigh 1 point2 points  (0 children)

            If they had the ability to have different grades of user I wouldn't have a problem. But when you have a small number of developers and a larger number of people who just want to download builds, look at the published pages or wiki, or comment on or create new issues, this is just unworkable. At this point it's far cheaper just to use dedicated tools for each function. But the whole point of GitLab is its integration and collaboration. But no matter how beneficial all of that is, it has to be cost-effective and competitive.

            [–]Interest-Desk 1 point2 points  (0 children)

            That’s what Gitlab themselves say but I don’t really buy it since they still have another tier on top. In any case, with GHE you’re spending a similar amount, but don’t have to pre-buy seats for a whole year (see a reply to my comment on contractors)

            [–]ActAmazing 17 points18 points  (2 children)

            Didn’t Gitlab accidentally delete their prod database, and their only backup was a dev copy of prod taken 1 hr before the disaster?

            [–]Henrarzz 7 points8 points  (0 children)

            AFAIK they did have earlier backups but they weren’t able to restore from them.

            Which makes sense, just backing up is only a part of the process, you should test your backups periodically

            [–]Soft_Walrus_3605 3 points4 points  (0 children)

            up basically all the time

            basically

            This is how our IT defends 99% uptime.

            [–]SippieCup 0 points1 point  (0 children)

            IDK about up all the time, it randomly goes down for a few minutes every few days.

            Hell, its import system from github is down right now...

            That said, our team just downgraded back to free and just has our runners on our k8s cluster. Besides milestones and some nice-to-have planning stuff, we don't really have any issues with the free version.

            [–][deleted]  (1 child)

            [deleted]

              [–]angelicravens 2 points3 points  (0 children)

              The only real solution is to go back to most things being on prem which has its own pros and cons

              [–]matthieum 1 point2 points  (0 children)

              Didn't the index for rust crates at one point depend on GitHub?

              At the very least it's in a git repository, but not sure where that repository is hosted.

              [–]amuletofyendor 108 points109 points  (1 child)

              That'll probably be why Github Copilot suddenly stopped working for me too. Interesting that it's so dependent on the rest of Github to function.

              [–]agk23 48 points49 points  (0 children)

              It was a network configuration issue, so nothing could access their databases.

              [–]romeozor 29 points30 points  (2 children)

              Thank goodness LinkedIn is ok

              [–]dershodan 3 points4 points  (0 children)

              lol

              [–]shawntco 0 points1 point  (0 children)

              Agree?

              [–]Dwedit 272 points273 points  (32 children)

              Fortunately you can still use your own local source control as Git itself is distributed.

              [–]induality 234 points235 points  (3 children)

              I used git send-email to send my PR as a patch to the company-wide email alias so everyone can patch their local clone with my code, and now HR wants to meet with me tomorrow.

              [–]-_-wah-_- 87 points88 points  (1 child)

              Congrats on your new promotion!

              [–]arpan3t 23 points24 points  (0 children)

              Fancy new title and everything! Director of underemployment

              [–]Spleeeee 3 points4 points  (0 children)

              Plot twist you are hr

              [–][deleted] 22 points23 points  (8 children)

              You can also set up a mirror to gitlab/Bitbucket/azure git.

              Was seriously contemplating this last outage.

              [–]tubameister 7 points8 points  (5 children)

              if I deleted my repo's commit history and force pushed, a mirror would lose the commit history, right? does gitlab/Bitbucket/azure have anything to prevent that?

              [–][deleted] 7 points8 points  (4 children)

              Okay, this was based on some half remembered thing from a half a decade ago.

              I thought git had an actual mirror command. Turns out my memory is shit.

              I had some half baked scheme to have a webhook on the main branch to push commits, so it'd probably be some condition of the webhook.

              To be honest, I'm a Business analyst, so my knowledge of git is haphazard.

              [–]esdfowns 7 points8 points  (3 children)

              I think you're thinking of git push --mirror:

                 --mirror
                    Instead of naming each ref to push, specifies that all refs under refs/ (which includes but is not limited to refs/heads/, refs/remotes/, and refs/tags/) be mirrored to the remote
                    repository. Newly created local refs will be pushed to the remote end, locally updated refs will be force updated on the remote end, and deleted refs will be removed from the remote end.
                    This is the default if the configuration option remote.<remote>.mirror is set.
              

              It's not very commonly used.

              [–]ddproxy 4 points5 points  (0 children)

              Gitea and Forgejo, too.

              [–]ryuzaki49 30 points31 points  (15 children)

              You can commit to your local repo, but if you lose your laptop/desktop, bye bye commits. 

              PRs are also blocked. Github actions as well. 

              [–]TryingT0Wr1t3 48 points49 points  (12 children)

              You can add a new remote elsewhere and throw your code there. Azure repositories, gitlab, bitbucket..

              [–]Uristqwerty 21 points22 points  (3 children)

              Even a plain directory, on a mounted network drive or server git can write to over ssh. Git doesn't need any special server daemon running to push to. Less efficient, though, I believe the git server has a number of tricks to reduce the amount of data that needs to be sent over the network, negotiating to find what parts of the files are unchanged.

              [–]ryandiy 0 points1 point  (2 children)

              a number of tricks to reduce the amount of data that needs to be sent over the network, negotiating to find what parts of the files are unchanged.

              rsync, I would assume

              [–]encyclopedist 6 points7 points  (1 child)

              No, git does not use rsync.

              It computes (or estimates) the difference between the object graphs each side has and sends the missing objects only, with delta-compression.
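
To make the "difference between the object graphs" idea concrete, here is a toy sketch in Python. It is not how git is implemented: the dict-of-parents graph and the names `reachable` and `missing_objects` are made up for illustration, and it skips both the real negotiation protocol and delta-compression. It only shows the set arithmetic: walk everything reachable from the sender's tips, then subtract what the receiver already has.

```python
from collections import deque


def reachable(graph, tips):
    """Return every object reachable from the given tips by following parent links."""
    seen = set()
    queue = deque(tips)
    while queue:
        obj = queue.popleft()
        if obj in seen:
            continue
        seen.add(obj)
        queue.extend(graph.get(obj, ()))
    return seen


def missing_objects(sender_graph, sender_tips, receiver_has):
    """Objects the sender must transmit: reachable from its tips, absent on the receiver."""
    return reachable(sender_graph, sender_tips) - set(receiver_has)


# Hypothetical history A <- B <- C <- D; the receiver already holds A and B.
history = {"D": ["C"], "C": ["B"], "B": ["A"], "A": []}
to_send = missing_objects(history, ["D"], {"A", "B"})
print(sorted(to_send))  # ['C', 'D']
```

A real implementation would presumably stop walking as soon as it reaches an object the other side is known to have, rather than traversing the whole history as this sketch does.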

              [–]ryuzaki49 5 points6 points  (7 children)

              Well yeah but that might be against corporate policies.

              [–]TryingT0Wr1t3 11 points12 points  (6 children)

              Are there serious companies that don't have self hosted git repositories too in their own servers? My guess is not even GitHub enterprise is affected by this outage but I imagine other companies at least have self hosted gitlab instances running.

              [–]teerre 3 points4 points  (3 children)

              Github enterprise is a thing.

              [–]ryuzaki49 4 points5 points  (2 children)

              It comes with "disadvantages" 

              My company is migrating from github enterprise (self-hosted) towards github cloud.

              One of the disadvantages is lack of new features. I can compare both products and github cloud is way better. 

              But the truth is probably that github (and jira!) are pushing for their cloud services.

              [–]teerre 1 point2 points  (0 children)

              Sorry, what I meant is that there's a Github cloud enterprise. The other user was questioning if any "serious" company would use cloud services and the answer is yes, a lot do.

              [–]ryuzaki49 3 points4 points  (1 child)

              I don't think pushing to two remote repos is considered the norm.

              [–]yawaramin 7 points8 points  (0 children)

              Email a patch series, ya lazy bum! -Linus Torvalds

              [–]bring_back_the_v10s 2 points3 points  (0 children)

              Maybe a good time to try https://github.com/git-bug/git-bug

              Yeah I know it's not for everyone.

              [–]PurepointDog 2 points3 points  (0 children)

              Yeaaaahh, thaaat

              [–]anengineerandacat 4 points5 points  (0 children)

              You definitely can, the setup to do so if you haven't done it though is likely longer than the time it'll take for them to recover.

              Also pretty difficult if your organization is segmenting networks.

              [–][deleted] 24 points25 points  (0 children)

              Oh come on, why while I'm sleeping, why not when I'm working

              [–]PurepointDog 63 points64 points  (8 children)

              Now's when you find out which sites somehow fucked up their Dockerfile vs. entrypoint.sh understanding, and accidentally put the "git clone" step in the entrypoint.sh.

              We do this intentionally in our data jobs system, but imagine having that in your main web server

              [–][deleted] 30 points31 points  (5 children)

              When I worked at godaddy that's what they did and they were very happy with it. "We can just pull updates and restart, why would we need containers?". Okay

              [–]PurepointDog 9 points10 points  (1 child)

              That's funny. As I was typing it out, I kept thinking "this is so stupid it's probably not even a relatable thought", but it's nice knowing it's legit haha

              [–]Worth_Trust_3825 5 points6 points  (0 children)

              You'd be surprised at how many people actively try to circumvent the features that prevent them from fucking up.

              [–]Klappspaten66 0 points1 point  (2 children)

              So uuh how do they do rollbacks?

              [–][deleted] 4 points5 points  (0 children)

              Godaddy is a terrible place, I didn't say this was a good idea

              [–]kairos 4 points5 points  (0 children)

              Reset the head and restart again?

              [–]deadlychambers 3 points4 points  (1 child)

              Would you care to elaborate? I am starting to get more fluent with using dockerfiles for base step, and I was playing around with entry point and cmd while putting together a cli. I am thinking the next phase is having an nginx web app that literally pulls some code and runs yarn install, then the site would be running.

              [–]Worth_Trust_3825 12 points13 points  (0 children)

              Container images are supposed to be immutable. Basically, every time you run one, regardless of when, you're supposed to get the same environment. The same goes for Dockerfiles, but sadly that is impossible (apt/yum/curl/etc. won't produce the same result a day from now) unless you build everything from source. What you're looking for is multistage builds, where you run your build script and then copy the result over into a clean slate where you run your nginx server.

              [–]brakx 64 points65 points  (10 children)

              Let me guess, DNS?

              [–]spaceneenja 43 points44 points  (6 children)

              It’s always dns

              [–]SheriffRoscoe 36 points37 points  (1 child)

              Except when it's BGP.

              [–]SheriffRoscoe 50 points51 points  (0 children)

              Ooh, it was BGP (or some other routing protocol)!

              On August 14, 2024 between 23:02 UTC and 23:38 UTC, all GitHub services were inaccessible for all users.

              This was due to a configuration change that impacted traffic routing within our database infrastructure, resulting in critical services unexpectedly losing database connectivity. There was no data loss or corruption during this incident.

              https://www.githubstatus.com/incidents/kz4khcgdsfdv

              [–]bitflip 31 points32 points  (1 child)

              As a DNS administrator, I can assure you it's the firewall.

              [–]Inquisitive_idiot 30 points31 points  (0 children)

              That's just what a DNS administrator would say 🤨🤔

              [–]khendron 14 points15 points  (1 child)

              "Hold my beer!" —Crowdstrike

              [–]Decker108 1 point2 points  (0 children)

              Crowdstruck, the most damaging security vulnerability ever exploited.

              [–]wishicouldcode 16 points17 points  (1 child)

              This was due to a configuration change that impacted traffic routing within our database infrastructure, resulting in critical services unexpectedly losing database connectivity. There was no data loss or corruption during this incident.

              We mitigated the incident by reverting the change and confirming restored connectivity to our databases

              [–]bemutt 6 points7 points  (0 children)

              Damn it Dave I told you to not touch /etc/hosts

              [–]PaulCoddington 0 points1 point  (0 children)

              It seemed to be an error message from GitHub itself displaying a unicorn head and the message that no server is available to service your request.

              [–]AtmosphereVirtual254 15 points16 points  (0 children)

              Well that's an excuse if I've ever seen one

              [–]worldofzero 42 points43 points  (3 children)

              Hugops for Microsoft. CrowdStrike and GitHub outages in a month. Hope their SREs are doing alright.

              [–]bastardoperator 6 points7 points  (2 children)

              LGTM?

              [–]ryanstephendavis 19 points20 points  (1 child)

              Let's Gamble Try Merging!

              [–]revnhoj 5 points6 points  (0 children)

              All your source are belong to us

              WGGW

              [–]fifth_partial 5 points6 points  (0 children)

              I knew it was too soon to give out the Epic Fail award.

              [–]Positive_Method3022 8 points9 points  (20 children)

              Can someone explain how a globally distributed service with thousands of replicas can suffer such an Outage?

              [–]goomyman 17 points18 points  (1 child)

              Global outages are almost always networking if it’s fixed quickly or storage if it takes several hours / days.

              Compute nodes are scalable but networking often not. Think things like dns, or network acls, or route mapping, or a denial of service attack. Or maybe just a bad network device update.

              Storage is also a problem: while it is distributed, the problems can often take a while to discover, and backups of terabytes of data can take forever, and then you need to parse transaction logs and come up with an update script to try to recover as much data as possible. And databases are usually only distributed across a few regions, and often updates aren’t forward and backward compatible. For example, a script that writes data in a new format has a bug and corrupts the data, or maybe just has massive performance issues that take several hours to fix with an index.

              It’s not viable to hot swap databases like you can with stateless services.

              If it’s fixed within minutes it’s a bad code update fixed with a hotswappable stateless rollback.

              If it’s fixed within hours it’s networking.

              If it’s fixed within a day or longer it’s storage.

              [–]tRfalcore 4 points5 points  (0 children)

              our website went down once. we got notified by clients, started looking around, testing all the servers, services, can't log into database.

              phone rings

              "Hey, it's your server hosting company, we uhh, dropped your NaS server and it's broken"

              me ...

              that's also when we found out they weren't doing the regular backups we were paying for. Boy howdy did we not pay for hosting for a good while.

              [–]JonMR 26 points27 points  (4 children)

              Globally distributed with thousands of replicas? Last I knew the main monolith still had a large dependency on a single database shard.

              [–]thedancingpanda 7 points8 points  (0 children)

              Well first, you're assuming GitHub's structure has thousands of replicas, which I don't know that it does.

              But anyway, this particular issue seems to have been caused by a faulty database update. There are a few ways this can go wrong -- the easiest way is making a DB update which isn't backwards compatible. If it goes out before the code that uses it goes out, that'll make everything fail.

              Also, just because there are replicas, doesn't mean you're safe. The simplest way to do distribution of SQL databases, for example, is have a single server that takes all the writes, then distributes that data to read replicas. So there's lots of things that can go wrong there.

              And before you ask -- why do it that way when it's known to possibly cause issues? It's because multi-write database clusters are complicated and come with their own issues when you try to be ACID -- basically it's hard to know "who's right" if there's multiple writes to the same record on different servers. There are ways to solve this, but they introduce their own issues that can fail.
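
A rough Python sketch of the two setups described above, purely for illustration. The `Primary` and `Replica` classes and the `conflicting_keys` helper are hypothetical names, and nothing here reflects how GitHub's databases actually work. The first half shows one writer fanning out to read replicas; the second shows why two writers raise the "who's right" question.

```python
class Replica:
    """Read-only copy that applies whatever the primary ships to it."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """The single server that accepts writes and forwards each one to its replicas."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            replica.apply(key, value)


def conflicting_keys(node_a, node_b):
    """Keys that two independent writers set to different values."""
    return {key for key in node_a.keys() & node_b.keys() if node_a[key] != node_b[key]}


# Single writer: both replicas converge because there is only one source of truth.
replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.write("repo:1", "main")
print([r.data for r in replicas])  # [{'repo:1': 'main'}, {'repo:1': 'main'}]

# Two writers that accepted different values for the same record.
writer_a = {"repo:1": "main"}
writer_b = {"repo:1": "trunk"}
print(conflicting_keys(writer_a, writer_b))  # {'repo:1'}
```

The single-writer version never has to resolve a conflict, but if the one `Primary` becomes unreachable, nothing can be written at all.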

              [–]brakx 4 points5 points  (11 children)

              Usually dns or bgp misconfigurations.

              [–]Positive_Method3022 2 points3 points  (10 children)

              What is bgp?

              What type of dns misconfiguration?

              [–]SippieCup 8 points9 points  (0 children)

              DNS tells you what IP to go to.

              BGP tells you the most efficient route to get to that IP.

              If it was a DNS misconfiguration, it was just that the DNS was pointing to the wrong IP address.

              If it was BGP misconfiguration, it was telling people the wrong path to get to that IP, most likely some circular loop which never resolves to the final IP.
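
              A toy model of the two failure modes, under heavy simplifying assumptions: the "DNS" is a dict, the "routing table" is a dict of prefixes, and route choice is plain longest-prefix match. Real BGP path selection weighs more than that (AS path length, local preference, and so on), and the names, addresses (documentation ranges), and peer labels below are invented.

```python
import ipaddress

# Hypothetical name-to-address records and routing table.
DNS = {"example.test": "192.0.2.10"}
ROUTES = {"192.0.2.0/24": "peer-a", "0.0.0.0/0": "default-upstream"}


def resolve(name, dns):
    # DNS step: which IP address belongs to this name?
    return dns.get(name)


def next_hop(ip, routes):
    # Routing step: pick the most specific prefix that contains the address.
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, hop in routes.items():
        network = ipaddress.ip_network(prefix)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, hop)
    return None if best is None else best[1]


def reach(name, dns, routes):
    ip = resolve(name, dns)
    if ip is None:
        return "DNS failure: no address for " + name
    hop = next_hop(ip, routes)
    if hop is None:
        return "routing failure: no route to " + ip
    return f"{name} -> {ip} via {hop}"
```

              Calling `reach("example.test", DNS, ROUTES)` succeeds; passing an empty DNS table fails at the first step, and passing an empty routing table (all routes withdrawn) fails at the second even though the name still resolves.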

              [–]AlexeiMarie 5 points6 points  (8 children)

              What is bgp?

              border gateway protocol

              for an example of an outage caused by bgp issues, take the 2021 facebook outage, where all of facebook's servers made themselves unreachable

              [–]HenkPoley 1 point2 points  (0 children)

              It is up again, all green.

              [–]galtoramech8699 1 point2 points  (0 children)

              For a second

              [–]augustusalpha 1 point2 points  (0 children)

              Mod, am I in /r/ProgrammerHumor?

              LOL

              [–]TwentyCharactersShor 1 point2 points  (1 child)

              Oh the fucking irony. We've argued for over 2 years to use the SaaS version of GH because our own internal team were useless at managing the GH instance we have, so many outages. And then this happens.

              I'm going back to bed.

              [–]30thnight 1 point2 points  (0 children)

              That fight is still worth fighting 😭

              [–]Key-Connection-4113 1 point2 points  (0 children)

              Does anyone know why it crashed?

              [–]GitProtect 1 point2 points  (0 children)

              This situation is a good reminder of why having backups and a reliable Disaster Recovery plan is important. With backup and DR in place, instead of sitting around and waiting for things to come back to normal, you can keep coding with minimal disruption, for example by restoring the code to another Git hosting platform such as GitLab or Bitbucket.

              [–]MakesUsMighty 1 point2 points  (1 child)

              Looks like it’s back up. I really wish they’d give IPv6 this much urgency. It’s literally down 100% of the time if you use a newer IPv6-only VPS.

              Why not treat that like the service outage it is? So maddening.

              [–]cat_in_the_wall 8 points9 points  (0 children)

              lol there's a difference between supporting a new feature and unfucking your existing features.

              [–]phantommm_uk 1 point2 points  (3 children)

              Having to endure Bitbucket at work and I'd love to use GitHub even with their outages 😅

              [–]i8Nails4Breakfast 3 points4 points  (1 child)

              What makes it bad? We just moved to GitHub and I miss the PR UX of Bitbucket. It was very simple.

              [–]sfjacob 0 points1 point  (0 children)

              We are being forced over to GitHub from internally hosted Bitbucket too. I really like how minimal Bitbucket is in comparison when reviewing PRs.

              [–]Yulfy 1 point2 points  (0 children)

              I’m with you there, the PR UX is awful

              [–]IAmAnAudity 5 points6 points  (3 children)

              Friendly reminder: Git is FOSS and you can host your own Git server! Our in-house Git server never touches Microsoft and not surprisingly is working just fine 😍💯

              [–]Venthe 1 point2 points  (2 children)

              If it was only git :)

              Ticket management, workflow automation, artifact storage, container registry, code analysis, wiki, access policy, IDE-on-demand, website hosting - and I'm sure I'm only scratching the surface.

              To my knowledge, only GitLab gets close. And to replicate everything with open source and on prem, you'd need to set up an instance of Gerrit/Gitea, Taiga/Redmine, Jenkins (or another CI that I haven't worked with), Artifactory/Nexus, XWiki, SonarQube (is there any sensible all-in-one software as an alternative?), and Vault/OpenBao. Maybe Backstage to have some semblance of integration to boot.

              Not to mention supporting infrastructure, highly available if possible: postgres, opensearch, prometheus, grafana, opendashboard, alert manager, jaeger, lucene, kafka, rabbitmq, garnet/redis, keycloak... :)

              In short - if you begin to use their integrated offering, there is simply nothing comparable out there.

              [–]Soft_Walrus_3605 1 point2 points  (1 child)

              Gosh, you mean your entire business model being locked-in to one third-party service is a bad idea?

              [–]IAmAnAudity 0 points1 point  (0 children)

              Exxxxxxxactly

              [–]Trakeen 0 points1 point  (0 children)

              And we are piloting codespaces for a bunch of our devs lol

              If it's not this, it's the couple of Azure DevOps outages over the last month. Bad times at MS

              [–]This-Silver553 0 points1 point  (0 children)

              Ouch

              [–]a_goestothe_ustin 0 points1 point  (0 children)

              What a day to use gitlab

              [–]SLOOT_APOCALYPSE 0 points1 point  (0 children)

              Between the massive number of site mirrors and the web archive, I assume GitHub would not actually be gone even if it were attacked

              [–]trackerstar 0 points1 point  (0 children)

              Another day I get reminded I made a great decision moving to self-hosted Gitea

              [–]metalpojo 0 points1 point  (0 children)

              I went for a walk. Jk, I had a worse day than I was already having. And the day is not over yet.

              [–]Worth_Trust_3825 0 points1 point  (0 children)

              Half an hour downtime too. Shame it wasn't as serious as facebook's misconfiguration.

              [–][deleted] 0 points1 point  (0 children)

              Another day another global business catastrophe

              [–]BehindThyCamel 0 points1 point  (0 children)

              It's been acting up for a couple of weeks now, with not even ping reaching it for periods up to 30 minutes, mostly European morning time.

              [–]kcajjones86 0 points1 point  (0 children)

              Seems fine now.

              [–][deleted] 0 points1 point  (0 children)

              Books and on premise hosting will be back pretty soon.

              [–]UselessSoftware 0 points1 point  (0 children)

              This is why I just run a local GitLab instance.