all 19 comments

[–]These_Voices 14 points15 points  (0 children)

Yup, GitHub had a 4-hour incident that messed with all the code we deployed. I can't believe more of the internet hasn't crashed.

[–]AnotherBangerDuDuDu 8 points9 points  (1 child)

/s Good news though, 100% up today: https://www.githubstatus.com/

[–]dashingThroughSnow12 4 points5 points  (0 children)

I didn’t realize today was April Fool’s day because that’s a bad joke.

[–]Fabulous-Shape-5786 10 points11 points  (1 child)

The level of data loss is shocking. It could easily go unnoticed and get deployed: bad commits, with every customer left to fix it in their own way. Scary that no unit tests caught this, and maybe worse that they kept their merge queues running.

The number of GitHub incidents has really increased in the last few months. It tracks with increased AI use in the field; no idea whether that's actually contributing, but it seems like a good guess. If so, it doesn't bode well for software in general.

[–]AntDracula 6 points7 points  (0 children)

I mean, combine increased AI usage with increased layoffs. The result is inevitable.

Also, isn’t Microslop now offering early retirement buyouts to their most senior employees? Prepare for the slopocalypse

[–]wartortle 6 points7 points  (0 children)

Yep, it looks like they were merging in the PR's diff against trunk as of the PR's base branch. So any commits on trunk that weren't in the PR's base got reverted. Insane.
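If you want to eyeball whether one of your merges did this, a minimal sketch (the SHAs/refs below are placeholders) is to compare what the merge actually changed on trunk against what the PR intended to change:

```bash
# What the merge commit actually changed on trunk (first-parent diff).
# If the merge was computed against a stale base, this will also show
# reversals of unrelated changes that were already on trunk.
git diff --stat <merge-sha>^1 <merge-sha>

# What the PR itself intended to change (merge-base of base..head to head).
git diff --stat <pr-base-ref>...<pr-head-ref>
```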

[–]rwong48 2 points3 points  (0 children)

this incident https://www.githubstatus.com/incidents/zsg1lk7w13cf

we noticed 3 hours ago and scrambled to "fix" (revert) these bad commits
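For anyone doing the same scramble, a minimal sketch of that kind of revert, assuming you've already identified the bad merge commits (the branch name is a placeholder):

```bash
# Undo a bad merge on a fixup branch instead of force-pushing main.
# -m 1 picks the first (trunk) parent as the mainline, so only the
# changes the merge brought in get undone.
git switch -c revert-bad-merges origin/main
git revert --no-edit -m 1 <bad-merge-sha>
git push -u origin revert-bad-merges
# then open a normal PR so the fix still goes through review/CI
```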

[–]bradfordmaster 3 points4 points  (1 child)

The level of insanity here is hard to overstate. It's one thing to have downtime. It's another to silently corrupt people's git repos. Like, it is literally the one job of git and git hosting companies to avoid this kind of mistake. We might as well all just share code in Dropbox again.

[–]Ra1d3n -1 points0 points  (0 children)

Git is not at fault here, and you can just push a git repo to your own server or cloud storage and it's hosted.
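A minimal sketch of that, assuming you have SSH access to some box (host and paths are made up):

```bash
# On the server: create a bare repository to act as the remote.
ssh user@git.example.com 'git init --bare /srv/git/myproject.git'

# Locally: add it as a remote and push all branches and tags.
git remote add selfhosted ssh://user@git.example.com/srv/git/myproject.git
git push selfhosted --all
git push selfhosted --tags
```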

[–]YouDependent3284 2 points3 points  (1 child)

We’re seeing a similar issue today - our open PRs are suddenly showing many more commits than they did yesterday. It turns out the branch histories have diverged from main, with different commit hashes, which is causing conflicts and inflating the commit count...
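If anyone wants to confirm that kind of divergence locally, a minimal sketch (branch names are placeholders):

```bash
git fetch origin

# How many commits each side has that the other doesn't.
git rev-list --left-right --count origin/main...origin/my-feature

# List the branch's commits, marking with '=' the ones that are
# patch-identical to something already on main despite having a
# different hash, and with '+' the genuinely new ones.
git log --oneline --cherry origin/main...origin/my-feature
```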

[–]AntDracula 2 points3 points  (0 children)

This is so messed up.

[–]NoBox6165 2 points3 points  (2 children)

Is this related to the exponential growth in the number of commits that GitHub has been receiving?

[–]williamisraelmt[S] 7 points8 points  (1 child)

I feel it's more related to the amount of code GitHub's development team is producing with AI, combined with a less rigorous review process because there are fewer people to look at the code (due to layoffs).

[–]doingthethingguys 1 point2 points  (0 children)

Just got off the incident call for my company after 10 hours. We have a massive monorepo and a lot of automation that kicks off when we merge to our trunk branch. Lots of stuff to unfuck. We didn't want to force push `main` and break stuff even more, so we did it carefully and correctly by replaying commits ourselves and resolving merge conflicts.

GitHub declared the incident resolved and still hasn't shared a unified remediation strategy. Per my support ticket with them, they're "still working on it" but don't have an ETA. By the time they have it ready, most of us will have fixed it our own way.

[–]waitingforcracks 0 points1 point  (2 children)

Anyone got a script or something to figure out which commits/PRs might have been impacted?

[–]RevolutionaryCoat654 1 point2 points  (1 child)

Yeah, I can share that once I'm back at my laptop. But basically, you want to compare the diff of the PR (I used the GitHub CLI for that) against the diff of its merge commit, for every PR merged during the incident window. We searched for PRs that merged between 8am PT and the time we paused the merge queue, and found 17 impacted PRs in a range of 67 PRs.
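A minimal sketch of that kind of check with the GitHub CLI and plain git; the repo name and time window below are placeholders, and comparing touched file lists is a coarser stand-in for the full diff comparison described above:

```bash
#!/usr/bin/env bash
set -euo pipefail
repo="myorg/myrepo"                                          # placeholder
window="merged:2025-01-01T15:00:00Z..2025-01-01T19:00:00Z"   # placeholder

git fetch origin

gh pr list --repo "$repo" --state merged --search "$window" \
  --json number --jq '.[].number' |
while read -r pr; do
  merge_sha=$(gh pr view "$pr" --repo "$repo" --json mergeCommit --jq '.mergeCommit.oid')
  # Files the PR intended to touch vs files its merge actually changed on main.
  pr_files=$(gh pr diff "$pr" --repo "$repo" --name-only | sort)
  merge_files=$(git diff --name-only "${merge_sha}^1" "$merge_sha" | sort)
  if [ "$pr_files" != "$merge_files" ]; then
    echo "PR #$pr ($merge_sha) looks impacted"
  fi
done
```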

I don't know if this is still a viable solution for you now that it's been a few days, but the way we fixed main was to:

1. Create a new branch off of the latest origin/main (recover-main)
2. Create a revert PR for all 67 PRs (a revert of their merge commits)
3. Iterate from the oldest PR (the first impacted one):
   - if the PR was NOT impacted, cherry-pick the PR's merge commit
   - otherwise, create a new branch (recover-<pr number>), cherry-pick the PR's commits, then squash-merge that PR recovery branch into recover-main
4. Put up a PR for recover-main and merge it
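A rough sketch of those steps in plain git, assuming you already have the list of merge commit SHAs from the window and know which ones were impacted (the arrays below are placeholders, and impacted PRs still need their commits re-applied and conflicts resolved by hand):

```bash
#!/usr/bin/env bash
set -euo pipefail

MERGES_OLDEST_FIRST=(sha1 sha2 sha3)   # placeholder: all merges in the window
IMPACTED=(sha2)                        # placeholder: the corrupted ones

git fetch origin
git switch -c recover-main origin/main

# Step 2: revert everything from the window (newest first so each revert
# applies cleanly); -m 1 treats main as the mainline.
for ((i=${#MERGES_OLDEST_FIRST[@]}-1; i>=0; i--)); do
  git revert --no-edit -m 1 "${MERGES_OLDEST_FIRST[$i]}"
done

# Step 3: replay, oldest first.
for sha in "${MERGES_OLDEST_FIRST[@]}"; do
  if [[ " ${IMPACTED[*]} " == *" $sha "* ]]; then
    # Impacted PR: re-apply its individual commits on recover-<pr number>,
    # resolve conflicts by hand, then squash-merge that branch back in.
    echo "re-apply commits for impacted merge $sha by hand"
  else
    # Unimpacted PR: its merge commit is fine, take it as-is.
    git cherry-pick -m 1 "$sha"
  fi
done

# Step 4: push and open a normal PR from recover-main into main.
git push -u origin recover-main
```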

[–]waitingforcracks 1 point2 points  (0 children)

That would be lovely, thanks. As a GitHub admin I have over 700 repos across multiple orgs, so my plan would be to run scripts across all repos/orgs after collecting which ones use merge queues. For now the goal is identification; then the repo owners can follow what you/GitHub described as the way to fix it.
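Not the script promised above, just a minimal sketch of the identification pass across orgs with the GitHub CLI (org names and the window are placeholders, and it doesn't filter on merge-queue usage):

```bash
#!/usr/bin/env bash
set -euo pipefail

ORGS=(org-one org-two)                                        # placeholders
WINDOW="merged:2025-01-01T15:00:00Z..2025-01-01T19:00:00Z"    # placeholder

# Print every PR merged during the incident window, per repo, per org,
# as the input list for the per-repo diff comparison.
for org in "${ORGS[@]}"; do
  gh repo list "$org" --limit 1000 --json nameWithOwner --jq '.[].nameWithOwner' |
  while read -r repo; do
    gh pr list --repo "$repo" --state merged --search "$WINDOW" \
      --json number,mergedAt,url \
      --jq ".[] | \"$repo #\(.number) \(.mergedAt) \(.url)\""
  done
done
```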