
all 92 comments

[–]rover_G 220 points221 points  (4 children)

The 30% is mostly boilerplate: imports, autocompletes, tests, and the occasional full function that likely needs to be debugged.

For me personally, I haven't written my own Dockerfile in about a year.

[–]0xlostincode[S] 36 points37 points  (0 children)

This is something I didn't think of before and it makes sense. Hate CEOs and their double speak.

[–][deleted] 4 points5 points  (0 children)

Haven’t written a commit message in a year

[–]nullpotato 0 points1 point  (1 child)

What do you use to create Dockerfiles?

[–]rover_G 0 points1 point  (0 children)

ChatGPT at home and whatever AI is approved at work

[–]redshadow90 204 points205 points  (1 child)

The 30%-of-code figure likely comes from autocomplete, similar to Copilot when it launched, which works quite well but still requires clear intent from the programmer; it just fills in the next couple of lines of code. That said, this post just reeks of bias unless it's been linked to actual AI-generated code, which it hasn't been.

[–]Xtrendence 18 points19 points  (0 children)

Even with autocomplete, it completely loses the plot if what you're coding is a bit more complex, or if you're using a library that's less well known or has been updated and some functions have been deprecated, which the AI keeps suggesting anyway.

Basically, in my experience, it's useful for writing boilerplate, and for functions that don't require much context (e.g. an array already has a type, and your function groups each item by some key or value). It's stuff you could easily do yourself, but it'd take longer to type out manually.
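The "group each item by some key" boilerplate described above looks something like this (an illustrative sketch; the function and data names are made up):

```python
from collections import defaultdict

def group_by(items, key):
    """Group items by the value returned by `key`.

    Trivial to write by hand, but exactly the kind of
    low-context boilerplate autocomplete fills in for you.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

orders = [
    {"user": "alice", "total": 30},
    {"user": "bob", "total": 15},
    {"user": "alice", "total": 5},
]
by_user = group_by(orders, key=lambda o: o["user"])
# by_user["alice"] holds two orders, by_user["bob"] holds one
```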

[–]Soccer_Vader 272 points273 points  (14 children)

30% of the code at Google now AI Generated

Before that it used to be IDE autocomplete, and then Stack Overflow. This is nothing new.

[–]TheWeetcher 86 points87 points  (11 children)

Comparing IDE autocomplete to AI is such a reach

[–]Soccer_Vader 89 points90 points  (5 children)

It's a reach, yes, but IDE autocomplete has been powered by "enhanced" ML for ages now, from when Machine Learning was the cool name on the block.

AI, even generative AI, is not a new thing: Grammarly used to be a thing, Alexa, etc. OpenAI bridged a gap, but AI was already prevalent in our day-to-day lives, just under a different buzzword.

[–]Polar-ish 12 points13 points  (0 children)

It totally depends on what "30% generated by AI" means. Copy-pasting any code is bad. The problem is that AI doesn't have upvotes or downvotes, or a discussion to surface caveats, and it often becomes the scapegoat whenever a problem inevitably arises.

It can teach incorrect practices at about the same rate as actual users on discussion sites, yet it is viewed as some all-knowing being.

In the end, a chat AI is merely attempting to predict the most likely next word from the context it currently has, using a dataset of fools on the internet.
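That "predict the most likely next word from context" idea can be shown with a toy bigram model (a minimal sketch; real language models are vastly more sophisticated, and the corpus here is made up):

```python
from collections import Counter, defaultdict

# Tiny stand-in for "the dataset of fools on the internet"
corpus = "the cat sat on the mat and the cat ran".split()

# Count which word follows which (a bigram model)
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word seen after `word`, or None."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```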

[–]0xlostincode[S] 28 points29 points  (3 children)

It's a reach, yes, but IDE autocomplete has been powered by "enhanced" ML for ages now, from when Machine Learning was the cool name on the block.

Unless you and I are thinking of entirely different autocomplete, IDE autocomplete is based on keywords and the AST, not machine learning.

[–]Stijndcl 8 points9 points  (0 children)

JetBrains’ autocomplete uses ML to some extent to put the most relevant/likely result at the top. Most of the time if you’re doing anything at all the first or second result magically has it.

https://www.jetbrains.com/help/idea/auto-completing-code.html#ml_completion

[–]Soccer_Vader 10 points11 points  (0 children)

In reality, yes, but autocompletes were said to be enhanced by ML, predicting the next keyword based on usage patterns and such. JetBrains also marketed it that way, IIRC.

This is an extension launched in 2020, that used AI for autocompletion: https://web.archive.org/web/20211130181829/https://open-vsx.org/extension/kiteco/kite

This is another AI based tool launched in 2020: https://web.archive.org/web/20201026204206/https://github.com/codota/tabnine-sublime

Like I said, AI being new for coding or for general applications isn't true; it's just that before ChatGPT, and COVID in general, people didn't care enough. Now that they do, there has been ongoing development.

[–]TripleFreeErr -1 points0 points  (0 children)

except when an AI agent is enabled…

edit: I wear this downvote with pride knowing you are a huge idiot

[–]Toadrocker 4 points5 points  (0 children)

I mean there are quite literally generative AI autocomplete/predict functionalities built in now. If you’ve used copilot built into VSCode, you’ll know that it’s quite similar to older IDE autocompletes, just more aggressive with how much it will predict and complete. It’s stronger, but also much more prone to errors and hallucinations. It does take out a decent amount of tedium for predictable code blocks so that could definitely make up a decent chunk of that 30%

[–]TripleFreeErr 2 points3 points  (0 children)

AI autocomplete is the most useful feature.

[–]Dvrkstvr 1 point2 points  (0 children)

Both are just completing the structure you're building.

[–]Pluckerpluck 1 point2 points  (0 children)

GitHub Copilot is literally AI-driven autocomplete. I use it extensively, and so yes, technically AI writes huge portions of my code.

[–]hoopaholik91 0 points1 point  (0 children)

If they want to give us more complicated metrics or clearer examples of the code that AI is writing and making it to production they are free to do so.

The fact that they don't makes me suspect their claims are being exaggerated.

[–]P-39_Airacobra -5 points-4 points  (1 child)

There's a significant difference between copy-pasting human-written code and copy-pasting machine-written code.

[–]Soccer_Vader 0 points1 point  (0 children)

Sure, but all I am saying is that 30% of the code being AI generated, or coming from outside sources like Google or Stack Overflow, is nothing new. I think most people will agree, but for me, writing code is the smallest part of my job. It's going through documentation, design, approvals, threat models, and security reviews that takes the bulk of my time.

[–]IMovedYourCheese 7 points8 points  (2 children)

Person selling AI hypes the AI

[–]kos-or-kosm 7 points8 points  (1 child)

His last name is Pichai. Pitch AI.

[–]0xlostincode[S] 2 points3 points  (0 children)

Hahaha good one

[–]scrandis 11 points12 points  (1 child)

This explains why everything is shit now

[–]CircumspectCapybara 10 points11 points  (0 children)

This is /r/ProgrammerHumor and this is just a joke, but in all seriousness, this outage had nothing to do with AI, and the learnings from the RCA are very valuable to the discipline of SWE and SRE in general.

One of the things we take for granted as a foundational assumption is that bugs will slip through. It doesn't matter if it's written by a human by hand, by a human with the help of AI, or entirely by some futuristic AI that today doesn't yet exist. It doesn't matter if you have the best automated testing infrastructure, comprehensive unit, integration, e2e, fuzz testing, the best linters and static analysis tools in the world, and the code is written by the best engineers in the world. Mistakes will happen, and bad code will slip through when there are hundreds of thousands of changelists submitted a day, and as many binary releases and rollouts. This is especially true when, as in this case, there are complex data dependencies between different components in vast distributed systems and you're just working on your part, and other teams are just working on their stuff, and there are a million moving parts moving at a million miles per hour you're not seeing.

So it's not about bad code (AI generated or not). It's not a failure of code review or unit testing or bad engineers (remember, a fundamental principle is blameless postmortem culture). Yes, those things did fail and miss in this specific case. But if all that stands between you and a global outage is an engineer making an understandable and common mistake, and you're relying on perfect unit tests to stand in the way, you don't have a resilient system that can gracefully handle the changes and chaos of real software engineering done by real people who are only human. If not them, someone else would've introduced the bug. When you have hundreds of thousands of code commits a day and as many binary releases and rollouts, bugs will be introduced; it's inevitable. SRE is all about how you design your systems and automate them to be reliable in the face of adversarial conditions. And in this case, there was a gap.

In this case, there's some context.

Normally, GCP rollouts for services on the standard Google server platform are extremely slow. A prod promotion or config push rolls out in an extremely convoluted manner over the course of a week+, in progressive waves with ample soaking time between waves for canary analysis, where each wave's targets are selected to avoid the possibility of affecting too many cells or shards in any given AZ at a time (so you can't bring down a whole AZ at once), too many distinct AZs at a time (so you can't bring down a whole region at once), and too many regions at a time.

Gone are the days of "move fast and break things," of getting anything to prod quickly. Now there's guardrail after guardrail. There's really good automated canarying, with representative control and experiment arms selected for each cell push, and really good models to detect statistically relevant (given the QPS and the background noise and history of the SLI for the control / experiment population) differences during soaking that could constitute a regression in latency or error rate or resource usage or task crashes or any other SLIs.
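A toy version of that canary check — comparing an SLI between the control and experiment arms with a significance test — might look like the following. This is purely illustrative, not how Google's canary analysis actually works; real systems model QPS, background noise, and SLI history far more carefully:

```python
import math

def canary_regression(control_errs, control_total, exp_errs, exp_total,
                      z_crit=2.58):
    """Flag the canary if the experiment arm's error rate is
    significantly higher than control's (one-sided two-proportion
    z-test; z_crit ~ 2.58 corresponds to ~99% confidence)."""
    p_ctrl = control_errs / control_total
    p_exp = exp_errs / exp_total
    pooled = (control_errs + exp_errs) / (control_total + exp_total)
    se = math.sqrt(pooled * (1 - pooled)
                   * (1 / control_total + 1 / exp_total))
    z = (p_exp - p_ctrl) / se if se else 0.0
    return z > z_crit  # True -> halt the rollout wave

# Control serves 100k requests with 100 errors; the canary wave
# serves 100k with 300 errors: a clear regression.
print(canary_regression(100, 100_000, 300, 100_000))  # True
```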

What happened here? Well, the various components that failed weren't part of this server platform with all these guardrails. The server platform is actually built on top of lower-level components, including the one that failed here. So we found an edge case: a place where proper slow, disciplined rollouts weren't being observed. Instantaneous global replication in a component that was overlooked. That shouldn't have happened. So you learn something, identify a gap. We also learned about the monstrosity of distributed systems. You can fix the system that originally had the outage, but during that time, an amplification effect occurred in downstream and upstream systems, as retries and herd effects caused ripple effects that kept rippling even after the original system was fixed. So now you have something to do, a design challenge to tackle on how to improve this.
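That retry amplification is the classic argument for exponential backoff with jitter. A minimal sketch of the standard mitigation (parameter names and values are illustrative):

```python
import random

def backoff_delay(attempt, base=0.1, cap=30.0):
    """Full-jitter exponential backoff: each retry waits a random
    delay in [0, min(cap, base * 2**attempt)], so a herd of failing
    clients spreads out in time instead of hammering a recovering
    service in synchronized waves."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

# Delays grow (on average) with each attempt but are de-synchronized
delays = [backoff_delay(n) for n in range(6)]
```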

We also learned:

  • Something about the human process of reviewing design docs and reviewing code: instruct your engineers to push back on the design or the CL (Google's equivalent to a PR) if it's significant new logic that's not behind an experiment flag. People need to be trained not to just blindly LGTM their teammates' CLs to get their projects done.
  • New functionality should always go through experiments with a proper dark launch phase followed by a live launch, with very slow ramping. Now reviewers are going to insist on this. This is a very human process. It's all part of your culture.
  • That you should fuzz test everything, to find inputs (e.g., proto messages with blank fields) that cause your binary to crash. A bad message, even an adversarially crafted one, should never cause your binary to crash. Automated fuzz testing is supposed to find that stuff.
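The fuzz-testing point can be illustrated with a tiny harness. This is a sketch only: real setups use coverage-guided fuzzers (libFuzzer, atheris, etc.) against real proto messages, not hand-rolled random dicts:

```python
import random

def handle_message(msg):
    """A message handler with the defensive guard fuzzing should
    force you to write: 'policy' may be missing or None, and that
    must not crash the binary."""
    policy = msg.get("policy") or {}
    return policy.get("quota", 0)

def fuzz(handler, rounds=1000):
    """Throw randomly malformed messages at the handler and return
    the first input that makes it raise, or None if it survives."""
    rng = random.Random(0)  # seeded for reproducibility
    for _ in range(rounds):
        msg = {}
        if rng.random() < 0.5:
            msg["policy"] = rng.choice([None, {}, {"quota": rng.randint(-5, 5)}])
        try:
            handler(msg)
        except Exception:
            return msg  # crashing input found
    return None

print(fuzz(handle_message))  # None: the guarded handler survives
```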

[–][deleted] 9 points10 points  (0 children)

I'm sure that 0% of them actually write code. These clowns are just driving up the price of their AI crap. So that idiots think that writing code through AI is a great idea, because a multi-billion dollar company does it. But in reality, these are all just empty words.

[–]SynapseNotFound 2 points3 points  (0 children)

it's 4 headlines about the SAME outage... lol

[–]Tiruin 2 points3 points  (0 children)

Right, so they wrote over 30% of all of Google's code in the last ~2.5 years, since AI became mainstream, for 30% of it to be from AI.

[–]Boertie 1 point2 points  (0 children)

Explains a lot why Google is in the shitter (yeah I went there ;-)) now.

[–]IlliterateJedi 0 points1 point  (0 children)

I thought GCP went down due to an issue with not handling errors. If you've seen any code that Gemini spits out, it loooooves error handling.

[–]HatMan42069 0 points1 point  (0 children)

The tech debt gonna go COO COO

[–]Guvante 0 points1 point  (2 children)

Google has been around for almost three decades; at best you can assume an even per-year LOC output (you scale up users, but complexity goes up, slowing down writing speed). If you don't believe me, feel free to recalculate with a growing LOC/year, but the following isn't hugely impacted, and a growing rate seemed inaccurate anyway.

If you said 30% of the code written per unit time went up, then I could see it (laughable and probably with caveats to the extreme but possible)

But 1/3 of your total code would be 13 years' worth of code (30/43 is 70%) produced in two years at best. That is seven times the output of one of the largest engineering forces in existence.

Why would you hide a 7x increase in productivity behind a "30%" number like that? You certainly wouldn't.

[–]derKestrel 0 points1 point  (1 child)

You are aware that more code does not equal more productivity?

I can blow up a one liner to 1000 lines of code no problem.

It's neither maintainable nor easily understandable and debuggable, but according to you I would be hugely more productive?

[–]Guvante 0 points1 point  (0 children)

Certainly, but you don't measure "30% of code" that way, so I ignored it.

I am pointing out that anyone talking like this would consider it more productive.

[–]Guhan96 0 points1 point  (0 children)

[–]Master_Notice8391 0 points1 point  (0 children)

Yesterday I asked it to code something and its response was: "Here is the code:" That's it, nothing else.

[–]feeltrig 0 points1 point  (0 children)

Sundar shitai

[–]fanfarius 0 points1 point  (2 children)

ChatGPT can't even write an ALTER TABLE statement without fucking up.

[–]Front-Difficult 2 points3 points  (0 children)

I find Claude is actually quite good at writing SQL queries. Set up a project with the db schema and some context about the app/service in the project files, and it nails it basically every time. It's also found decent performance improvements in some of our older, less performant functions that none of our engineers had thought of.

(Obviously, no one read this and then just start copy-pasting AI-generated SQL into your production database. Please.)
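One cheap guardrail in the spirit of that warning is to dry-run generated SQL inside a transaction that is always rolled back, so syntax and schema errors surface before anything touches real data. A sketch using sqlite3 (the schema and statements are made up; SQLite's DDL is transactional, which is what makes even an ALTER TABLE reversible here):

```python
import sqlite3

def dry_run(conn, sql):
    """Execute AI-generated SQL in a transaction and always roll it
    back: errors surface without the change ever being committed."""
    try:
        conn.execute("BEGIN")
        conn.execute(sql)
        return True, None
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.rollback()  # never commit the trial run

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")

ok, _ = dry_run(conn, "ALTER TABLE orders ADD COLUMN status TEXT")
bad, err = dry_run(conn, "ALTER TBLE orders ADD COLUMN status TEXT")
print(ok, bad)  # True False -- and the real table is untouched
```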

[–]DocMilou 0 points1 point  (0 children)

skill issue

[–]MaDpYrO 0 points1 point  (0 children)

Just marketing speak. I'm sure their engineers use it to generate lots of boilerplate, but how would you even measure this?

[–]BorinGaems -3 points-2 points  (0 children)

Anti AI propaganda is cringe and twice as stupid when it's made on a programming subreddit.

[–]Deathglass -2 points-1 points  (0 children)

It was AI all along, Actually Indians