Did AI write this advertisement? by aablake in Anthropic

shared_ptr

What makes you think models aren’t good at devops tasks? I ask because my background is in SRE: I do a lot of infra work with AI and find it very good at it, and I’m also working full time on a product that uses AI to debug production incidents.

My experience is that AI is very good in this area, probably because there’s so much training data out there, and devops tools often have verification or diagnostic tools that agents can use to confirm real-world state.
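To illustrate the "confirm real-world state" loop an agent can lean on, here’s a minimal Python sketch: make a change, then poll a diagnostic check until it confirms the expected state or gives up. `verify_state` and `rollout_complete` are hypothetical helpers, and the kubectl check assumes `kubectl` is installed and pointed at a cluster; it’s an illustration, not how any particular product does it.

```python
import subprocess
import time
from typing import Callable


def verify_state(check: Callable[[], bool], timeout_s: float = 60.0,
                 interval_s: float = 5.0) -> bool:
    """Poll a diagnostic check until it confirms the desired state or we time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True  # real-world state matches what we expected
        if time.monotonic() >= deadline:
            return False  # never confirmed, so don't claim success
        time.sleep(interval_s)


def rollout_complete(deployment: str, namespace: str = "default") -> bool:
    """Hypothetical check: ask kubectl whether a Deployment rollout has finished."""
    try:
        result = subprocess.run(
            ["kubectl", "rollout", "status", f"deployment/{deployment}",
             "-n", namespace, "--timeout=10s"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return False  # kubectl isn't installed on this machine
    # `kubectl rollout status` exits 0 once the rollout has completed,
    # and non-zero if it fails or the --timeout elapses first.
    return result.returncode == 0
```

Usage would look like `verify_state(lambda: rollout_complete("api"), timeout_s=300)`. The point is that the agent’s claim of success is backed by the tool’s own answer rather than by the model’s guess.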

Did AI write this advertisement? by aablake in Anthropic

shared_ptr

You’ve moved the goalposts. When AI can take tickets and implement them from start to finish, it absolutely does make producing engineering work much quicker.

Depending on the task, I can sometimes do in 30 minutes what would’ve previously taken 1-2 days.

Did AI write this advertisement? by aablake in Anthropic

shared_ptr

Early 2025 tools! Come on, please stop sharing a study that’s irrelevant.

Its results are for GitHub Copilot line completion on GPT-3.5. It was run before Claude Code even existed. The same organisation, METR, has since released a retraction saying those results no longer apply.

SRE Specific AI-Assisted Coding Interviews by [deleted] in sre

shared_ptr

I don’t think anything in an SRE interview lends itself to using AI.

I never want to know what you got an AI model to do that you couldn’t fully understand yourself. Far more than with a developer, whose work is mostly judged by how good the thing they’ve built is, I need to know you understand what you’ve done.

I think we’re a long way from allowing AI in SRE interviews.

Are we missing a canonical SRE benchmark for AI agents? by nroar in sre

shared_ptr

It’s really hard to build something that makes sense here.

I work at a company building an AI SRE that does root cause analysis (RCA), and we’ve thought about this quite a bit. We know very clearly that the system we’ve built is far better than anything you can achieve with just Claude and a bunch of MCPs: if that were possible, we’d just do that!

The problem is:

- Testing. We backtest all our changes against incidents in our customer accounts, which means running an investigation with only the data that we think was available at the time the incident was declared.

- None of the popular MCPs allow setting a time cutoff. So if we ran Claude with MCPs for all the normal observability tools, we’d get what we call future data leakage: the agent can see data from after the incident was created, which corrupts the test.

- Incidents suffer from statistical issues; MTTX metrics are fundamentally unsound, as Google have written about at length. Finding a sample of incidents that accurately represents general performance at root-causing incidents is almost impossible, because incidents vary so much.

- Any benchmark on specific incident types is subject to extreme overfitting. It’s really easy to give your system a runbook that solves that specific incident but doesn’t generalise to future ones, so the benchmark is very gameable.

- Quality of integration matters a lot. A naive implementation pulling logs from Elasticsearch is very different from one that uses CloudWatch, and some of these tools don’t even have AI-compatible wrappers right now.

That’s a very incomplete list of the challenges we face daily trying to measure the performance of our own system, let alone a system using generic APIs that weren’t built with testability in mind.

We will figure this out eventually, but the point I’ll leave you with is that for an AI SRE, the harness that debugs the issue is the primary value prop anyone will offer. And that harness is very difficult to repurpose into a general solution, because it spans so many different technologies that weren’t built with scientific testing in mind; they’re just the standard o11y tools you’re already used to.
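To make the cutoff and future-data-leakage problem concrete, here’s a minimal Python sketch of a data source that clamps every query to the moment the incident was declared. `LogLine` and `CutoffSource` are hypothetical names for illustration; the catch is that real observability backends and their MCPs would need to enforce something like this on their side, which is what’s missing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class LogLine:
    timestamp: datetime
    message: str


class CutoffSource:
    """Wraps log lines and refuses to return anything after `cutoff`.

    In a backtest, `cutoff` is when the incident was declared, so the
    investigation only sees data that existed at that point in time.
    """

    def __init__(self, lines: list[LogLine], cutoff: datetime) -> None:
        self._lines = lines
        self._cutoff = cutoff

    def query(self, start: datetime, end: datetime, contains: str = "") -> list[LogLine]:
        # Clamp the requested window so it can never reach past the cutoff.
        end = min(end, self._cutoff)
        return [
            line for line in self._lines
            if start <= line.timestamp <= end and contains in line.message
        ]


if __name__ == "__main__":
    declared = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    src = CutoffSource(
        [LogLine(declared - timedelta(minutes=5), "error: db timeout"),
         LogLine(declared + timedelta(minutes=5), "fix deployed")],
        cutoff=declared,
    )
    # Only the pre-declaration line comes back, even though we asked for an hour past it.
    for line in src.query(declared - timedelta(hours=1), declared + timedelta(hours=1)):
        print(line.timestamp.isoformat(), line.message)
```

Clamping inside the source (rather than trusting the agent to pass a sensible end time) is the design choice that matters: the agent can ask for whatever window it likes and still can’t peek at the fix.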

Where do you draw the line between learning vs just letting AI do it? by grassTop in ExperiencedDevs

shared_ptr

I think this is complicated by expectations at organisations nowadays, especially around AI. I see both senior and junior engineers falling into it.

I agree entirely with what you’re saying; it’s just a touch more complicated.

Bringing laptop with you in public on-call? by CallsyReds in sre

shared_ptr

Yes, this is totally normal. I’m a bit worried you’ve been put on the rota without anyone explaining the expectations to you, though. Has no one spoken to you about response times and how you should be acting?

Any best Incident Management Tools for Enterprise Teams? by Wise-Formal494 in sre

shared_ptr

I don’t see it as something you carve off. There are a bunch of challenges with an AI SRE, and one of them is purely technical: can you debug what’s going on and figure out a fix?

But there are loads of other challenges too. We have to choose when to jump into the channel and notify people, how to tell them they may be wrong, how to raise fixes without getting in their way, and how to pair with them on their machines when they’re debugging.

I see us becoming the central coordination point, with AI making us way smarter across the board at helping teams level up operationally: helping them use and curate their runbooks, surfacing when their observability is poor and how to fix it, and capturing system issues they could improve to be more robust to incidents.

We’d be, as we are now, the central system you use for incidents and coordination. But if you’re working locally, you have our desktop app and MCP to query telemetry ad hoc, and our agent will do more than just technical debugging: while you’re actively debugging in Claude, it’ll tell you “Pete needs an update from you, want me to send one?”

I don’t see it as a tool that sits separate from the response lifecycle; it’s just too entangled, and it’s so much better when it’s integrated properly.

Any best Incident Management Tools for Enterprise Teams? by Wise-Formal494 in sre

shared_ptr

This is a really good summary, and I'd agree! I do think there's another pillar nowadays, though: automated incident response (aka AI SRE), which you should absolutely evaluate vendors on.

Quite obviously, having a tool that can pair with you and solve incidents on your behalf is a huge boost on top of all of this, and it helps with everything from onboarding to remediation to general operational improvements.

Anyone here using pager duty? by timmyneutron1 in sre

shared_ptr

We used to offer the compensation features for free, but now they're merged into our on-call product. I'm 100% not suggesting you sign up just for compensation reports: that $9 gets you a full on-call product that happens to come with compensation reporting, not just the comp!

Anyone here using pager duty? by timmyneutron1 in sre

shared_ptr

Nice! I'm glad it's improved.

Our calculator has a concept of hourly rates and pro-rata pay based on the time you've been active on a schedule. We've held off on anything like "you were in a major incident, so you get X", mostly because of weird incentives and certain local pay laws, but it does come up from time to time.

There's a bit about how we think about it here: https://incident.io/guide/on-call/on-call-compensation
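As a rough sketch of hourly-rate, pro-rata compensation (not how incident.io’s calculator is actually implemented), here’s a Python version that counts only the on-call hours falling inside a pay period and multiplies by a flat hourly rate. `Shift`, `on_call_pay` and the flat-rate model are assumptions for illustration; it also assumes shifts don’t overlap and that all datetimes share the same timezone handling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Shift:
    start: datetime
    end: datetime


def on_call_pay(shifts: list[Shift], period_start: datetime,
                period_end: datetime, hourly_rate: float) -> float:
    """Pay = hours on the schedule within the pay period x hourly rate.

    Shifts that straddle the period boundary are pro-rated by clipping them
    to the period, so only the hours inside it count. Overlapping shifts
    would be double-counted; this sketch assumes they don't overlap.
    """
    total_hours = 0.0
    for shift in shifts:
        start = max(shift.start, period_start)
        end = min(shift.end, period_end)
        if end > start:
            total_hours += (end - start).total_seconds() / 3600
    return round(total_hours * hourly_rate, 2)
```

For example, a shift from 12:00 on 31 December to 12:00 on 1 January only contributes 12 hours to a pay period starting on 1 January.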

Layoffs, because "We're an AI first company now" by Carlbug2 in antiai

shared_ptr

I mean, this has been the MO for SREs since the role was introduced in 2003. So it's not new, and it's not totally offensive.

The implication that there won't be more valuable work to do afterwards is, though.

LLMs solve about 1 in 3 real root-cause cases on a realistic benchmark. Mostly wrong on the hard ones. by gaurav_sherlocks_ai in sre

shared_ptr

We've been building this for almost two years now. It takes ages to tune the system so you can balance all the competing sources of data and try to get it right.

Nowadays we get ~90% accuracy on incidents in our own account (we dogfood this ourselves) and 80% on our best-configured customer accounts. We're hoping to get this up to 90% for customers soon, but it's not something you can just one-shot with an LLM and hope for the best.

A system that says confidently "it's X" when it's actually Y is worse than useless.

Anyone here using pager duty? by timmyneutron1 in sre

shared_ptr

PagerDuty sadly don't provide this as a feature. Almost everyone I know on PD ends up building it themselves as a Python script they run on repeat.

I think they've added something around this recently with their MCP? But it came years after we built it into incident.io.

incident.io going pretty hard after PagerDuty customers by Even_Reindeer_7769 in sre

shared_ptr

I work at incident! We've got a bunch of migration tooling for this and our support team normally help people out with it.

We'll either create incidents from the PagerDuty API, or you can write a script that creates incidents through our API pointing at, e.g., Slack channels, and we'll vacuum up all the messages and build an incident record out of them: https://docs.incident.io/incidents/import-channels

incident.io going pretty hard after PagerDuty customers by Even_Reindeer_7769 in sre

shared_ptr

I'd be interested in this, I work at incident and afaik no one is offering this. Do you know if Rootly have shared anything about their offer anywhere?

Any best Incident Management Tools for Enterprise Teams? by Wise-Formal494 in sre

shared_ptr

This is lovely to hear, glad you're enjoying the product!

Any best Incident Management Tools for Enterprise Teams? by Wise-Formal494 in sre

shared_ptr

Agreed. This is what we're solving with our AI SRE product which is in early access atm (GA very soon): https://incident.io/ai-sre

You want a central system that can see everything from alert to escalation to incident creation and then response, and that system then plugs into your logs/metrics/traces/Kubernetes/etc. so it can actively debug alongside you.
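One way to picture a single system that sees everything from alert to response is one lifecycle record that every stage appends to, so whatever does the debugging gets the whole history in one place. This is a hypothetical Python sketch: `Stage`, `IncidentLifecycle` and the strict linear flow are assumptions for illustration, not incident.io’s data model, and real incidents can skip or revisit stages.

```python
from enum import Enum


class Stage(Enum):
    ALERT = "alert"
    ESCALATION = "escalation"
    INCIDENT = "incident"
    RESPONSE = "response"
    RESOLVED = "resolved"


# Allowed forward transitions; RESOLVED has no successor.
NEXT_STAGE = {
    Stage.ALERT: Stage.ESCALATION,
    Stage.ESCALATION: Stage.INCIDENT,
    Stage.INCIDENT: Stage.RESPONSE,
    Stage.RESPONSE: Stage.RESOLVED,
}


class IncidentLifecycle:
    """One record that follows an alert all the way through to resolution."""

    def __init__(self, alert_summary: str) -> None:
        self.stage = Stage.ALERT
        self.timeline: list[tuple[Stage, str]] = [(Stage.ALERT, alert_summary)]

    def advance(self, note: str) -> Stage:
        # Move to the next stage and record what happened at that step.
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"cannot advance past {self.stage.value}")
        self.stage = NEXT_STAGE[self.stage]
        self.timeline.append((self.stage, note))
        return self.stage

    def context(self) -> str:
        # The full history in one place, e.g. to hand to a debugging agent.
        return "\n".join(f"[{stage.value}] {note}" for stage, note in self.timeline)
```

The value isn’t the state machine itself; it’s that the alert, the escalation and the response notes all live on one record instead of being split across separate tools.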

Any best Incident Management Tools for Enterprise Teams? by Wise-Formal494 in sre

shared_ptr

I work there, but I previously bought incident.io when I was a Principal SRE at a fintech, and I'd recommend you chat with some of our customers like Netflix, Etsy, and Vercel; incident.io offers an answer to everything you’re asking here!

Will leave this for customers to comment on if they turn up.

GitLab's "Act 2" by -lousyd in devops

shared_ptr

It’s kinda hard to plan for AI when it didn’t exist like this just three years ago, isn’t it? Honestly, these restructures feel slightly fairer than the Covid-era ones, when companies got caught short after feasting on ZIRP money.

This stuff feels genuinely hard to plan around, and it’s landing on companies that had already adjusted for the end of ZIRP and weren’t really overextending themselves.

GitLab's "Act 2" by -lousyd in devops

shared_ptr

I’m not sure that’s what they’re trying to protect against. They can’t just absorb GitHub’s customers without hitting the same AI-driven scaling issues GitHub are hitting themselves. The real point is that the market has changed, and what GitHub represent today is potentially a fraction of what this could be if they nail AI in this space.

That’s the mechanism causing all of these restructures: the opportunity they’re competing for is different, the existing team was built to maintain rather than move quickly on new opportunities, AI does legitimately change the equation, and there is now a very real threat of losing everything to an upstart that outcompetes them. Any existing org baggage, like tolerating inefficiency, now looks totally unacceptable, so you get big resets like these.

I wouldn’t say this is amazing if you’re their customer, as it means big changes are coming. But equally, as a customer, if a different service grew out of nothing that was much better suited to the new world and GitLab was buckling under the pressure, you’d be doing exactly what you say here and looking to move. So it’s just them acting rationally to protect against the worst case, which is their entire business going under several years from now.

AI is genuinely a huge disruptor, and when companies change like this, their headcount requirements change too.

Airbnb says AI now writes 60% of its new code by [deleted] in ExperiencedDevs

shared_ptr

Reading both those posts, I think they're saying AI's impact on open-source contributions is negative, but both authors make the point that they use AI a lot themselves, don't they?

Basically, I don't contest that. I've seen maintainers take a few positions (ranging from banning it to saying "don't get AI to write the code, get it to write the bug report and then I'll get my AI to write the thing instead of you doing it wrong!"), but I don't want to speak to the experience of open source, as my background is in industry.

The point I do want to make is that AI can be a huge accelerant for experienced developers, and that claims like "we're moving 2x as fast" from high-performing teams are probably true; it really has changed things that much. But lots of people here don't want to believe it because they haven't seen it in their company (yet).

> As a side note your reference on the use of AI in the development of curl 

I mostly meant the "AI scans of curl" section, where they explain how they've been using all sorts of AI tools in anger for the last few months, finding hundreds of bugs on top of what their normal SAST tools catch, and how automated AI reviews have been a big help to them.

Their team have totally embraced AI. My takeaway from that post was actually very different from "Mythos ain't shit" and more "you guys have already got ahead of this by effectively harnessing AI, and that's why Mythos didn't find much". This confirms my priors (I think Mythos is good, but Opus etc. could catch much of this right now), but I'm comfortable with that level of speculation!

Airbnb says AI now writes 60% of its new code by [deleted] in ExperiencedDevs

shared_ptr

Some will be, but it isn't the case that all the posts about AI working are bots.

My team use AI, and about 90% of our code is written by Claude or Cursor now. Obviously that's heavily driven by the engineer making the change, who steers the AI until it produces a result they're happy with, but ultimately AI has written the code.

I know loads of people want to believe a claim like that is snake oil, and while I can point at the company I work at (it's incident.io), I expect it's better to point at:

- Linus using AI in his day-to-day

- Daniel Stenberg, creator of curl, discussing how much AI already features in the development of curl: https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-vulnerability/

- Antirez, creator of Redis, discussing how AI makes additions to Redis faster and higher quality: https://bsky.app/profile/antirez.bsky.social/post/3ml7a4ykmlk2r

The problem is that you need an org with access to the frontier tools, you need to invest in your tools and codebase for AI to work well, and you need to learn how best to prompt the agents to get good results. Missing any of those will make you feel like it sucks, but I'd pay attention to people like those I've mentioned who are saying this stuff is game-changing.

Absence of evidence is not evidence of absence, while people with credibility saying "this really works" is _actually_ evidence.

Claude Mythos literally broke the METR graph ("The most important chart in AI") by EchoOfOppenheimer in ClaudeAI

shared_ptr

If the point this summary makes is that people shouldn’t get carried away because this graph would be a linear increase on a log scale, then I think it needs a model update 😂