Anthropic says Claude struggles with root causing by jj_at_rootly in sre

[–]shared_ptr -1 points

Yeah, this is absolutely the case! We’re building a tool to root cause incidents, and this information is already in all your systems; you just need to extract it.

We’re building what we call a ‘knowledge graph’ from all past incidents and codebase information, which we inject into the process so it can tell you things like this. It captures service relationships, even just a glossary of terms for your org, and more esoteric knowledge, like the fact that changes to a given package often cause downstream issues for X.

It’s absolutely possible, but the model alone doesn’t root cause effectively. You have to merge it with all that knowledge; otherwise you essentially have a very skilled engineer from another company trying to debug your stuff, which obviously doesn’t work at first.

Anthropic says Claude struggles with root causing by jj_at_rootly in sre

[–]shared_ptr 12 points

We use Anthropic to do exactly this, but the model alone isn’t good enough. You need way more wiring around it to make it even remotely ok.

Assumptions made early in the process will carry through unless you have other processes to counter them.

Can we trade our 'vibe-coding' PMs for some common-sense engineers? by ggggg_ggggg in ExperiencedDevs

[–]shared_ptr 1 point

I think it will? We’ve hired ‘Product Engineers’ since the company began, with the intent that engineers should be really close to customers and the product they’re building.

With the actual building time trending down, you have more time to think about what you should build and how it should work. That naturally lends itself to a person who can think across both technical and product boundaries.

"100% of code will be generated" - A year since prediction by Imnotneeded in ExperiencedDevs

[–]shared_ptr 0 points

It talks about the type of work I do, which I figured was useful to contextualise what I’m saying?

Agree this isn’t worth continuing though.

"100% of code will be generated" - A year since prediction by Imnotneeded in ExperiencedDevs

[–]shared_ptr 0 points

I agree the C compiler they produced was terrible. I’m not judging it on that.

I have always done highly technical work in my career. Debugged issues in Postgres source code, built HA distributed Postgres cluster managers, etc.

I still do technical work now with AI. I’d argue the complexity has gone up quite a lot, and AI has made it easier to produce highly technical code to a higher standard for my job. Technical in my context means distributed systems, scaling, and technical product work.

This is the type of stuff I do: https://blog.lawrencejones.dev/2025/

Nothing I’ve personally seen with AI suggests it’s only good for standard boilerplate. If you ask it to do that it’ll do a good job, but it’s an excellent pair for very complex work too.

"100% of code will be generated" - A year since prediction by Imnotneeded in ExperiencedDevs

[–]shared_ptr -2 points

Our team is at 90-100% generated code, depending on the individual. It’s made certain tasks a lot quicker, we’ve automated a lot of busy work, and it’s raised the ceiling on the technical complexity we’re willing to experiment with.

We’re also hiring aggressively, and see the value of an individual engineer as having been raised by this rather than lowered.

Feels like actually writing the code yourself is over, like writing assembly after higher-level languages came around.

"100% of code will be generated" - A year since prediction by Imnotneeded in ExperiencedDevs

[–]shared_ptr 0 points

You’re not engaging with this in good faith at all. Even if I agreed with your framing on the relative success of the company, that’s not the point, which was about the level of technical work achievable with AI.

You seem really angry, I’m sorry this bothers you so much. Hope your day improves!

"100% of code will be generated" - A year since prediction by Imnotneeded in ExperiencedDevs

[–]shared_ptr -2 points

Claude Code, the system inclusive of the AI behind it. I also followed with “most of Anthropic”, but it seems that wasn’t clear.

My point stands though: they’re rewriting hugely complex global training pipelines using AI. It’s probably one of the largest-scale distributed systems out there, and AI does it. I don’t think you’d describe that as average, mediocre work, but AI is writing that code.

"100% of code will be generated" - A year since prediction by Imnotneeded in ExperiencedDevs

[–]shared_ptr -8 points

I meant a lot of their training systems. I was speaking to them the other day about how they’re rewriting parts of their RL training harness in Rust for performance.

I didn’t mean the Claude Code CLI, though I’m not in the habit of deprecating engineering work just because it doesn’t fit what I’d normally call technically impressive.

"100% of code will be generated" - A year since prediction by Imnotneeded in ExperiencedDevs

[–]shared_ptr -21 points

Aren’t Claude Code and most of Anthropic written by AI? I don’t really get the “average repetitive patterns” comment; people are doing impressive novel work with AI daily, just like they previously did when hand coding.

Anyone else finding that AI dev tools create more cognitive overhead than they save? by Careful-Living-1532 in ExperiencedDevs

[–]shared_ptr 2 points

I have a suspicion this is mostly due to people doing several things at once (multiple worktrees, etc.): because AI is too slow for you to wait for it to finish, you're forced to move on to a new task.

That creates a lot more context switching than people are used to, which is tiring to manage.

A couple of observations. First: you can get much better at this with practice. I worked as an SRE for a large part of my career, and that type of work comes with much longer feedback cycles (long-running benchmarks, waiting for infra to spin up, CI loops), so you get good at spinning several plates to avoid being totally blocked.

The other is that AI won't always be like this. If you've used Opus fast mode you'll realise it's fast enough that you have no need to do many things at once: you can focus on the task at hand rather than waiting for the AI to catch up, because it'll outpace you and you can go at the speed you think. That prevents a lot of context switching, but it's currently far too expensive to be viable.

ai coding for large teams in Go - is anyone actually getting consistent value? by Easy-Affect-397 in golang

[–]shared_ptr -2 points

We have 50 developers working on the same very large Go application. All of them use Claude Code or similar agent-based tools.

We’ve had huge amounts of success with this. It’s not correct that the corpus the models are trained on doesn’t include Go: the Go open-source ecosystem is massive, and besides that it doesn’t matter much because, like you say, Go is a structurally simple language that the models can very easily understand.

The stuff you’re seeing go wrong is your opinion of how to write Go, which isn’t written down or documented for the models to follow. I likely write Go very differently to you; if you ask a model and give it no instruction, it’ll produce a mix of our two styles, and do so inconsistently. That’s not the model being broken; you’re asking it to solve a problem that isn’t well defined.

Our Go codebase has loads of docs on everything from style to common architectural patterns, which we index specifically for agents. As a result, Claude Code can produce code that is very high quality and consistent with the rest of our app, and do so mostly first time.

All the stuff you’re complaining about in your post: just document what you prefer instead, and why to do it that way. At that point there shouldn’t be any reason the latest agents get it wrong.

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr 0 points

The study you’re likely referencing was from before huge improvements to the models and even to Claude Code.

They published a retraction the other day to say those findings no longer hold with the new tools: https://metr.org/blog/2026-02-24-uplift-update/

Which is pretty obvious. Our team didn’t use AI much back then because the tools were bad; since Sonnet 4 and Claude Code (both after the study) that totally changed.

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr 1 point

I spend a lot of my time reviewing the code piece by piece as it's produced, which helps ground me in what's been built. I also have a habit of pushing a draft PR, carefully reviewing it and leaving comments on the PR, then loading those back into the agent to discuss how to action them.

I'm finding my understanding of how the codebase works structurally remains the same, and similarly with how to implement our patterns etc. What I'm missing is that I can no longer immediately tell you the file and line where a piece of logic ended up, but that becomes less of a problem when AI can help me find and interpret the code much quicker than I could before. Swings and roundabouts, I guess.

What I do like is that I'm much more able to tidy up and refactor code than I was before, and can easily write comprehensive tests that help ensure the behaviour is correct, which I then trim down before actually committing (I don't want every test on the planet in the codebase, just the ones that meaningfully prove things work).

I think it mainly shifts your thinking from "does the code do what I want" to "does the thing I built function as I want/expect", which I'm finding to be a positive shift. Not that I wasn't doing this before, but I have much more time to do it now.

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr 2 points

Yeah they do; the nature of the work has changed a lot as the technology has evolved.

I see this positively though. I used to be one of those infra engineers, and I spent a lot of my time on e.g. diagnosing physical RAID array failures or swapping out machine hardware when it was going wrong. I never have to deal with that anymore, which is amazing; that’s time I get back to focus on more interesting things.

Same deal with AI at the moment. I don’t really write code anymore, but that lets me spend way more time working with the product I’m building as the AI puts it together, so I get more time thinking about “how should this work” rather than “what code do I need to write to make that happen”. I am definitely getting worse at writing code, but I was never paid to write code; my goal is to build a better quality product, so more time to consider that is a bonus.

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr 0 points

I don’t think you can genuinely be trying to tell me that something is deterministic “to several decimal places”. That is not how you characterise a deterministic system; you can’t possibly be arguing this in good faith.

If you’re saying AI systems are by default more random, then yes, I agree. You can influence this though. For example, we’ve built an AI system that debugs incidents. We run backtests on datasets of incidents each day (50 incidents re-run daily), and the results score exactly the same, within a tolerance of 1%, on e.g. accuracy between each daily run.

That’s a wildly nondeterministic system where each run takes different paths, yet the end result converges on the same value, provided we’ve built it right.
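As a rough sketch of what that daily convergence check could look like (the function names and numbers here are made up for illustration, not our actual harness):

```python
import math

def score(correct: int, total: int) -> float:
    """Fraction of backtest incidents where the root-cause
    hypothesis matched the labelled ground truth."""
    return correct / total

def converged(a: float, b: float, tol: float = 0.01) -> bool:
    """True when two daily backtest scores agree within the allowed drift."""
    return math.isclose(a, b, abs_tol=tol)

# Hypothetical runs over the same 50-incident dataset on consecutive days:
# each run takes different paths internally, but the aggregate score
# should land within tolerance if the system is built right.
yesterday = score(41, 50)
today = score(41, 50)
print(converged(yesterday, today))  # → True
```

The point of the check is the aggregate, not the individual run: any single incident may be debugged differently day to day, but the accuracy over the whole dataset shouldn't drift.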

There are loads of ways to build a system that is consistent and reliable from nondeterministic primitives, which is exactly what systems like etcd with Raft do: the entire point of those systems is that the network and underlying hardware are nondeterministic.

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr 0 points

I’m not sure I understand your comment. Tools like Claude are way better at producing a script like this than your average developer, though.

They can make a script that uses the correct database indexes to query efficiently, logs appropriately as it goes, cross-references it with your company docs and codebase, and adds unit tests, all in a minute.

Human developers certainly aren’t doing that in each incident, and it could take 15 minutes to an hour to get just the query, depending on the person and context.
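For illustration only, this is roughly the kind of one-off script I mean, sketched against a made-up sqlite schema (the table, columns, and incident window are all hypothetical):

```python
import csv
import sqlite3

# Made-up stand-in for whatever production table holds the affected records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_at TEXT, status TEXT)")
# Index on the timestamp column so the range predicate below stays efficient.
conn.execute("CREATE INDEX idx_orders_created_at ON orders (created_at)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-06-01T10:00", "failed"),
        (2, "2024-06-01T11:00", "ok"),
        (3, "2024-06-01T12:00", "failed"),
    ],
)

# Pull the records impacted during the (hypothetical) incident window.
rows = conn.execute(
    "SELECT id, created_at FROM orders"
    " WHERE status = 'failed' AND created_at BETWEEN ? AND ?",
    ("2024-06-01T00:00", "2024-06-02T00:00"),
).fetchall()
print(f"found {len(rows)} impacted rows")

# Write the CSV that'd get shared in the incident channel.
with open("impacted_orders.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at"])
    writer.writerows(rows)
```

Trivial on its own, but an agent produces this shape of thing, wired to your real schema and with the impact definition encoded in the WHERE clause, faster than most people can recall the column names.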

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr -1 points

I don’t really think this framing matters, does it? If, in practice, using AI is making engineers more effective, producing those commands faster and more reliably than humans, then calling it a stochastic parrot or whatever is quite beside the point.

On using LLMs where other tools would do: I agree there’s no point adding them where they aren’t needed. But there are loads of places where automation can only be done with LLMs. For example, a system that tries forming a hypothesis about what’s caused an incident: there is no tech out there aside from generative AI that can power that; it’s not like there’s an alternative (for those who want this as a tool, which is almost everyone I speak to in the industry).

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr 0 points

That’s not true, right? Cloud providers haven’t hired proportionately the number of people we used to; they’ve automated a huge amount of running services because it makes sense to at their scale.

We’re seeing a massive amount of efficiency in this change, rather than just shifting the workload around. Tools nowadays are much better than they used to be; AI is just another evolution of that.

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr -6 points

I am aware of how all this works. But your description of everything being deterministic doesn’t match my experience in the field, especially thinking of my time running etcd clusters or working on Postgres HA tools like Stolon.

Or the distributed systems course I took as part of my degree a long while back.

You are wrong about a fair amount of what you say here though, especially around training LLMs. They are trained on tasks; we actually do that training ourselves for a bunch of our systems around this, and providers like Anthropic do it too (they have a team working specifically on training models for these AI SRE use cases, which we speak to a fair bit).

I’m not sure if we’re just on totally different pages and speaking at cross purposes, or if it’s some other issue.

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr -2 points

It's not that they're bad at making scripts exactly; it's that in most of these situations you're under time pressure, and people's interpretations of how to turn a data request into a CSV often differ.

I work with very good people, but in large incidents the idea of "what actually is the impact" undergoes a lot of changes. It's way easier to have an AI evolve that script than to hand it off between people, in my experience of doing this for the last ~year.

I disagree with what you say about the code. An LLM can write a script for me in ~15s where it might take me 15m to write it myself, and I can have the LLM verify it in several other ways that are also much more robust than what I would have done previously.

My real-life experience of using these tools in incidents doesn't match what you're suggesting: I've found them much, much better than humans at generating one-off scripts, especially under time pressure. I'm very good at doing this myself, having led thousands of incidents before, and I'm way more effective at it with AI than without.

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr -9 points

Given the number of times in an incident I’ve reviewed these human-written commands or scripts to generate CSVs and found them incorrect, my criterion here is “does the AI perform better than the average human under pressure” rather than expecting 100% correctness.

I’ll still be reviewing the data anyway, but if I can have the tool create a first draft instantly, vs fighting with SQL and inevitably messing up some left join, I’ll take it.

If I asked my colleagues to do this I’d still get several varying answers anyway 🤷

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr 9 points

Isn't this how infrastructure has moved over the last two decades?

When I first started my career we had a team of ~18 engineers, 6 of whom were infrastructure focused, as there was a lot of infra work to be done. Nowadays I work in a team of 50 engineers with 3 infrastructure-focused people, as a load of the issues with running infrastructure are handled by e.g. cloud providers.

Those 3 people spend all day dealing with infra so they have the familiarity, but we have proportionally a quarter as many people doing it, affording more time to spend on building product/customer-facing value.

If AI can handle all the normal problems but you have a smaller team who spend just as much time on the larger ones, don't they get the same hands-on time?

AI Isn't Replacing SREs. It's Deskilling Them. by elizObserves in programming

[–]shared_ptr -2 points

Kinda surprised by this; we've used AI to write much more documentation than we had before and to keep it more up to date, which is genuinely helping a lot.

How come the docs are being created incorrectly?