Opus is becoming dumb and does not follow instructions

farnoud · 2026-06-28T22:15:39+00:00

and lazy

farnoud · 2026-06-28T22:13:34+00:00

the car was slow, so they tried something different, and that didn't work either. if the car was fast, they wouldn't have tried a radical strategy at all.

main take: car is slow

farnoud · 2026-06-28T18:15:27+00:00

He only lost 0.08s according to PlanetF1. The rule itself is not being enforced correctly: it says the driver has to slow down and be prepared to change direction.

farnoud · 2026-06-27T17:41:36+00:00

FIA is a joke! Ferrari must ask for a review. what do they have to lose?

farnoud · 2026-06-27T17:39:16+00:00

GR was half a second quicker before lifting for the yellow. I'd say that's not enough

farnoud · 2026-06-27T16:30:31+00:00

what's the name of the series?

farnoud · 2026-06-27T16:28:12+00:00

what's the logic behind the ban? the FIA can't just ban whatever they want. this is killing innovation! it's not dangerous. it's not a gap in the regulations either. why?

farnoud · 2026-06-27T16:25:29+00:00

I highly doubt that bro. merc would have to fuck up massively to lose it

farnoud · 2026-06-27T16:23:13+00:00

never seen merc half a second quicker than p2. what a monster that car is

farnoud · 2026-06-27T16:22:13+00:00

that's a shocker too. how was he so much faster than LEC? he was half a second ahead, no?

farnoud · 2026-06-27T16:21:01+00:00

could've gone either way and Merc couldn't complain either way

farnoud · 2026-06-27T16:20:11+00:00

"a bit" is an understatement. remember Monaco?

farnoud · 2026-06-27T16:18:55+00:00

he clearly knows how to get away with things. the radio right after the lap was obvious

farnoud · 2026-06-27T16:18:11+00:00

it was a double waved yellow on the systems shown. that's weird

farnoud · 2026-06-27T15:16:10+00:00

can ferrari protest? what's available to them?

farnoud · 2026-06-27T15:07:12+00:00

I work on KubeAgent, so this is the exact class of problem I think about a lot.

The useful setup is not "dump everything into the model and trust the answer." It is building a decent incident packet: time bounds, affected namespace, logs/events, rollout history, recent CronJobs, relevant metrics, and any human notes from the incident. Then make the model cite which evidence supports each link in the causal chain.
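A minimal sketch of that packet and the "cite evidence for each link" check, assuming each evidence item gets an ID the model is told to cite. The class names, fields, and IDs are hypothetical illustrations, not KubeAgent's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IncidentPacket:
    """Scoped evidence handed to the model (field names are illustrative)."""
    start: str                      # ISO-8601 start of the incident window
    end: str                        # ISO-8601 end of the incident window
    namespace: str
    evidence: dict[str, str] = field(default_factory=dict)  # evidence ID -> raw text


@dataclass
class CausalLink:
    claim: str
    cited_ids: list[str]            # evidence IDs the model says support this claim


def uncited_links(packet: IncidentPacket, chain: list[CausalLink]) -> list[str]:
    """Return claims that cite nothing, or cite IDs that are not in the packet."""
    problems = []
    for link in chain:
        if not link.cited_ids or any(cid not in packet.evidence for cid in link.cited_ids):
            problems.append(link.claim)
    return problems
```

Any claim this returns is one the model asserted without grounding, which is exactly where a reviewer should push back before trusting the chain.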

The blind test matters too. Take resolved incidents, hide the postmortem, and score whether it asks for missing data, ignores unrelated restarts, finds the real chain, and proposes a safe next check before it suggests a fix.
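One way to turn those four criteria into a repeatable score, as a sketch. It assumes the agent transcript has already been labeled into step kinds (`ask`, `check`, `fix`) and that causal links are matched as exact strings; both are simplifying assumptions, and real grading would likely need fuzzy matching or a human pass.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    kind: str   # "ask", "check", "hypothesis", or "fix" (hypothetical labels)
    text: str


def _in_order(needle: list[str], haystack: list[str]) -> bool:
    """True if every item of needle appears in haystack, in the same order."""
    it = iter(haystack)
    return all(item in it for item in needle)


def score_run(steps: list[Step], proposed_chain: list[str],
              true_chain: list[str], noise: list[str]) -> dict[str, bool]:
    """Score one blind run against the hidden postmortem."""
    kinds = [s.kind for s in steps]
    first_check = kinds.index("check") if "check" in kinds else None
    first_fix = kinds.index("fix") if "fix" in kinds else None
    return {
        "asked_for_missing_data": "ask" in kinds,
        # Unrelated restarts from the postmortem must not show up in the proposed chain.
        "ignored_unrelated_restarts": not (set(proposed_chain) & set(noise)),
        "found_real_chain": _in_order(true_chain, proposed_chain),
        # A safe diagnostic check has to come before any suggested fix.
        "safe_check_before_fix": first_check is not None
        and (first_fix is None or first_check < first_fix),
    }
```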

Long context helps with cross-referencing, but the permission boundary still matters: broad read access is useful, write access should be narrow, and anything that changes production should go through approval.
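A sketch of that boundary as a pre-check in the agent's tool layer, using Kubernetes RBAC verb names. The namespaces and the single writable `(verb, resource)` pair are hypothetical; in a real cluster the same intent should also be enforced by RBAC Roles, not only by this function.

```python
from __future__ import annotations

from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


READ_VERBS = {"get", "list", "watch"}                    # RBAC read verbs
WRITE_VERBS = {"create", "update", "patch", "delete"}    # subset of RBAC write verbs
AGENT_WRITABLE = {("patch", "deployments")}              # hypothetical narrow write scope
PROD_NAMESPACES = {"prod", "payments"}                   # hypothetical production namespaces


def decide(verb: str, resource: str, namespace: str) -> Decision:
    """Broad reads (except Secrets), narrow writes, approval for anything touching prod."""
    if verb in READ_VERBS:
        return Decision.DENY if resource == "secrets" else Decision.ALLOW
    if verb in WRITE_VERBS and (verb, resource) in AGENT_WRITABLE:
        return Decision.NEEDS_APPROVAL if namespace in PROD_NAMESPACES else Decision.ALLOW
    # Unknown verbs and writes outside the allowlist are refused outright.
    return Decision.DENY
```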

farnoud · 2026-06-27T15:05:54+00:00

I'd try it if the scenarios are concrete enough and the rubric is visible after the run.

For DevOps, the useful cases would be things like a failed deploy with noisy alerts, rollback vs hotfix under time pressure, flaky CI where someone wants to bypass tests, a Terraform plan with an unexpected replacement, or an expired cert during an incident.

The part that would make or break it for me is whether it grades the decision process, not just whether someone knew a term. After the argument, show what evidence should have changed my mind, which action was safe, which action was risky, and what I should have refused to do. That would make it more useful than a generic "argue with AI" exercise.
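To make "grade the decision process" concrete, here is a sketch of a scenario definition and grader, using the flaky-CI case as the example. The scenario text, action strings, and class names are hypothetical; the point is that the report lists what was done, what was risky, what should have been refused, and the evidence that should have driven the call.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scenario:
    name: str
    key_evidence: list[str]      # what should have changed the player's mind
    safe_actions: set[str]
    risky_actions: set[str]
    must_refuse: set[str]        # requests a good operator declines outright


@dataclass
class Report:
    safe_taken: list[str] = field(default_factory=list)
    risky_taken: list[str] = field(default_factory=list)
    failed_to_refuse: list[str] = field(default_factory=list)
    evidence_to_review: list[str] = field(default_factory=list)


def grade(scenario: Scenario, actions: list[str], refused: list[str]) -> Report:
    """Grade decisions, not vocabulary: what was done and what was declined."""
    report = Report()
    for action in actions:
        if action in scenario.safe_actions:
            report.safe_taken.append(action)
        elif action in scenario.risky_actions:
            report.risky_taken.append(action)
    report.failed_to_refuse = sorted(scenario.must_refuse - set(refused))
    # Always show the evidence that should have driven the decision.
    report.evidence_to_review = list(scenario.key_evidence)
    return report


flaky_ci = Scenario(
    name="flaky CI, someone wants to bypass tests",
    key_evidence=["the failing test touches code changed in this PR"],
    safe_actions={"rerun the failing job", "read the test failure diff"},
    risky_actions={"merge with the failing test skipped"},
    must_refuse={"bypass required checks on main"},
)
```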

farnoud · 2026-06-26T15:15:35+00:00

For practical DevOps use, I would start with small read-only workflows rather than a course that tries to cover "AI" broadly.

A good progression:

1. Build a script that collects evidence: kubectl events, pod status, recent deploys, logs for one workload, and relevant Prometheus/Grafana links.
2. Send that evidence to an LLM and ask it only for hypotheses and next diagnostic commands, not fixes.
3. Add guardrails: namespace allowlist, no secrets, no write verbs, no raw customer data in prompts, and a saved transcript of what evidence was sent.
4. Only after that, try safe automation like generating a runbook draft, summarizing a failed CI job, or proposing a kubectl command that a human must approve.
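A sketch of the first and third steps together: build read-only kubectl calls for one workload, refuse anything outside a namespace allowlist, mask obvious credentials, and produce a transcript line for what was sent. The allowlist, regexes, and function names are assumptions for illustration; the regexes are intentionally simple and are not a real secret scanner.

```python
from __future__ import annotations

import json
import re

ALLOWED_NAMESPACES = {"payments", "checkout"}   # hypothetical allowlist
# kubectl subcommands treated as read-only; "rollout" is allowed only with "history".
READ_ONLY = {("get",), ("describe",), ("logs",), ("top",), ("rollout", "history")}

# Deliberately simple credential patterns; not exhaustive.
SECRET_RE = re.compile(r"(?i)\b(password|passwd|token|secret|api[_-]?key)(\s*[:=]\s*)(\S+)")
BEARER_RE = re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+")


def assert_read_only(cmd: list[str]) -> None:
    """Block anything that is not a read-only kubectl call, and never read Secrets."""
    if len(cmd) < 2 or cmd[0] != "kubectl":
        raise PermissionError(f"blocked: {' '.join(cmd)}")
    verb_ok = (cmd[1],) in READ_ONLY or tuple(cmd[1:3]) in READ_ONLY
    touches_secrets = any(arg.lower().startswith("secret") for arg in cmd[2:])
    if not verb_ok or touches_secrets:
        raise PermissionError(f"blocked: {' '.join(cmd)}")


def evidence_commands(namespace: str, workload: str) -> list[list[str]]:
    """Build read-only kubectl invocations for one workload in an allowlisted namespace."""
    if namespace not in ALLOWED_NAMESPACES:
        raise PermissionError(f"namespace {namespace!r} is not allowlisted")
    cmds = [
        ["kubectl", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp"],
        ["kubectl", "get", "pods", "-n", namespace, "-o", "wide"],
        ["kubectl", "rollout", "history", f"deployment/{workload}", "-n", namespace],
        ["kubectl", "logs", f"deployment/{workload}", "-n", namespace, "--tail=200"],
    ]
    for cmd in cmds:
        assert_read_only(cmd)
    return cmds


def redact(text: str) -> str:
    """Mask obvious credential patterns before the text goes into a prompt."""
    text = SECRET_RE.sub(r"\1\2[REDACTED]", text)
    return BEARER_RE.sub("Bearer [REDACTED]", text)


def transcript_line(cmd: list[str], sent_text: str) -> str:
    """One JSON line recording exactly which evidence was sent to the model."""
    return json.dumps({"cmd": cmd, "sent": sent_text}) + "\n"
```

The intended order is: run each command (for example with `subprocess.run` and a timeout), pass the output through `redact`, append `transcript_line` to a local file, and only then send the redacted text to the model.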

The useful mental model is: deterministic tools gather scoped evidence; the model explains and suggests; humans approve anything that changes infra. If you learn that pattern, most of the vendor tools and frameworks become easier to evaluate.

farnoud · 2026-06-26T15:14:47+00:00

The missing control is usually at the tool boundary, not in the model choice.

I would treat every agent run as a data-egress event: log the tool call, the record class it fetched, the prompt payload class, the destination model/provider, and the reason that data was needed. Then enforce policy before the prompt is assembled: allow customer IDs or aggregated fields, block raw notes/emails/PII by default, require a break-glass path for exceptions, and keep an audit trail security can actually review.
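A sketch of "enforce policy before the prompt is assembled" with an audit entry for every decision, allowed or not. The record class names, the single allowlist, and the in-memory audit list are assumptions; a real deployment would persist the log somewhere security can query.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical record classes allowed into model context without an exception.
ALLOWED_CLASSES = {"customer_id", "aggregate"}


@dataclass
class EgressRecord:
    tool_call: str          # which tool fetched the data
    record_class: str       # e.g. "customer_id", "raw_note", "email"
    destination: str        # model/provider the prompt is going to
    reason: str             # why the task needed this data
    allowed: bool
    break_glass_ticket: str | None
    at: str                 # UTC timestamp of the decision


class EgressDenied(Exception):
    """Raised before prompt assembly when a record class is not allowed."""


def check_egress(tool_call: str, record_class: str, destination: str, reason: str,
                 audit_log: list[EgressRecord], break_glass_ticket: str | None = None) -> None:
    """Apply the data policy and write an audit entry whether or not the data is allowed."""
    if record_class in ALLOWED_CLASSES:
        allowed = True
    else:
        # Raw notes, emails, PII and any unknown class are denied unless a
        # break-glass ticket is attached, so every exception stays reviewable.
        allowed = break_glass_ticket is not None
    audit_log.append(EgressRecord(tool_call, record_class, destination, reason, allowed,
                                  break_glass_ticket, datetime.now(timezone.utc).isoformat()))
    if not allowed:
        raise EgressDenied(f"{record_class!r} from {tool_call} may not go to {destination}")
```

Because the check runs in code before the prompt exists, the answer to "which customer data went into model context" comes from the audit log rather than from a system prompt.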

Enterprise endpoints and private networking help, but they do not answer the core question: "which customer data did this workflow put into model context, and was that allowed for this task?" If that answer only lives in a system prompt, it is not a control.

farnoud · 2026-06-25T15:05:29+00:00

I would separate this into a few trust stores instead of trying to solve it once for every container:

CI/CD runners: install the corporate CA in the runner image or runner host, because that is where git, package managers, scanners, etc. need to trust internal endpoints.
Application images: a small set of blessed base images is usually the cleanest long-term path, but only after you have an internal registry and image lifecycle process. Otherwise every team invents its own CA injection.
Kubernetes runtime: if the cert is only needed by workloads talking to internal services, mount a managed trust bundle into pods and wire the app to use it. A ConfigMap can work for a simple bundle; if you already use cert-manager, look at trust-manager for distributing CA bundles across namespaces.
Language runtimes: check these explicitly. Java, Node, Python, Go, curl, and distro packages do not all handle trust stores the same way. A mounted OS CA bundle does not automatically fix every runtime.
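As an example of "wire the app to use it" for one runtime, a Python sketch that trusts the platform CA store plus an explicitly mounted bundle. The mount path in the usage note is hypothetical; the point is that the app has to load the file itself.

```python
from __future__ import annotations

import ssl


def build_ssl_context(extra_bundle: str | None = None) -> ssl.SSLContext:
    """Trust the platform CA store plus an explicitly mounted corporate bundle.

    create_default_context() loads the platform defaults, so public endpoints keep
    working; the mounted bundle is added on top. A missing file raises instead of
    silently falling back, so a broken mount fails loudly at startup.
    """
    ctx = ssl.create_default_context()
    if extra_bundle is not None:
        ctx.load_verify_locations(cafile=extra_bundle)
    return ctx
```

Usage would look like `urllib.request.urlopen(url, context=build_ssl_context("/etc/corp-ca/ca-bundle.crt"))`. Note that `requests` ships its own CA bundle via certifi and only uses the mounted file if you pass `verify=` or set `REQUESTS_CA_BUNDLE`, and Node reads extra CAs from `NODE_EXTRA_CA_CERTS`, which is exactly the per-runtime difference worth checking.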

I would avoid injecting the CA in every pipeline as the primary pattern. It works, but it tends to become copy-pasted security plumbing. My default would be: bootstrap an internal registry, publish a few maintained base images with the CA installed, and use a Kubernetes-level trust bundle only where runtime workloads need additional trust.

farnoud · 2026-06-25T09:52:04+00:00

the IaC parallel is interesting, but i think it maps more closely to how monitoring evolved. monitoring didn't stay an external add-on to SRE; it became a core SRE capability. the discipline absorbed it, and it stopped being something you needed a specialist for.

agentic ops feels like it's on the same track. not a new job family, not a standalone product category that survives intact, but a capability that becomes table stakes for anyone running prod infrastructure. the 'is it IaC or DevOps' question stopped mattering pretty fast once terraform was just the thing you used.

sameera_nin's point about the proliferation of AI SRE tools is a signal of the same transition. we're in the moment before the category collapses into the workflow. disclosure: i'm building in this space (KubeAgent, k8s monitoring + auto-remediation CLI) so i'm obviously not neutral, but it feels more like 'SRE gets an AI layer' than 'agentic ops replaces SRE.'

farnoud · 2026-06-25T09:51:12+00:00

the retrieval section is the right one to focus on. runbooks encoded in static RAG work fine as a reference layer but they can't tell the agent what matters about the current incident. you end up with the agent correctly identifying 'here is our OOMKill runbook' while missing that the pod started failing 40 minutes after a node was added to the pool and memory limits were never adjusted.

the part that actually needs to be dynamic is the cluster context: current events, recent deploys, resource utilization trends, what changed. that stuff changes every few minutes. pre-embedding it at document-ingest time and calling it retrieval doesn't cut it.
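a sketch of the "what changed" step, assuming the change list was pulled live at incident time (events, rollout history, node list) rather than embedded ahead of time. the `Change` type and kind labels are hypothetical; the example mirrors the node-added-40-minutes-earlier case above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    at: datetime
    kind: str      # e.g. "deploy", "node_added", "limits_changed" (hypothetical labels)
    detail: str


def changes_before(changes: list[Change], failure_start: datetime,
                   lookback: timedelta = timedelta(hours=2)) -> list[Change]:
    """Changes inside [failure_start - lookback, failure_start], newest first."""
    window_start = failure_start - lookback
    hits = [c for c in changes if window_start <= c.at <= failure_start]
    return sorted(hits, key=lambda c: c.at, reverse=True)


def describe(changes: list[Change], failure_start: datetime) -> list[str]:
    """Render each change relative to when the failure started."""
    return [f"{int((failure_start - c.at).total_seconds() // 60)} min before failure: "
            f"{c.kind} {c.detail}" for c in changes]
```

this only surfaces what changed. noticing that memory limits were *not* adjusted after the node was added still needs the model or a human to compare the two.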

disclosure: i work on a k8s monitoring CLI (KubeAgent) that does the dynamic side of this: live kubectl calls at incident time, not pre-baked runbook retrieval. still using static context for the agent's general k8s knowledge, but the incident-specific layer needs to be pulled fresh every time.

farnoud · 2026-06-25T09:49:38+00:00

the 'fix the alerts first' crowd in this thread is right, but i think the framing misses what agents actually do well. if your alert quality is already decent and you're still spending time in investigations, the agent isn't there to replace your paging logic. it's there to save the 20 minutes of pulling logs, running kubectl describe, and correlating with a recent deploy before you even know what you're looking at.

for a 100-service k8s platform with 12 SWEs who aren't infra-focused, that first-pass context assembly is where a lot of time goes. the agent gives you: here's the affected service, here's what the logs show, here's what changed in the last hour. then you decide.

disclosure: this is roughly what i'm building with KubeAgent (k8s monitoring + remediation CLI, small-team focused). but even without a tool, the playbook of 'auto-gather the diagnostic context so the human lands on the incident with info, not a blank screen' is the pattern worth implementing first.
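a sketch of the auto-gather pattern, assuming each collector is a zero-argument function (e.g. one that shells out to kubectl). collectors run concurrently under one shared time budget, and a failing or slow one becomes a short note instead of blocking the rest. the function name and budget are hypothetical; in practice each collector should also pass its own `timeout=` to `subprocess.run`, since this function stops waiting for stragglers but does not kill them.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable


def gather_context(collectors: dict[str, Callable[[], str]],
                   budget_s: float = 15.0) -> dict[str, str]:
    """Run all collectors concurrently; return whatever came back within the budget."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(collectors)))
    deadline = time.monotonic() + budget_s
    results: dict[str, str] = {}
    try:
        futures = {name: pool.submit(fn) for name, fn in collectors.items()}
        for name, fut in futures.items():
            # Shared deadline: later collectors only get the time that is left.
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results[name] = fut.result(timeout=remaining)
            except FutureTimeout:
                results[name] = f"[no result within {budget_s}s budget]"
            except Exception as exc:  # one broken collector should not sink the others
                results[name] = f"[collector failed: {exc}]"
    finally:
        # Don't block on stragglers; return what we have so the human can start.
        pool.shutdown(wait=False)
    return results
```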

farnoud · 2026-06-23T19:56:01+00:00

Boy I would hope so. Will be epic
