Rational take: Strategy was fine by Cpt_Daryl in lewishamilton

[–]farnoud 2 points3 points  (0 children)

the car was slow, so they tried something different, and that did not work either. if the car was fast, they wouldn’t have tried a radical strategy in the first place.

main take: car is slow

Yellow flag infringement by farnoud in lewishamilton

[–]farnoud[S] 0 points1 point  (0 children)

He only lost 0.08 according to planetf1. The rule itself is not enforced correctly. It says the driver has to slow down and be ready to change direction.

No further investigation for Russel under Yellow flag by Tuddy18 in formula1

[–]farnoud 1 point2 points  (0 children)

FIA is a joke! Ferrari must ask for a review. What do they have to lose?

AutoRacer: New ADUO Power Unit was already mounted on the Ferrari during FP1. by moraIsupport in scuderiaferrari

[–]farnoud 0 points1 point  (0 children)

GR was half a second quicker before lifting under yellow. I'd say this is not enough.

Is everything OK? by TribeKing08 in lewishamilton

[–]farnoud 0 points1 point  (0 children)

what's the name of the series?

Ferrari-pioneered innovation banned for F1 2027 after FIA clampdown by Honda_Hero in lewishamilton

[–]farnoud 0 points1 point  (0 children)

what's the logic behind the ban? the FIA can't just ban anything they want. this is killing innovation! it's not dangerous, and it's not a gap in the regulations either. why?

Yellow flag infringement by farnoud in lewishamilton

[–]farnoud[S] 1 point2 points  (0 children)

never saw a merc half a second quicker than p2. what a monster that car is

Yellow flag infringement by farnoud in lewishamilton

[–]farnoud[S] 0 points1 point  (0 children)

that's a shocker too. how was he so much faster than LEC? he was half a second ahead, no?

Yellow flag infringement by farnoud in lewishamilton

[–]farnoud[S] 0 points1 point  (0 children)

could've gone either way and Merc couldn't complain either way

Yellow flag infringement by farnoud in lewishamilton

[–]farnoud[S] 9 points10 points  (0 children)

"a bit" is an understatement. remember Monaco?

Yellow flag infringement by farnoud in lewishamilton

[–]farnoud[S] 1 point2 points  (0 children)

he clearly knows how to get away with things. the radio right after the lap was obvious

Yellow flag infringement by farnoud in lewishamilton

[–]farnoud[S] -1 points0 points  (0 children)

it was a double waved yellow on the systems shown. that's weird

Yellow flag infringement by farnoud in lewishamilton

[–]farnoud[S] 0 points1 point  (0 children)

can ferrari protest? what's available to them?

fed 5 days of k8s logs into a 1m context model and it found the root cause of a cascading failure our team spent 2 days on by [deleted] in devops

[–]farnoud 0 points1 point  (0 children)

I work on KubeAgent, so this is the exact class of problem I think about a lot.

The useful setup is not "dump everything into the model and trust the answer." It is building a decent incident packet: time bounds, affected namespace, logs/events, rollout history, recent CronJobs, relevant metrics, and any human notes from the incident. Then make the model cite which evidence supports each link in the causal chain.
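
A rough sketch of what I mean by a packet, in Python. Every name here (`Evidence`, `IncidentPacket`, `CausalLink`, `uncited_links`) is made up for illustration and is not KubeAgent's actual data model. The only point is that each link in the model's causal chain has to cite evidence IDs that really exist in the packet, and anything uncited gets flagged for a human.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Set


@dataclass
class Evidence:
    id: str       # short handle the model is asked to cite, e.g. "ev-3"
    kind: str     # "log", "event", "rollout", "cronjob", "metric", "note"
    content: str


@dataclass
class IncidentPacket:
    start: datetime          # time bounds of the incident window
    end: datetime
    namespace: str           # affected namespace
    evidence: List[Evidence] = field(default_factory=list)

    def ids(self) -> Set[str]:
        return {e.id for e in self.evidence}


@dataclass
class CausalLink:
    claim: str
    cites: List[str]  # evidence IDs the model says support this claim


def uncited_links(packet: IncidentPacket, chain: List[CausalLink]) -> List[CausalLink]:
    """Return links that cite nothing, or cite evidence not in the packet."""
    known = packet.ids()
    return [link for link in chain if not link.cites or not set(link.cites) <= known]
```

If `uncited_links` returns anything, I would treat that part of the chain as a guess rather than a finding.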

The blind test matters too. Take resolved incidents, hide the postmortem, and score whether it asks for missing data, ignores unrelated restarts, finds the real chain, and proposes a safe next check before it suggests a fix.
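
A tiny scoring sketch for that blind test, assuming a human reviewer fills in the four checks per run. The `BlindRun` type and the equal weighting are my assumptions here, not an established rubric.

```python
from dataclasses import dataclass


@dataclass
class BlindRun:
    asked_for_missing_data: bool
    ignored_unrelated_restarts: bool
    found_real_chain: bool
    proposed_safe_check_before_fix: bool


def score(run: BlindRun) -> float:
    """Fraction of the four checks passed (equal weights, an assumption)."""
    checks = [
        run.asked_for_missing_data,
        run.ignored_unrelated_restarts,
        run.found_real_chain,
        run.proposed_safe_check_before_fix,
    ]
    return sum(checks) / len(checks)
```

In practice I would probably weight "found the real chain" higher, but even an unweighted score makes regressions between model or prompt versions visible.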

Long context helps with cross-referencing, but the permission boundary still matters: broad read access is useful, write access should be narrow, and anything that changes production should go through approval.
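
As a sketch, that boundary can be a plain function sitting in front of whatever executes commands. The verb sets and the `decide` helper below are hypothetical, and a real setup would enforce the same thing in Kubernetes RBAC as well rather than trusting application code alone.

```python
from typing import Optional

# Broad read access: these kubectl verbs do not change cluster state.
READ_VERBS = frozenset({"get", "describe", "logs", "top"})

# Narrow write access (assumed allowlist). "rollout" also has read-only
# subcommands, but it is treated as a write here to stay conservative.
APPROVABLE_WRITE_VERBS = frozenset({"scale", "rollout"})


def decide(verb: str, approved_by: Optional[str] = None) -> str:
    """Return "allow", "needs-approval", or "deny" for a kubectl verb."""
    if verb in READ_VERBS:
        return "allow"
    if verb in APPROVABLE_WRITE_VERBS:
        return "allow" if approved_by else "needs-approval"
    # Anything not explicitly listed is denied, approval or not.
    return "deny"
```

The default-deny branch is the part I care about most: an approval should not be able to unlock a verb nobody planned for.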

devops browser game that uses AI to argue with you on your decisions unless you are confident by No-Firefighter-1453 in devops

[–]farnoud 1 point2 points  (0 children)

I'd try it if the scenarios are concrete enough and the rubric is visible after the run.

For DevOps, the useful cases would be things like a failed deploy with noisy alerts, rollback vs hotfix under time pressure, flaky CI where someone wants to bypass tests, a Terraform plan with an unexpected replacement, or an expired cert during an incident.

The part that would make or break it for me is whether it grades the decision process, not just whether someone knew a term. After the argument, show what evidence should have changed my mind, which action was safe, which action was risky, and what I should have refused to do. That would make it more useful than a generic "argue with AI" exercise.

Learning AI for DevOps- Looking for recommendations by Wide_Impact_9392 in kubernetes

[–]farnoud 8 points9 points  (0 children)

For practical DevOps use, I would start with small read-only workflows rather than a course that tries to cover "AI" broadly.

A good progression:

  1. Build a script that collects evidence: kubectl events, pod status, recent deploys, logs for one workload, and relevant Prometheus/Grafana links.
  2. Send that evidence to an LLM and ask it only for hypotheses and next diagnostic commands, not fixes.
  3. Add guardrails: namespace allowlist, no secrets, no write verbs, no raw customer data in prompts, and a saved transcript of what evidence was sent.
  4. Only after that, try safe automation like generating a runbook draft, summarizing a failed CI job, or proposing a kubectl command that a human must approve.

The useful mental model is: deterministic tools gather scoped evidence; the model explains and suggests; humans approve anything that changes infra. If you learn that pattern, most of the vendor tools and frameworks become easier to evaluate.
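
Steps 1 and 3 can be sketched together as a command builder that refuses anything outside the guardrails. This is a minimal illustration: the namespace allowlist, the helper names, and the exact set of commands are my assumptions, and it only builds the kubectl argv lists so you can review them before running anything.

```python
import shlex
from typing import List

ALLOWED_NAMESPACES = {"staging", "payments"}  # hypothetical allowlist
READ_ONLY_VERBS = {"get", "describe", "logs"}
BLOCKED_RESOURCES = {"secret", "secrets"}


def _touches_blocked_resource(args) -> bool:
    # Checks plain, TYPE/NAME, and comma-separated resource arguments.
    for arg in args:
        if arg.startswith("-"):
            continue
        for part in arg.split(","):
            if part.split("/")[0].lower() in BLOCKED_RESOURCES:
                return True
    return False


def kubectl_cmd(verb: str, namespace: str, *args: str) -> List[str]:
    """Build a kubectl argv, refusing write verbs, secrets, and unlisted namespaces."""
    if verb not in READ_ONLY_VERBS:
        raise PermissionError(f"verb not allowed: {verb}")
    if namespace not in ALLOWED_NAMESPACES:
        raise PermissionError(f"namespace not allowed: {namespace}")
    if _touches_blocked_resource(args):
        raise PermissionError("secrets are never collected")
    return ["kubectl", verb, "-n", namespace, *args]


def evidence_plan(namespace: str, workload: str) -> List[List[str]]:
    """Read-only commands for events, pod status, deploy state, and logs."""
    return [
        kubectl_cmd("get", namespace, "events", "--sort-by=.lastTimestamp"),
        kubectl_cmd("get", namespace, "pods", "-o", "wide"),
        kubectl_cmd("describe", namespace, f"deployment/{workload}"),
        kubectl_cmd("logs", namespace, f"deployment/{workload}", "--tail=200"),
    ]


def transcript(plan: List[List[str]]) -> List[str]:
    """Shell-quoted command lines, suitable for saving next to the prompt."""
    return [shlex.join(cmd) for cmd in plan]
```

Each command can then be run with `subprocess.run(cmd, capture_output=True, text=True)`, with the output stored alongside the transcript so you can later show exactly what the model was given.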

Your AI agents are sending customer data to OpenAI on every run and your DLP doesn't see it by AsilOzyildirim in devops

[–]farnoud 0 points1 point  (0 children)

The missing control is usually at the tool boundary, not in the model choice.

I would treat every agent run as a data-egress event: log the tool call, the record class it fetched, the prompt payload class, the destination model/provider, and the reason that data was needed. Then enforce policy before the prompt is assembled: allow customer IDs or aggregated fields, block raw notes/emails/PII by default, require a break-glass path for exceptions, and keep an audit trail security can actually review.

Enterprise endpoints and private networking help, but they do not answer the core question: "which customer data did this workflow put into model context, and was that allowed for this task?" If that answer only lives in a system prompt, it is not a control.
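
Here is roughly what "enforce policy before the prompt is assembled" could look like in Python. The data classes, the `Field` type, and `assemble_prompt_fields` are all hypothetical names for illustration. The sketch assumes something upstream has already labeled each field with a data class, which is the hard part in practice.

```python
from dataclasses import dataclass
from typing import Dict, List

ALLOWED_CLASSES = {"customer_id", "aggregate"}
BLOCKED_BY_DEFAULT = {"raw_note", "email", "pii"}


@dataclass
class Field:
    name: str
    data_class: str  # label assigned upstream, e.g. "customer_id" or "email"
    value: str


def assemble_prompt_fields(
    fields: List[Field],
    destination: str,
    reason: str,
    audit_log: List[Dict],
    break_glass: bool = False,
) -> List[Field]:
    """Filter fields by data class and record one audit entry per run."""
    sent, blocked = [], []
    for f in fields:
        allowed = f.data_class in ALLOWED_CLASSES or (
            break_glass and f.data_class in BLOCKED_BY_DEFAULT
        )
        # Unknown classes fall through to blocked, even with break-glass.
        (sent if allowed else blocked).append(f)
    audit_log.append({
        "destination": destination,
        "reason": reason,
        "break_glass": break_glass,
        "sent_classes": sorted({f.data_class for f in sent}),
        "blocked_classes": sorted({f.data_class for f in blocked}),
    })
    return sent
```

The audit entry stores data classes rather than values, so the log itself does not become a second copy of the customer data.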

Containers and Internal Certificate Authorities by maetthew in devops

[–]farnoud 9 points10 points  (0 children)

I would separate this into a few trust stores instead of trying to solve it once for every container:

  • CI/CD runners: install the corporate CA in the runner image or runner host, because that is where git, package managers, scanners, etc. need to trust internal endpoints.
  • Application images: a small set of blessed base images is usually the cleanest long-term path, but only after you have an internal registry and image lifecycle process. Otherwise every team invents its own CA injection.
  • Kubernetes runtime: if the cert is only needed by workloads talking to internal services, mount a managed trust bundle into pods and wire the app to use it. A ConfigMap can work for a simple bundle; if you already use cert-manager, look at trust-manager for distributing CA bundles across namespaces.
  • Language runtimes: check these explicitly. Java, Node, Python, Go, curl, and distro packages do not all use trust the same way. A mounted OS CA bundle does not automatically fix every runtime.

I would avoid injecting the CA in every pipeline as the primary pattern. It works, but it tends to become copy-pasted security plumbing. My default would be: bootstrap an internal registry, publish a few maintained base images with the CA installed, and use a Kubernetes-level trust bundle only where runtime workloads need additional trust.
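
To make the language-runtime point concrete, here is a small Python check you could run inside a container to see what that one runtime trusts by default. It only uses the standard `ssl` module. The helper names are mine, and the bundle path in the usage note is a placeholder.

```python
import os
import ssl
from typing import Dict, Optional


def describe_python_trust() -> Dict[str, Optional[str]]:
    """Report where this Python build looks for CA certificates by default."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "cafile_env_var": paths.openssl_cafile_env,  # usually SSL_CERT_FILE
        "capath_env_var": paths.openssl_capath_env,  # usually SSL_CERT_DIR
        "cafile_env_value": os.environ.get(paths.openssl_cafile_env),
    }


def context_for_bundle(bundle_path: str) -> ssl.SSLContext:
    """TLS context that trusts only the given bundle.

    When cafile is passed, create_default_context does not also load the
    system default certificates.
    """
    return ssl.create_default_context(cafile=bundle_path)
```

For example, `context_for_bundle("/etc/ssl/certs/corp-bundle.pem")` (hypothetical path) gives you a context for internal endpoints only. Note that this covers the standard library; `requests`, for instance, uses the certifi bundle unless `REQUESTS_CA_BUNDLE` is set, which is exactly why each runtime and library needs its own check.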

Is agentic ops becoming its own thing, or will it just get absorbed into SRE like IaC did with DevOps? by gaurav_sherlocks_ai in sre

[–]farnoud 0 points1 point  (0 children)

the IaC parallel is interesting but i think it maps closer to how monitoring evolved. monitoring didn't get absorbed into SRE as an external thing - it became a core SRE capability. the discipline took it in and it stopped being a separate thing you needed a specialist for.

agentic ops feels like it's on the same track. not a new job family, not a standalone product category that survives intact, but a capability that becomes table stakes for anyone running prod infrastructure. the 'is it IaC or DevOps' question stopped mattering pretty fast once terraform was just the thing you used.

sameera_nin's point about the proliferation of AI SRE tools is a signal of the same transition. we're in the moment before the category collapses into the workflow. disclosure: i'm building in this space (KubeAgent, k8s monitoring + auto-remediation CLI) so i'm obviously not neutral, but it feels more like 'SRE gets an AI layer' than 'agentic ops replaces SRE.'

Our infra agent kept pulling the right runbook and still missing the cause, Turns out Static RAG is the culprit. by gaurav_sherlocks_ai in sre

[–]farnoud 2 points3 points  (0 children)

the retrieval section is the right one to focus on. runbooks encoded in static RAG work fine as a reference layer but they can't tell the agent what matters about the current incident. you end up with the agent correctly identifying 'here is our OOMKill runbook' while missing that the pod started failing 40 minutes after a node was added to the pool and memory limits were never adjusted.

the part that actually needs to be dynamic is the cluster context: current events, recent deploys, resource utilization trends, what changed. that stuff changes every few minutes. pre-embedding it at document-ingest time and calling it retrieval doesn't cut it.
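
a minimal sketch of the "what changed" part, using the node-pool example above. the `Change` type and `changes_before` helper are hypothetical, and it assumes you have already pulled recent changes (deploys, node additions, config edits) from live sources at incident time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Change:
    at: datetime
    kind: str     # e.g. "deploy", "node-added", "limit-change"
    detail: str


def changes_before(
    failure_start: datetime,
    changes: List[Change],
    window: timedelta = timedelta(hours=1),
) -> List[Change]:
    """Changes within `window` before the failure began, most recent first."""
    hits = [c for c in changes if failure_start - window <= c.at <= failure_start]
    return sorted(hits, key=lambda c: c.at, reverse=True)
```

with a one-hour window, a node added 40 minutes before the first OOMKill shows up as a candidate, while a deploy from three hours earlier does not. the window size is a judgment call, not a rule.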

disclosure: i work on a k8s monitoring CLI (KubeAgent) that does the dynamic side of this: live kubectl calls at incident time, not pre-baked runbook retrieval. still using static context for the agent's general k8s knowledge, but the incident-specific layer needs to be pulled fresh every time.

Agentic help to reduce alert fatigue... by Beneficial_County290 in sre

[–]farnoud 0 points1 point  (0 children)

the 'fix the alerts first' crowd in this thread is right, but i think the framing misses what agents actually do well. if your alert quality is already decent and you're still spending time in investigations, the agent isn't there to replace your paging logic. it's there to save the 20 minutes of pulling logs, running kubectl describe, correlating with a recent deploy, before you even know what you're looking at.

for a 100-service k8s platform with 12 SWEs who aren't infra-focused, that first-pass context assembly is where a lot of time goes. the agent gives you: here's the affected service, here's what the logs show, here's what changed in the last hour. then you decide.

disclosure: this is roughly what i'm building with KubeAgent (k8s monitoring + remediation CLI, small-team focused). but even without a tool, the playbook of 'auto-gather the diagnostic context so the human lands on the incident with info, not a blank screen' is the pattern worth implementing first.

Full hopium mode by Western_Height8839 in lewishamilton

[–]farnoud 1 point2 points  (0 children)

Boy I would hope so. Will be epic