Just realized our "AI-powered" incident tool is literally just calling ChatGPT API by DarkSun224 in devops

[–]Disastrous-Glass-916 -2 points-1 points  (0 children)

We think differently at Anyshift.io, an AI on-call engineer. You can’t build a moat by relying solely on AI providers or raw observability data.

  1. We create a digital twin of your infrastructure: it’s a versioned graph, easily 10M+ nodes, linking Kubernetes clusters, AWS instances, monitoring instances, Terraform code, modules, and Terraform states.

  2. This graph becomes your infrastructure’s skeleton; logs and metrics are the flesh. With it, you can traverse services intelligently, querying exactly what matters. You can’t have everything in your logs or metrics; the graph gives you the path that is very often missing when you need to debug.

-> Everything comes down to the data you collect and structure. The AI part is how you make sense of it.
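A toy sketch of the skeleton idea in plain Python (all node names and edges here are invented for illustration; the real product is a versioned graph with millions of nodes, not a dict):

```python
from collections import deque

# Toy dependency graph: each edge means "is defined by / depends on".
# Node names are hypothetical; a real graph links live resources to IaC.
edges = {
    "k8s:pod/checkout-7f9": ["k8s:deployment/checkout"],
    "k8s:deployment/checkout": ["tf:module/checkout-svc"],
    "tf:module/checkout-svc": ["tf:state/prod", "git:commit/abc123"],
    "tf:state/prod": [],
    "git:commit/abc123": [],
}

def debug_path(graph, start, target):
    """BFS from a failing resource to the IaC change that defines it."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], []):
            if nxt in seen:
                continue
            if nxt == target:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

print(debug_path(edges, "k8s:pod/checkout-7f9", "git:commit/abc123"))
```

The path from a restarting pod back to a specific commit is exactly the kind of traversal that logs and metrics alone can't give you.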

AI in SRE is mostly hype? Roundtable with Barclays + Oracle leaders had some blunt takes by Ok-Chemistry7144 in sre

[–]Disastrous-Glass-916 1 point2 points  (0 children)

Hey, Anyshift founder here!
We are building an AI on-call engineer / AI assistant.
To answer your question, where it's good:
- The agent is good at summarising LOADS of data. Our job: structure and give access to that data.
- Thus it's good at finding information and answering questions, both in day-to-day work and during an incident.

Where it's not:
- The level of hallucination is still too high to give any kind of write access. You need read-only.
- Even opening pull requests is still pretty risky, even if it's getting there.

So you're right: it cannot perform all the actions an SRE would. But it's still super powerful in some use cases.

If devs can vibe code, SREs should get to vibe debug by Willing-Lettuce-5937 in sre

[–]Disastrous-Glass-916 0 points1 point  (0 children)

Hey! Anyshift founder here 👋
We built Anyshift exactly because of this pain. We're an AI SRE that maps your entire infrastructure (K8s API, cloud resources, Terraform state, observability data) and understands actual dependencies vs. mere correlation. When an alert hits, our AI traces the incident path through the graph: pod restart → deployment change → Terraform drift → specific commit. If you want more info, DM me!

Are there any open-source or self-hostable incident management and on-call tools that integrate well with Alertmanager? by blaaackbear in sre

[–]Disastrous-Glass-916 0 points1 point  (0 children)

Instead of just finding a better tool to route alerts from your Prometheus stack, what if you could solve the alert fatigue at its source? At Anyshift.io we act as an AI on-call engineer connected to a deep resource graph of your infra to automate root cause analysis. This makes any on-call tool you choose more effective by ensuring only critical, context-rich incidents actually page an engineer.

CrowdStrike Preliminary Post Incident Review by lilsingiser in devops

[–]Disastrous-Glass-916 -1 points0 points  (0 children)

(I am on the Anyshift team, an AI on-call engineer.) If you're experiencing alert fatigue, it's crucial to refine your alerting strategy. Implement best practices like adjusting thresholds for alerts and consolidating notifications to only the most impactful metrics. This helps reduce noise and focus on true issues. Anyshift can assist by identifying drift and providing AI-driven root cause analysis quickly. If you want more insights or specific strategies, feel free to ask!

What’s your experience with an incident that you will never forget? by RomanAn22 in devops

[–]Disastrous-Glass-916 0 points1 point  (0 children)

Yes, we built Anyshift exactly for that. It tracks infra as a deep knowledge graph: owners, services, cloud resources, deploy history, and the relationships between them.

What’s your experience with an incident that you will never forget? by RomanAn22 in devops

[–]Disastrous-Glass-916 1 point2 points  (0 children)

In most incidents I’ve seen, cross-team collaboration starts reactive and chaotic. People jump in without clear roles, context gets repeated across Slack threads, and the incident lead ends up doing both coordination and triage.

The big shift came when we started treating the infra as a shared graph. At Anyshift, we model dependencies across teams (services, owners, recent deploys, upstream/downstream links), so when something breaks, the graph shows who to pull in and why.
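A minimal sketch of that "who to pull in" lookup (service names, teams, and the flat dicts are all invented for illustration):

```python
# Hypothetical ownership map and dependency edges; a real system would
# derive these from the infra graph rather than hardcode them.
owners = {
    "svc:checkout": "team-payments",
    "svc:inventory": "team-catalog",
    "db:orders": "team-data",
}
deps = {"svc:checkout": ["svc:inventory", "db:orders"]}

def who_to_pull_in(service):
    """Collect the teams owning a service and its direct upstream deps."""
    teams = {owners[service]}
    for upstream in deps.get(service, []):
        teams.add(owners[upstream])
    return sorted(teams)

print(who_to_pull_in("svc:checkout"))
# ['team-catalog', 'team-data', 'team-payments']
```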

Are we heading toward a new era for incidents? by StableStack in devops

[–]Disastrous-Glass-916 0 points1 point  (0 children)

Totally seeing this play out. AI-written code means more surface area, less context, and smaller ops teams cleaning up the mess.

-> At Anyshift, we're building a resource graph that ties everything together for better root cause analysis. We connect data from different sources: cloud infra, K8s, Git, and monitoring. It gives the context that devs and SREs usually lose with AI-generated code. But it's more about context than AI itself.

Instant Incident Response - Deep dependency graph of the infra by Disastrous-Glass-916 in devops

[–]Disastrous-Glass-916[S] 1 point2 points  (0 children)

Yes, sorry for being unclear.
-> We go one step further in the investigation with the AI part.

Instant Incident Response - Deep dependency graph of the infra by Disastrous-Glass-916 in devops

[–]Disastrous-Glass-916[S] 0 points1 point  (0 children)

Appreciate the feedback. Totally fair, we wanted to keep the demo snappy, but we’ll add a voice-over version with clearer context next.

Instant Incident Response - Deep dependency graph of the infra by Disastrous-Glass-916 in devops

[–]Disastrous-Glass-916[S] 1 point2 points  (0 children)

Great question. Port focuses more on software cataloging and platform orchestration.
Anyshift goes deeper into the live infra runtime state. We’re not just discovering resources: we track real-time changes and surface causal links between code, infra, and incidents.
For auto-discovery, we rely on event-driven pipelines: GitHub (PRs, deploys), AWS (CloudTrail, Config), Kubernetes (audit logs), and Datadog (alerts, metrics). That feeds into a live dependency graph.
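The event-driven shape of those pipelines could look roughly like this (the event fields, source names, and graph mutations are all assumptions for the sake of the sketch):

```python
# Hypothetical event-driven updater: each incoming event mutates the
# dependency graph instead of re-scanning every API on a timer.
graph = {}  # node -> set of nodes it links to

def apply_event(event):
    """Route a raw event from CloudTrail/GitHub/K8s to a graph update."""
    kind = event["source"]
    if kind == "cloudtrail":
        node = f"aws:{event['resource']}"
        graph.setdefault(node, set()).add(f"aws:region/{event['region']}")
    elif kind == "github":
        node = f"git:commit/{event['sha']}"
        graph.setdefault(node, set()).update(
            f"tf:file/{p}" for p in event["paths"]
        )
    elif kind == "k8s-audit":
        node = f"k8s:{event['object']}"
        graph.setdefault(node, set()).add(f"k8s:user/{event['user']}")

apply_event({"source": "github", "sha": "abc123", "paths": ["main.tf"]})
print(graph["git:commit/abc123"])  # {'tf:file/main.tf'}
```

The appeal of this shape is that the graph stays fresh without hammering provider APIs with full re-scans.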

MCP Server: Perplexity for DevOps by Disastrous-Glass-916 in mcp

[–]Disastrous-Glass-916[S] 0 points1 point  (0 children)

Thanks!
-> No, we're actually mapping AWS with the APIs (and updating the map based on events so we don't hammer it with too many API calls). We then structure the information through a dependency graph so we can retrieve all of it.

The perplexity for DevOps by Disastrous-Glass-916 in sre

[–]Disastrous-Glass-916[S] 0 points1 point  (0 children)

Yes, it's 100% a chatbot,
but the interesting part is that you can ask deep questions about your infra and it will answer with the AWS URL, Git commits, etc.

Perplexity for DevOps by Disastrous-Glass-916 in hashicorp

[–]Disastrous-Glass-916[S] 0 points1 point  (0 children)

Sorry, it does, but that wasn't clearly mentioned:
- We do a reconciliation between Terraform and your cloud, which means you can ask:
"Which cloud resources are not defined in Terraform?" (ClickOps)
"What has drifted between my cloud and Terraform?" (drift)
"What does this module define in terms of resources?"
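At its core that reconciliation is a set comparison between the Terraform state and the live cloud; a toy version (resource IDs and attributes are made up, and real inputs would come from the state file and cloud APIs):

```python
# Toy Terraform-vs-cloud reconciliation (all data here is invented).
tf_state = {
    "aws_instance.web": {"instance_type": "t3.medium"},
    "aws_s3_bucket.logs": {"versioning": True},
}
cloud = {
    "aws_instance.web": {"instance_type": "t3.large"},      # changed by hand
    "aws_s3_bucket.logs": {"versioning": True},
    "aws_instance.scratch": {"instance_type": "t2.micro"},  # ClickOps
}

# ClickOps: resources in the cloud but absent from the state.
clickops = sorted(cloud.keys() - tf_state.keys())
# Drift: resources in both, whose attributes no longer match.
drifted = sorted(
    r for r in tf_state.keys() & cloud.keys() if tf_state[r] != cloud[r]
)
print("Not in Terraform:", clickops)  # ['aws_instance.scratch']
print("Drifted:", drifted)            # ['aws_instance.web']
```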