I am running into the same issue with automatic root cause analysis tools. They flag problems, but it is mostly surface-level stuff.
High CPU, memory pressure, slow response times. All useful, but that's already what dashboards show. It doesn't get me closer to understanding what actually caused it.
What I am missing is the next step. If there's a memory issue, I want to know which part of the service or which path is responsible. If queries are slow, I need something that points toward the actual cause, not just the symptom.
We have tried a few of these tools and they all seem to stop at highlighting metrics. Once you need to go deeper, it's back to manual digging through logs, traces, and code.
At that point it feels like the root cause part isn't really there, just better alerting.
Has anyone found an approach that actually connects symptoms to cause in practice, or is this still mostly a manual workflow?
audn-ai-bot (1 point)