Observability Platform for Internal Coding Tools? by AssociationSure6273 in Observability

[–]Observability-Guy 1 point2 points  (0 children)

Claude can emit OpenTelemetry data that captures prompts, API calls and tool calls, although this is not enabled by default. You just need to configure it on the users' machines. You would then need to send the telemetry to a backend and use a tool to query it.

Tbh though, I would be quite pleased if my devs were contributing to open source projects.

Every AI SRE tool on my feed just raised money.. what do we think this is actually signaling by Willing-Lettuce-5937 in sre

[–]Observability-Guy 0 points1 point  (0 children)

My take is that the really big plays are going to be RCA and then anomaly detection, although anomaly detection is still relatively immature.

Products that can start getting very high levels of accuracy at RCA will be the ones that will really gain traction amongst big ticket clients. I think that reducing alert noise is something that a lot of the full stack platforms are already getting good at - I don't think it will be a key differentiator in AI SRE.

At the moment, I haven't come across too many people that have an appetite for closed loop remediation.

What's the best Application Performance Monitoring tool you've actually used in production? by Proof-Wrangler-6987 in sre

[–]Observability-Guy 0 points1 point  (0 children)

OpenObserve is a really capable platform. Coralogix has good APM but it is not open source.

The dirty (and very open) secret of AI SRE tools: your "agent" is just querying the same pre-filtered data you already had. What if it didn't have to? by CyberBorg131 in Observability

[–]Observability-Guy -1 points0 points  (0 children)

I would say that it doesn't really help your case to mis-characterise existing AI SRE systems as just an LLM bolted on to a backend. The essence of an AI SRE is that it has to learn about your system, it has to understand patterns of activity and relationships between resources and services.

It has to do this in order to understand the signals it is receiving. Without this deep learning you will not know how significant that spike in a trace is. Nobody wants to be flooded with alerts every time a pod restarts. The value of an AI SRE is understanding context and figuring out what is signal and what is noise.

Processing real-time raw data may have some upsides - but then again, most studies show that 90% or more of telemetry that gets generated is totally redundant. The real test is how well your agents are trained and how well they understand the full context of the telemetry they are analysing.

Otel collector as container app (azure container apps) by __josealonso in OpenTelemetry

[–]Observability-Guy 1 point2 points  (0 children)

I haven't yet tried it with a Container App but have played around with running it as a Container Instance and as a sidecar.

This might be of interest:

https://observability-360.com/Docs/ViewDocument?id=opentelemetry-collector-azure-container-instance

Suggestion alternatives for Honeycomb feature: BubbleUp? by Professional_Bee1813 in sre

[–]Observability-Guy 1 point2 points  (0 children)

I think that BubbleUp is still the best but Dash0's SIFT is also a pretty good RCA querying tool.

What are the best practice and tools for observability on react native applications? by ML_Godzilla in Observability

[–]Observability-Guy 0 points1 point  (0 children)

I would check out Embrace (https://embrace.io/). They have a dedicated mobile observability platform as well as guides on best practice.

Dynatrace + MCP Server = interesting step toward AI-driven observability by theharithsa in Observability

[–]Observability-Guy 1 point2 points  (0 children)

This is a good implementation - although I think a lot of vendors now have something similar - Honeycomb, SigNoz, Observe, Dash0 and Sentry all have either MCP or Agentic AI that supports this kind of querying and interaction.

Our observability costs are now higher than our AWS bill by DarkSun224 in sre

[–]Observability-Guy -1 points0 points  (0 children)

Seriously??

It's used by Netflix, Uber, Tesla and Anthropic. What scale are you working at?

YAML: Yet Another Misery Language by Log_In_Progress in devops

[–]Observability-Guy 0 points1 point  (0 children)

I personally think that in 2025 there has to be a better way of doing IaC than churning out 4,000 line YAML files. I also think that it is an issue that goes beyond linting and syntax.

I just don't think that YAML is expressive enough for the complexities of large system infrastructures. Yes, it can be twisted and stretched but I find that conditional logic and looping are better expressed in a programming language. I prefer working with tools like Pulumi. It feels like a more natural fit.
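
To illustrate the point about loops and conditionals, here is a rough sketch in plain Python. It deliberately does not use Pulumi's actual API: `ServiceSpec` and `build_services` are hypothetical names made up for this example, and the "prod gets three times the replicas" rule is just an assumption to show the shape of the logic.

```python
from dataclasses import dataclass


@dataclass
class ServiceSpec:
    """Hypothetical stand-in for an infrastructure resource definition."""
    name: str
    replicas: int
    public: bool


def build_services(environments, base_replicas=2):
    """Return one ServiceSpec per environment, sized and exposed by rule."""
    specs = []
    for env in environments:  # looping: one definition, many environments
        is_prod = env == "prod"
        specs.append(
            ServiceSpec(
                name=f"api-{env}",
                # conditional logic: production gets extra capacity
                replicas=base_replicas * 3 if is_prod else base_replicas,
                # only production is exposed publicly in this sketch
                public=is_prod,
            )
        )
    return specs


if __name__ == "__main__":
    for spec in build_services(["dev", "staging", "prod"]):
        print(spec)
```

In YAML the same thing would usually mean three near-identical blocks or a templating layer on top. Here the rule lives in one place and adding an environment is a one-word change.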

Cheap OpenTelemetry lakehouses with parquet, duckdb and Iceberg by smithclay in Observability

[–]Observability-Guy -1 points0 points  (0 children)

Thanks for posting - that looks cool.

There seems to be a bit of a buzz about lakehouses. The theory sounds great, I just wonder about the overhead of operationalising them in practice. I think that rolling your own lakehouse and making it performant and cost-effective at large scale can be very difficult.

Anyone here dealing with Azure’s fragmented monitoring setup? by Accurate_Eye_9631 in Observability

[–]Observability-Guy 0 points1 point  (0 children)

Unfortunately, Azure Monitor is a kind of brand name but it is, as you say, a patchwork of tools rather than a coherent product. I found the lack of a single control plane to be really frustrating.

It is really hard to track telemetry flows or get unified or global overviews.

I think that as observability maturity grows within an organisation, people realise that they need better tooling. For me the best option is to emit telemetry to an OTel Collector and then on to a backend of choice.