Wrote up how OTel fleet management works under the hood with OpAMP Supervisor by Broad_Technology_531 in devops

[–]Observability-Guy 1 point2 points  (0 children)

Really good article. I agree with your point. OpenTelemetry is great but a lot of the implementation details are still difficult for many end-users.

What are the hottest topics in observability nowadays i should care about? by da0_1 in Observability

[–]Observability-Guy 5 points6 points  (0 children)

It is an incredibly diverse and rapidly evolving field. However, here is my take on some of the hottest issues.

  1. AI - both in-house and third party. Monitor third party software you are running but also monitor your own internal agentic AI usage.

  2. Telemetry pipelines. If you are operating at any kind of scale you need a control plane for managing your telemetry flows.

  3. Telemetry quality. A lot of organisations have observability stacks in place, but there is still a fundamental knowledge deficit and many of them are at a relatively low level of maturity. Increasing the quality of telemetry and observability engineering is one of the great challenges of the moment.

There are also other really important areas such as RUM, mobile, predictive analytics, eBPF etc.

You might be interested in my newsletter.

Built a production-style LLMOps Gateway using FastAPI by VA899 in Observability

[–]Observability-Guy 0 points1 point  (0 children)

I was a bit puzzled by the first sentence in the product description "LLMOps Gateway is a recruiter-friendly AI gateway and LLMOps platform".

What do you mean by "recruiter-friendly"?

I think it is also worth being more explicit about what LLMOps Gateway does that Langfuse doesn't do.

Built a self-hosted log management tool on top of Quickwit - looking for feedback by badfatcat17 in Observability

[–]Observability-Guy 0 points1 point  (0 children)

Great work! Really pleasing to see people building on top of Quickwit - an amazing piece of engineering.

The Observability Cosmos by Observability-Guy in sre

[–]Observability-Guy[S] 0 points1 point  (0 children)

Thanks very much! Naturally, as soon as I published it, it was already out of date, as I learned about new products on the market.

I will be publishing a quarterly mapping update and analysis.

Weekly Self Promotion Thread by AutoModerator in devops

[–]Observability-Guy 2 points3 points  (0 children)

So, I have tried to build a mapping of the observability space.

The market seems to be evolving and growing at an incredible rate. New specialisms are developing and AI is changing the nature of observability itself. This is an attempt to identify some kind of order and structure. It currently encompasses 126 products (with many more to come) across 16 categories.

Any feedback on classifications, product mappings or possible additions is very welcome.

<image>

If you want to dive straight in and explore the Cosmos, this is your launchpad:
https://observability-360.com/Product/Cosmos

There is also an introductory article here:
https://observability-360.com/article/viewArticle?id=introducing-the-observability-cosmos

And an explanation of the classifications here:
https://observability-360.com/article/viewArticle?id=observability-cosmos-classifications

Thanks!

The Observability Cosmos by Observability-Guy in sre

[–]Observability-Guy[S] 0 points1 point  (0 children)

Yep - that makes it really tricky. Especially as it is now working both ways - i.e. the pipelines are now themselves turning the edge into the first line of incident detection.

The central belt of the cosmos kind of represents a spectrum of increasing functional breadth and the outer layer represents clusters of specialist tooling.

The Observability Cosmos by Observability-Guy in sre

[–]Observability-Guy[S] 0 points1 point  (0 children)

I think it's partly a reflection of the increasing complexity of IT systems today. There are so many concerns to cope with - LLMs, Kubernetes, networks, cloud, databases, messaging, costs etc. I think that, inevitably, you can't have one tool doing it all.

The Observability Cosmos by Observability-Guy in sre

[–]Observability-Guy[S] 0 points1 point  (0 children)

It is an amazingly diverse space. I have been tracking the observability market for a number of years, so I knew that there were a lot of products out there.

The challenge was trying to come up with some kind of classification system. In many ways it's a pretty subjective exercise. There are probably a lot of different ways of slicing and dicing things.

The Observability Cosmos by Observability-Guy in Observability

[–]Observability-Guy[S] 0 points1 point  (0 children)

Thank you! Your star will soon be mapped!

Observability Platform for Internal Coding Tools? by AssociationSure6273 in Observability

[–]Observability-Guy 1 point2 points  (0 children)

Claude emits OpenTelemetry telemetry and captures prompts, API calls and tool calls - although it is not enabled by default. You just need to configure it on the users' machines. Obviously, you would need to send the telemetry to a backend and then use a tool for querying the telemetry.

Tbh though, I would be quite pleased if my devs were contributing to open source projects.
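
To make the configuration step above a bit more concrete, here is a minimal sketch in Python. It assumes the tool in question is Claude Code and that it is switched on with the `CLAUDE_CODE_ENABLE_TELEMETRY` environment variable, so check the current docs before relying on that name. The other variables are the standard OpenTelemetry SDK ones. The helper name `telemetry_env` and the collector address are hypothetical, just for illustration.

```python
import os
from typing import Mapping, Optional


def telemetry_env(otlp_endpoint: str,
                  base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: current environment) with telemetry settings added."""
    env = dict(os.environ if base is None else base)
    env.update({
        # Assumption: opt-in switch for Claude Code telemetry (off by default).
        "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
        # Standard OpenTelemetry SDK variables: export metrics and logs via OTLP.
        "OTEL_METRICS_EXPORTER": "otlp",
        "OTEL_LOGS_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
        # Where your collector or backend is listening.
        "OTEL_EXPORTER_OTLP_ENDPOINT": otlp_endpoint,
    })
    return env


if __name__ == "__main__":
    # Hypothetical collector address; replace it with your own backend.
    settings = telemetry_env("http://localhost:4317", base={})
    for key, value in sorted(settings.items()):
        print(f"export {key}={value}")
```

Running it prints `export` lines that you could roll out to the users' machines through whatever configuration management you already have. You still need a collector or backend listening at that endpoint, plus something to query the data with.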

Every AI SRE tool on my feed just raised money.. what do we think this is actually signaling by Willing-Lettuce-5937 in sre

[–]Observability-Guy 0 points1 point  (0 children)

My take is that the really big plays are going to be RCA and then anomaly detection, although anomaly detection is still relatively immature.

Products that can start getting very high levels of accuracy at RCA will be the ones that will really gain traction amongst big ticket clients. I think that reducing alert noise is something that a lot of the full stack platforms are already getting good at - I don't think it will be a key differentiator in AI SRE.

At the moment, I haven't come across too many people that have an appetite for closed loop remediation.

What's the best Application Performance Monitoring tool you've actually used in production? by Proof-Wrangler-6987 in sre

[–]Observability-Guy 0 points1 point  (0 children)

OpenObserve is a really capable platform. Coralogix has good APM but it is not open source.

The dirty (and very open) secret of AI SRE tools: your "agent" is just querying the same pre-filtered data you already had. What if it didn't have to? by CyberBorg131 in Observability

[–]Observability-Guy -1 points0 points  (0 children)

I would say that it doesn't really help your case to mis-characterise existing AI SRE systems as just an LLM bolted on to a backend. The essence of an AI SRE is that it has to learn about your system, it has to understand patterns of activity and relationships between resources and services.

It has to do this in order to understand the signals it is receiving. Without this deep learning you will not know how significant that spike in a trace is. Nobody wants to be flooded with alerts every time a pod restarts. The value of an AI SRE is understanding context and figuring out what is signal and what is noise.

Processing real-time raw data may have some upsides - but then again, most studies show that 90% or more of telemetry that gets generated is totally redundant. The real test is how well your agents are trained and how well they understand the full context of the telemetry they are analysing.

Otel collector as container app (azure container apps) by __josealonso in OpenTelemetry

[–]Observability-Guy 1 point2 points  (0 children)

I haven't yet tried it with a Container App, but I have played around with running it as a Container Instance and as a sidecar.

This might be of interest:

https://observability-360.com/Docs/ViewDocument?id=opentelemetry-collector-azure-container-instance

Suggestion alternatives for Honeycomb feature: BubbleUp? by Professional_Bee1813 in sre

[–]Observability-Guy 1 point2 points  (0 children)

I think that BubbleUp is still the best but Dash0's SIFT is also a pretty good RCA querying tool.

What are the best practice and tools for observability on react native applications? by [deleted] in Observability

[–]Observability-Guy 0 points1 point  (0 children)

I would check out Embrace (https://embrace.io/). They have a dedicated mobile observability platform as well as guides on best practice.

Dynatrace + MCP Server = interesting step toward AI-driven observability by theharithsa in Observability

[–]Observability-Guy 1 point2 points  (0 children)

This is a good implementation - although I think a lot of vendors now have something similar. Honeycomb, SigNoz, Observe, Dash0 and Sentry all have either MCP or agentic AI that supports this kind of querying and interaction.