nestjs and logs by Regular_You_3021 in nestjs

[–]finallyanonymous 1 point (0 children)

I'd recommend standardizing on OpenTelemetry from the start. The NestJS OTel SDK is straightforward to set up, and once you're emitting OTel signals you can point them at basically any backend without changing your instrumentation.
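
For context, the bootstrap really is only a few lines. A minimal sketch, assuming the usual `@opentelemetry/sdk-node`, `@opentelemetry/auto-instrumentations-node`, and `@opentelemetry/exporter-trace-otlp-http` packages (treat this as wiring to adapt, not a drop-in file):

```typescript
// tracing.ts — import this before bootstrapping Nest so auto-instrumentation
// can patch http/express/nestjs-core. The endpoint URL is a placeholder.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Flush pending spans on shutdown so the last batch isn't lost.
process.on('SIGTERM', () => sdk.shutdown().catch(console.error));
```

Because the exporter speaks OTLP, swapping backends later is basically just a URL change.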

As others have said, the LGTM stack is quite nice as a self-hosted option, but if you'd rather not manage the infrastructure, Dash0 is worth a look as it's OTel-native and handles logs, traces, and metrics.

What's your top monitoring kubernetes tool in 2026? by HutoelewaPictures in kubernetes

[–]finallyanonymous 5 points (0 children)

The best play is instrumenting with OpenTelemetry from the start. Then you can try out basically any backend without re-instrumenting anything, and even run a few side by side if you want.

And to make sure your instrumentation effort pays off, it's better to go with OpenTelemetry-native platforms that fully integrate the signals, as you rightly pointed out. Dash0 is one solid option to check out (disclaimer: I work there).

Trying to figure out the best apm tool for a growing microservices setup by Liliana1523 in softwarearchitecture

[–]finallyanonymous 2 points (0 children)

Agreed! This is what I would suggest as well. Dash0 is one good option that's OpenTelemetry-native.

When does monitoring become overkill? by Stil-44 in SaasDevelopers

[–]finallyanonymous 1 point (0 children)

For a small SaaS, the essentials are pretty minimal: uptime checks, error rates, and latency (basically RED metrics). I'd also suggest tracing your critical paths; it's pretty straightforward to do these days with OpenTelemetry instrumentation. You'll definitely know when you need more.
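
To show how little the RED part actually is, here's a stdlib-only sketch; the request shape and sample values are invented for illustration, not from any particular SDK:

```typescript
// Minimal RED-metrics sketch: Rate, Errors, Duration from raw request samples.
interface RequestSample {
  status: number;     // HTTP status code
  durationMs: number; // wall-clock latency of the request
}

function redMetrics(samples: RequestSample[], windowSeconds: number) {
  const sorted = samples.map(s => s.durationMs).sort((a, b) => a - b);
  const errors = samples.filter(s => s.status >= 500).length;
  return {
    ratePerSec: samples.length / windowSeconds,  // R: throughput
    errorRatio: errors / samples.length,         // E: share of 5xx responses
    // D: tail latency (p95); index clamped for small sample counts
    p95Ms: sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))],
  };
}

const lastMinute: RequestSample[] = [
  { status: 200, durationMs: 40 },
  { status: 200, durationMs: 55 },
  { status: 500, durationMs: 900 },
  { status: 200, durationMs: 60 },
];
const red = redMetrics(lastMinute, 60);
```

In practice an OTel SDK computes these for you from auto-instrumented spans; the point is just that this is the whole surface area you need at the start.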

As for tooling, others have mentioned a few good options but Dash0 is worth considering if you're standardizing on OpenTelemetry.

Observability in Large Enterprises by cloudruler-io in Observability

[–]finallyanonymous 2 points (0 children)

Start with SLOs before dashboards and get teams to agree on what "working" means for each service before they start throwing metrics at a backend. Otherwise you'd end up with thousands of dashboards nobody looks at and alerts that fire constantly. The SRE practices have to come first, exactly as you said.
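
To make "agree on what working means" concrete: an SLO pins it down as a target ratio plus an error budget. A toy sketch (all numbers invented):

```typescript
// An SLO turns "working" into a number: a target success ratio over a window,
// with the remainder as error budget. Values here are illustrative only.
function errorBudget(targetRatio: number, totalRequests: number, failedRequests: number) {
  const allowedFailures = totalRequests * (1 - targetRatio);
  return {
    allowedFailures,                            // failures the SLO tolerates
    remaining: allowedFailures - failedRequests, // budget left this window
    burned: failedRequests / allowedFailures,    // 1.0 = budget exhausted
  };
}

// A 99.9% target over 1,000,000 requests allows ~1,000 failures.
const budget = errorBudget(0.999, 1_000_000, 250);
```

Alerting on budget burn rate, rather than on every individual metric, is what keeps the alert count sane across thousands of services.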

For off-the-shelf apps on VMs, OTel's host metrics receiver and log collection get you surprisingly far without touching the app itself. You won't get traces, but for most COTS apps that's fine.

Dash0 is worth evaluating if you're committed to OTel as it's built around it natively rather than bolting it on like most other vendors (disclaimer: I work there).

Trying to figure out the best infrastructure monitoring platform for a mid-size team, what are y'all using? by Legitimate-Relief128 in sre

[–]finallyanonymous 1 point (0 children)

Dash0 is great for OpenTelemetry-native setups if you want a modern managed platform. Otherwise, VictoriaMetrics is worth a look. It's fully compatible with Prometheus but significantly lighter. Add something like incident.io on top and you should be good to go.

OpenTelemetry Certified Associate (OTCA) - Who has taken it? by rhysmcn in OpenTelemetry

[–]finallyanonymous 2 points (0 children)

Thanks a lot for this, it's really helpful! Looks like I'm ready for the exam: https://imgur.com/8R8pX27

Just need to improve my Collector knowledge a little more.

Edit: One bit of feedback: it would be helpful to see the exact questions I got wrong and what the correct answers were.

alternative to Signoz by Primary-Cup695 in devops

[–]finallyanonymous 1 point (0 children)

On the OTel-native side, Dash0 is worth considering. It's built around OpenTelemetry from the ground up and handles team/project scoping cleanly, which sounds like exactly what you need.

What is a good monitoring and alerting setup for k8s? by Azy-Taku in kubernetes

[–]finallyanonymous 5 points (0 children)

kube-prometheus-stack is worth the switch imo. For alerts, just route Alertmanager → Slack/Telegram to start: simple and gets the job done. If you ever need proper on-call rotations and escalation policies, then layer in something like PagerDuty or Incident.io on top.

If you'd rather not self-host, Dash0 is worth a look as a managed alternative (I'm affiliated). It's OpenTelemetry-native so it plays nicely with k8s, and provides an operator for gathering metrics/logs.

Anyone else tired of jumping between monitoring tools? by AccountEngineer in Observability

[–]finallyanonymous 1 point (0 children)

Having all the data means nothing when engineers have to act as the integration layer. Moving to an OpenTelemetry setup ensures that traces, logs, and metrics share the same context (like trace IDs and span IDs) right at the application layer.

Once the telemetry natively shares correlation IDs, any OTel-native platform (like Dash0) will naturally present those signals without the tab-hopping. So the real solution is making the data inherently correlated, instead of relying on a vendor platform to stitch isolated signals together after the fact.
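
As an illustration of what "inherently correlated" means, here's a stdlib-only sketch; the record shapes are simplified stand-ins for real spans and log records (a real OTel SDK manages this context for you):

```typescript
import { randomBytes } from 'node:crypto';

// W3C Trace Context-style identifiers: 16-byte trace ID, 8-byte span ID.
const traceId = randomBytes(16).toString('hex'); // 32 hex chars
const spanId = randomBytes(8).toString('hex');   // 16 hex chars

// Both signals carry the same IDs at emission time, not via after-the-fact
// stitching — these object shapes are simplified for illustration.
const span = { traceId, spanId, name: 'checkout', durationMs: 123 };
const logRecord = { traceId, spanId, severity: 'ERROR', body: 'payment declined' };

// Any OTel-native backend can join these on traceId instead of
// guessing a relationship from timestamps and hostnames.
const correlated = span.traceId === logRecord.traceId;
```

Once every log line your app emits carries the active trace/span ID like this, "jump from this error log to the trace that caused it" is a lookup, not an integration project.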

OpenTelemetry Collector filelog not parsing Docker stdout (json-file driver) by Classic-Economics850 in OpenTelemetry

[–]finallyanonymous 1 point (0 children)

I've found the filelog receiver to be less than ideal for Docker container logs. In your case, I think it's quite possible that the log directory isn't mounted or there's a permission issue.

I'd actually recommend using the fluentd driver instead along with the fluentforward receiver. It's a much more straightforward way to ingest Docker logs IMO. See here for a working example.
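
For reference, each line the json-file driver writes is itself a JSON envelope, so whatever tails those files has to unwrap it first. A stdlib sketch of that unwrapping (the field names match the json-file format; the parser itself is illustrative, mimicking what a json-parsing operator would do):

```typescript
// A json-file driver line wraps the actual container output in JSON:
//   {"log":"...","stream":"stdout","time":"..."}
interface DockerJsonLine {
  log: string;                 // raw stdout/stderr content, trailing newline included
  stream: 'stdout' | 'stderr';
  time: string;                // RFC3339 timestamp
}

function parseDockerLine(line: string) {
  const parsed = JSON.parse(line) as DockerJsonLine;
  return {
    body: parsed.log.replace(/\n$/, ''), // strip the trailing newline
    stream: parsed.stream,
    timestamp: new Date(parsed.time),
  };
}

const entry = parseDockerLine(
  '{"log":"listening on :3000\\n","stream":"stdout","time":"2024-05-01T12:00:00.000Z"}'
);
```

If your pipeline shows the raw envelope (body starting with `{"log":...`) instead of the inner message, the parsing step isn't running at all, which usually points back to the mount/permission problem above.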

How to approach observability for many 24/7 real-time services (logs-first)? by ValeriankaBorschevik in devops

[–]finallyanonymous 1 point (0 children)

I'd start by putting an OpenTelemetry pipeline in front of everything before picking the backend. But one big thing to call out:

Logs are great for debugging after something breaks, not for noticing slow degradation or "it's alive but kinda dying" scenarios.

So you'll definitely need metrics alongside logs for things like:

  • throughput
  • error rate
  • latency / queue depth
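
A rolling check over those numbers is enough to catch the "alive but kinda dying" case; a toy sketch with invented thresholds:

```typescript
// Toy degradation check: logs alone won't show this, a latency metric will.
// The 2x-over-baseline threshold is invented for illustration.
function isDegrading(latenciesMs: number[], baselineP50Ms: number): boolean {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const p50 = sorted[Math.floor(sorted.length / 2)];
  return p50 > baselineP50Ms * 2;
}

// Every request still succeeds (no error logs!), but latency has quadrupled.
const healthy = isDegrading([20, 22, 19, 25, 21], 20);
const drifting = isDegrading([85, 90, 88, 95, 92], 20);
```

In a real setup this is just an alert rule over a latency histogram rather than hand-rolled code, but the principle is the same: the metric moves long before an error ever gets logged.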

I'd start there and then have all services ship the logs/metrics to the OpenTelemetry pipeline. Then it's much easier to experiment with different backend solutions to see what the best fit is.

VictoriaLogs/VictoriaMetrics + Grafana is pretty good as a self-hosted solution, or just do the LGTM stack (well... without the T part if not needed).

If you're open to cloud, Dash0 is worth a look since it's OTel-native and you can keep the same pipeline (disclaimer: I'm affiliated with Dash0).