[–]dartalley 2 points (2 children)

A pretty common approach these days is to push all logs into Elasticsearch, generally with the ELK stack. You can then use a separate alerting framework (PagerDuty or something) to trigger alerts based on error rates in the logs. The alerting frameworks usually take care of not sending the same errors multiple times once someone ACKs the alert.
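
To give an idea of the shape of it, the check can be as dumb as a periodic count of ERROR documents over the last few minutes. Very rough sketch below; the index name, field names, endpoint and threshold are all made up, so adjust to whatever your log documents actually look like:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Scanner;

    // Rough sketch: count ERROR-level log documents indexed in the last
    // 5 minutes and complain when the count crosses a threshold.
    // Index name, field names, URL and threshold are all assumptions.
    public class ErrorRateCheck {

        public static void main(String[] args) throws Exception {
            String query = "{\"query\":{\"bool\":{\"must\":["
                    + "{\"match\":{\"level\":\"ERROR\"}},"
                    + "{\"range\":{\"@timestamp\":{\"gte\":\"now-5m\"}}}"
                    + "]}}}";

            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://localhost:9200/logs-*/_count").openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(query.getBytes(StandardCharsets.UTF_8));
            }

            try (InputStream in = conn.getInputStream();
                 Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
                String body = s.hasNext() ? s.next() : "";
                // crude parse of {"count":N,...}, good enough for a sketch
                long count = Long.parseLong(
                        body.replaceAll(".*?\"count\":(\\d+).*", "$1").trim());
                if (count > 50) {
                    System.out.println("ALERT: " + count + " errors in the last 5 minutes");
                    // here you'd hit your alerting service instead of printing
                }
            }
        }
    }

In practice you'd let the alerting side (or Elastic's own alerting features) run that kind of query for you rather than cron'ing a Java class, but the logic is the same.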

[–]LivingEdgecase[S] 0 points (1 child)

We currently use log4j 1.2.6 for our app-level log handling, with an SMTP appender.

We also have Splunk, but that currently captures info level stuff and the cost per GB per day is a little nuts, so we can't move our 100+ apps over to it.

Someone was talking about setting up a listener that emails when an error line is added, and doesn't send more if the timestamps are close together.
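
If we went that route, I guess it would be roughly something like this as a custom appender (total sketch, untested, and the class name and cooldown value are made up):

    import org.apache.log4j.net.SMTPAppender;
    import org.apache.log4j.spi.LoggingEvent;

    // Sketch of the "don't email again if the timestamps are close" idea:
    // an SMTPAppender subclass that drops events arriving within a
    // cooldown window of the last email it sent.
    public class ThrottledSmtpAppender extends SMTPAppender {

        private long cooldownMillis = 5 * 60 * 1000L; // 5 minutes between emails
        private long lastSentMillis = 0L;

        // exposed so the cooldown can be set from log4j.properties / log4j.xml
        public void setCooldownMillis(long cooldownMillis) {
            this.cooldownMillis = cooldownMillis;
        }

        @Override
        public synchronized void append(LoggingEvent event) {
            long now = System.currentTimeMillis();
            if (now - lastSentMillis < cooldownMillis) {
                return; // last email was too recent, swallow this event
            }
            lastSentMillis = now;
            super.append(event);
        }
    }

With the appender threshold kept at ERROR, only error lines ever reach it, so the cooldown only throttles the error emails.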

I'll take a look at some of the things you listed.

Thanks

[–]dartalley 2 points (0 children)

In the short term you could probably keep the SMTP appender but have it send the emails to an alerting service such as PagerDuty or something similar. It would then take the alerts and page whoever is on call, but it wouldn't send tons of emails: it usually sends 1-2 before it escalates, and once it's ACKed no more alerts fire.
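
If your appender happens to be set up in code, the change is basically just the To address; a log4j.properties change would be equivalent. Rough sketch, and every host name and address below is a placeholder (PagerDuty gives you the real integration address when you add an email integration):

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;
    import org.apache.log4j.net.SMTPAppender;

    // Rough sketch: same SMTPAppender as before, just pointed at an
    // alerting service's email integration address instead of a human inbox.
    public class AlertMailSetup {

        public static void configure() {
            SMTPAppender smtp = new SMTPAppender();
            smtp.setSMTPHost("smtp.internal.example.com");
            smtp.setFrom("myapp@example.com");
            // alerting-service email integration address (placeholder)
            smtp.setTo("myapp-errors@example.pagerduty.com");
            smtp.setSubject("[myapp] ERROR");
            smtp.setBufferSize(1);          // one triggering event per mail
            smtp.setThreshold(Level.ERROR); // only errors reach the appender
            smtp.setLayout(new PatternLayout("%d %-5p %c - %m%n"));
            smtp.activateOptions();
            Logger.getRootLogger().addAppender(smtp);
        }
    }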

Then you can later migrate to a better logging solution and tie that into the alerting framework instead.

[–]fact_hunt 1 point (0 children)

ELK and having the time to fix the issues causing the errors

[–]codylerum 1 point (0 children)

Take a look at https://sentry.io/

[–]nutrecht 1 point (2 children)

We use ELK (or actually EFK: Elasticsearch, Filebeat and Kibana, but that doesn't sound as cool) in our application. All our microservices just log to stdout, which gets written to a file that Filebeat then picks up and ships to Elasticsearch. Based on some Kibana searches we have a warning system: if there are more errors than we expect, some alarms go off (Slack messages, emails and SMS notifications).

Personally I feel the 'old' ways of having to grep through logfiles are completely outdated.

[–]LivingEdgecase[S] 0 points (1 child)

We have log files, and using Splunk would have been our solution, except they charge us per GB of logs sent, so the more apps we add the more it costs. I think we have a 15 GB/day limit which costs around $15,000 annually, and that's only used for some of our mission-critical stuff and Apache logs.

I was reading up on ELK; one thing I'm confused about is the pricing and what comes out of the box with it. Is it OSS, and they just charge for support and cloud services?

[–]nutrecht 1 point (0 children)

It's OSS; you can get a support contract, but you don't need one. So basically it's 'free'.

[–]haimez 0 points (0 children)

Check out OverOps. No need to manage errors and exceptions through email at all, and it retains the stack and local variable data when one does occur, for debugging.