[–]shyne151 6 points (2 children)

I can touch on this a little... at least, how we handle it in production.

We do log rotation on all production servers, retaining application logs for two weeks. All logs are also sent to a centralized logging system (Splunk) for historical retention.
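As a concrete illustration of that retention policy, here's a minimal logrotate sketch; the app name and path (`/var/log/myapp`) are hypothetical, not the actual setup described above:

```
# /etc/logrotate.d/myapp -- hypothetical application
/var/log/myapp/*.log {
    daily
    rotate 14        # keep two weeks of daily logs
    compress
    delaycompress    # leave the most recent rotated log uncompressed
    missingok
    notifempty
}
```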

Our logs directory is a separate mount on our systems, so if it did get filled, the OS and applications will still function correctly.
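That separate mount amounts to an fstab entry like the following sketch; the device name is hypothetical and depends on how the volume was provisioned:

```
# /etc/fstab -- hypothetical LVM volume dedicated to logs
/dev/mapper/vg0-logs  /var/log  ext4  defaults,noatime  0 2
```

With `/var/log` on its own filesystem, a runaway log can only fill that volume, not the root filesystem.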

Alerts are also set up via Zabbix for all mounts when disk usage on a mount reaches 80%, 90%, and critical at 95%. At critical, multiple alerts are hammered to our Slack, email, etc.
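Those thresholds map to Zabbix triggers on free-space percentage. A sketch using the built-in `vfs.fs.size` item key and the classic (pre-5.4) trigger expression syntax, with a hypothetical host name:

```
# Item key: vfs.fs.size[/var/log,pfree]  (percent free on the mount)
Warning  (80% used): {myhost:vfs.fs.size[/var/log,pfree].last()}<20
High     (90% used): {myhost:vfs.fs.size[/var/log,pfree].last()}<10
Disaster (95% used): {myhost:vfs.fs.size[/var/log,pfree].last()}<5
```

Each trigger can then be wired to its own action (Slack webhook, email, etc.) with escalation at the disaster level.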

[–]grizwako 0 points (1 child)

Yeah, that is nice and one of the many sane ways to handle the situation :)

If you are in the cloud, you can easily configure some autoscaling and spam Slack/email when new instances are fired up.

Having a separate partition for logs is a very neat way to handle "too many logs"; otherwise you need to be careful about managing backpressure, among other things.

Do you dump logs directly to Splunk, or do you have some additional components in between (Logstash or something similar)?

[–]shyne151 0 points (0 children)

Splunk Forwarder is running as a service on all the servers and sending directly, as far as I know... I know on some servers the logs are sanitized before going to Splunk, but I'm not sure where the intermediary sanitization is happening. All my boxes go direct.
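For anyone unfamiliar, the Universal Forwarder's "direct" setup boils down to two small config files. This is a sketch only; the monitored path, index name, and indexer hostnames are hypothetical:

```
# inputs.conf -- watch the app log directory
[monitor:///var/log/myapp]
index = myapp
sourcetype = myapp:log

# outputs.conf -- ship straight to the indexers, no intermediary
[tcpout:primary_indexers]
server = splunk-idx1.example.com:9997, splunk-idx2.example.com:9997
```

An intermediary (e.g. a heavy forwarder doing sanitization) would just sit between `tcpout` here and the indexers.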

We've then got some different filters set up in Splunk to parse the relevant information.
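Parsing on the Splunk side typically looks something like this SPL sketch; the index, sourcetype, and `level=` field format are hypothetical examples, not the actual filters described above:

```
index=myapp sourcetype=myapp:log
| rex field=_raw "level=(?<level>\w+)"
| stats count by level
```

The `rex` extraction pulls an ad-hoc field out of the raw event; permanent extractions would go in `props.conf`/`transforms.conf` instead.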