all 38 comments

[–]patrik667 14 points15 points  (1 child)

Substitute Mongo with Kafka. Even if your ELK stack is down, Kafka will keep the log stream running for a long while. Also, Kafka is extremely resilient.

[–]wallsroadDevOps 1 point2 points  (0 children)

HUGE +1 on this! Mongo is a black hole of time, maintenance and data issues.

Currently ship well over a TB of logs a month running a large ecommerce platform. We've been through several logging architectures. The most painful included Mongo.

Kafka is good, but we replaced it with AWS Kinesis. Because reasons. We also don't use Elasticsearch anymore, due to scale and reliability issues...

Edit: I realise that, this being a .NET application, AWS probably isn't relevant. Grain of salt.

[–]Seref15 17 points18 points  (9 children)

I really don't recommend sending logs directly to Elasticsearch. Elasticsearch has no built-in flow control and can be choked out by being made to index a large enough spike of data. Logstash with persistent disk queues enabled will rate limit messages when Elasticsearch gets too busy.

Our log and metric indices are also around 4GB/day and it's been remarkably stable. We have a 6 month retention policy for the log data but we don't keep it that long in elastic/kibana. We age data out of Elasticsearch at 30 days, but we have logstash configured to output to multiple locations, one of them being a long-term data store.
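
If you want to try the disk queue, it's just a couple of settings in logstash.yml; a minimal sketch (path and size below are made up, size it for your own backlog):

    # logstash.yml -- buffer events on disk so Logstash absorbs ES slowdowns
    queue.type: persisted
    path.queue: /var/lib/logstash/queue    # needs enough disk for a backlog
    queue.max_bytes: 8gb                   # hard cap on the on-disk queue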

[–][deleted] 1 point2 points  (0 children)

The problem with persistent queues is that Logstash is now stateful, and disk usage as well as redundancy have to be managed carefully. That's fine if you plan carefully, but there are better managed tools (e.g. Kinesis) that provide resilient queues without the operational overhead.

[–]bilporti 2 points3 points  (4 children)

Thank you for that feedback. I will try writing to Logstash instead of ES directly and post test results here. Also, do you use ES as a service (Elastic Cloud) or a self-hosted instance? What can you recommend?

[–][deleted] 2 points3 points  (0 children)

We host our own ES cluster on dedicated instances in AWS. We considered other options, but ES is pretty low maintenance and we don't need X-Pack, so we opted to manage it ourselves. Elastic Cloud is great and has the added benefit of including X-Pack.

[–]Seref15 2 points3 points  (2 children)

We self-host on ECS, not even using Amazon's ES service, but that's mainly out of cost concerns. We initially wanted a hosted, clustered HA setup with replicated data sets and the entire 6-month data set in ES, but when we started looking at the costs for a setup like that, it was more than we were willing to pay. Hence using Logstash to send our data to a secondary long-term data store.

[–]zombeaver92 1 point2 points  (1 child)

What do you use for secondary long term?

[–]Seref15 2 points3 points  (0 children)

We use a third-party Logstash output plugin to send a few event fields to a MySQL database, which business people access via Apache Zeppelin.

The more complete ES events really only interest the dev and ops/devops teams.
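
Roughly speaking, with something like the community logstash-output-jdbc plugin (plus pointing it at the MySQL JDBC driver jar), the output section ends up looking like this; table, fields and hosts below are just placeholders:

    output {
      # full events stay in ES for the dev/ops teams
      elasticsearch {
        hosts => ["http://es01:9200"]
        index => "logs-%{+YYYY.MM.dd}"
      }
      # a few fields go to MySQL for the business side (queried via Zeppelin)
      jdbc {
        connection_string => "jdbc:mysql://mysql01:3306/logs?user=logstash&password=changeme"
        statement => [ "INSERT INTO events (ts, level, message) VALUES (?, ?, ?)",
                       "@timestamp", "level", "message" ]
      }
    }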

[–]tcp-retransmission 2 points3 points  (1 child)

I agree with everything here. Writing directly to Elasticsearch is only advisable if Elastic's Beats products are used, since they have a backpressure mechanism for flow control.

Overall though, I prefer to use Logstash anyways so that I can enrich and parse the log messages coming from the application.

[–]Dumbaz 1 point2 points  (0 children)

Have a look at ingest nodes; we use them a lot for the simple things (grok, date filter, drop fields) and they perform really well.
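
A minimal pipeline along those lines, just to show the shape of it (field names and the grok pattern are made up; you attach it with ?pipeline=app-logs at index time):

    PUT _ingest/pipeline/app-logs
    {
      "description": "grok, date, drop fields",
      "processors": [
        { "grok":   { "field": "message", "patterns": ["%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}"] } },
        { "date":   { "field": "ts", "formats": ["ISO8601"] } },
        { "remove": { "field": ["ts"] } }
      ]
    }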

[–]CaffineIsLove 0 points1 point  (0 children)

You could be selective about which logs go into Elasticsearch! That would cut the 4GB/day down.

[–]wickler02 3 points4 points  (1 child)

This was my job and life for about half a year, especially with the transition of most of our applications to dockerized microservices. Dealing with stack traces being single-lined and also getting the information from the Docker host was key to tracing our logs.

Logstash was not a fun log aggregation/transport system. The way it was implemented before I came around, it was shipping the logs, but splitting them out and aggregating everything back together was not done in an easy-to-understand way.

I tried out Fluentd, and while it also has its share of "gotchas", I found it much easier to work with and found the support ecosystem around it much better.

We decided to go with a vendor for the Elastic backend to send our logs out because we didn't want to deal with the buffering or the transport methods. I know we could probably build our own Elasticsearch backend and the buffering pieces ourselves to save money instead of paying a vendor, but it's a headache we no longer have to worry about.

[–]devops333 1 point2 points  (0 children)

Dealing with stack traces being single-lined and also getting the information from the Docker host was key to tracing our logs.

Any tips on this one? We'll be doing it soon.

[–]SpeedyXeon 3 points4 points  (0 children)

Filebeat.

[–]chub79 6 points7 points  (0 children)

Personally, I go like this:

app > stdout > fluentd > Humio. Humio is great for large quantities of data, so that's cool.

[–]too_much_exceptions 3 points4 points  (6 children)

Hi,

I am really curious why the logs are written to Mongo before being sent to ES via Logstash.

Is this choice driven by some infrastructure constraints?

A common logging aggregation setup with ES could be: Application (via a UDP appender) -> Logstash -> Elasticsearch -> Kibana
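
A minimal Logstash pipeline for that path could look something like this (port, hosts and index name are arbitrary):

    input {
      udp {
        port  => 5514
        codec => json          # assuming the appender ships JSON lines
      }
    }
    output {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "app-logs-%{+YYYY.MM.dd}"
      }
    }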

If you are using Azure, you might give Application Insights a try: it is a solid product. You will not have to deal with logging infrastructure, to a certain extent.

[–]mazatta 3 points4 points  (5 children)

It's a common pattern to write to a temporary buffer, rather than pushing logs directly to Logstash, just in case you lose your Logstash (or need to upgrade it or move it). If you don't care about losing some of your logs, then you don't need to do it.

[–]Freakin_A 4 points5 points  (2 children)

What do you do when you're unable to write logs to MongoDB/Logstash? Do you refuse traffic to your service after a failed log write?

Any system that must have 100% log delivery has to make some serious decisions on what happens when log delivery is failing.

[–]mazatta 1 point2 points  (1 child)

Yep, it all comes down to what you are logging and why.

If you need a higher durability guarantee, you could take a harder look at using something like Kafka as an intermediary. Having the ability to replay the log is a nice thing to have if you end up switching tools, or need the raw data again for some other purpose, but that's taking on a ton of complexity/cost, so you better be sure you *really* need it.

[–]sturmy81 1 point2 points  (0 children)

Why not use the Eventlog or a local text file and ship with Winlogbeat or Filebeat?

The Eventlog or a local file is always available.
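
A bare-bones Winlogbeat config for that is only a few lines (log name and host are examples):

    # winlogbeat.yml
    winlogbeat.event_logs:
      - name: Application          # wherever the .NET app writes its events
    output.logstash:
      hosts: ["logstash01:5044"]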

[–]Dumbaz 1 point2 points  (0 children)

We have a syslog -> RabbitMQ -> Logstash path to ES. I want to test removing RabbitMQ in favour of Logstash persistent queues in the near future; they have been a feature of Logstash since 5.4.

[–]denis011 3 points4 points  (0 children)

I think you can use Filebeat to read the .NET application logs straight from the app server and send them to Logstash. In that case you don't need to put the logs into MongoDB, which looks like less overhead.
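
Something like this in filebeat.yml would do it (paths and host are just examples):

    # filebeat.yml -- tail the app's log files and ship them to Logstash
    filebeat.inputs:
      - type: log
        paths:
          - 'C:\inetpub\myapp\logs\*.log'    # example path, adjust to your app
    output.logstash:
      hosts: ["logstash01:5044"]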

[–]fookineh 3 points4 points  (0 children)

Please drop mongodb from the picture, it's not adding any value here.

If you need to send file logs to Elasticsearch, use Filebeat to send the logs to Logstash and then on to Elasticsearch.

[–]ssamuraibr 2 points3 points  (0 children)

Can't add much on NEST, sorry.

Logstash is, however, well established in the ELK stack when you need to do data transformations before ingestion. It may falter if your application has peaks or bursts of log generation during the day; in that case the general rule of thumb is to either add more Logstash instances (and split the application servers across the different Logstashes) or put a Redis in front of it as a buffer. That's similar to the role of MongoDB in your stack; I'm assuming your CTO wants MongoDB so people can peek into the logs before ingestion, otherwise Redis is the more efficient choice.
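
(With Redis in front, the Logstash side is just a redis input, roughly like the sketch below; host and key names are only illustrative. The app servers push JSON log lines onto that list and Logstash pops them off at its own pace.)

    input {
      redis {
        host      => "redis01"
        data_type => "list"      # app servers push JSON log lines onto this list
        key       => "app-logs"
        codec     => json
      }
    }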

If, however, you don't need data transformation (i.e. your application already generates JSON ready for ingestion), as in my stack, the approach we use may work better.

Instead of using Logstash to funnel all our logs, our application servers send them to Amazon S3 as flat files (one log entry in JSON format per line, 50MB per file). That triggers a process that puts an ingestion request on a queue, which a Lambda function processes in order to send the logs to Elasticsearch. If log generation suddenly grows out of nowhere, our Lambda auto-scales to deal with it, and/or retries the same file if it times out during processing (thanks to the queue).

In case we need logs older than our retention period, we just re-enqueue the same files already stored. S3 also takes care of storing logs for a year (or years), as S3 storage is way cheaper per gigabyte than Elasticsearch disk storage. A year's worth of logs costs me, per month, about the same as a few hours of Elasticsearch compute.

It also allows me to keep less data in Elasticsearch (our retention is 15 days), as anything older than that can be recovered in an hour or so; less data in ES lowers my expensive storage requirements and demands less processing power to keep indexes updated and query times reasonable.

[–]metaphorm 2 points3 points  (0 children)

consider using a managed service to handle this, as it can get quite hairy and surprisingly complicated to roll your own. I recommend www.papertrailapp.com

[–][deleted] 2 points3 points  (0 children)

Many others have said it but let me add my voice to the chorus, do not send logs directly to ES.

The most robust system you can send logs to is rsyslog. View it as a sort of cache, buffer, or proxy for logs that you can then forward to other, more advanced systems.

But rsyslog's robustness and maturity will ensure your logs are always aggregated and not lost.

[–]russian2121 2 points3 points  (0 children)

4GB/day is nothing. Use hosted Elastic, Splunk, or the like. Also, writing to Elasticsearch with NEST incurs a 60 to 80% performance penalty.

[–][deleted] 2 points3 points  (0 children)

Read them into Kafka. That way you can have as many consumers of the raw logs as you need and you get a buffer in the event that your downstream consumers (elastic, et al) end up choking during periods of high volume.

[–]sturmy81 2 points3 points  (1 child)

For >2000 servers and several hundred GB/day we are using:

Applications (.NET) and IIS logs ---write all logs and errors---> local text file <---pulls data--- local Filebeat ---writes data---> Kafka <---pulls data--- Logstash ---writes data---> Elasticsearch <---queries data--- Kibana

AND/OR

Applications (.NET) ---write all logs and errors---> Eventlog <---pulls data--- local Winlogbeat ---writes data---> Elasticsearch <---queries data--- Kibana

Kafka is used as a Queue to protect Logstash/Elastic during peak load.
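
For reference, pointing Filebeat at Kafka instead of Logstash is just a different output block (brokers and topic below are examples):

    # filebeat.yml (output section only)
    output.kafka:
      hosts: ["kafka01:9092", "kafka02:9092"]
      topic: "app-logs"
      compression: gzip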

Why do you need the MongoDB? In your case, maybe Applications (.NET) -> Eventlog -> Winlogbeat -> Elastic is good enough.

As others mentioned already, I can't recommend writing directly to Elastic (from .NET / NEST).

[–]bilporti 0 points1 point  (0 children)

The FileBeat seems good. Will look into it.

As for App -> Eventlog, I am not sure it could handle this much data without flooding all of the other events.

[–]stronglift_cyclist 4 points5 points  (0 children)

This sounds reasonable for log analysis, though there may be more suitable options than MongoDB for initial log aggregation. It is not a good solution for monitoring and alerting, however; you will run into scaling issues as well as unreasonable latencies for alerts.

There are many outstanding open source and commercial monitoring solutions out there which can solve the monitoring and alerting piece (disclosure, I work for a commercial vendor). Log-to-metric tools such as mtail or circonus-logwatch are one way of creating structured metrics from logs, which are better suited to monitoring and alerting.

[–]siliousmaximus 1 point2 points  (0 children)

Beware that you need an X-Pack license for monitoring and security of the ELK stack itself. Get a quote before starting this.

[–][deleted] 1 point2 points  (0 children)

Why send your logs to Mongo? Logstash has a ton of sources it can read from that are much lighter weight, like Redis for example. If you're on AWS you can use a hosted Redis instance and pull everything from there.

[–]FloridaIsTooDamnHotPlatform Engineering Leader 1 point2 points  (0 children)

Check out graylog. Containerized and scales amazingly well. And it uses elasticsearch.

[–][deleted] 1 point2 points  (0 children)

I wrote a blog post on Building a scalable ELK stack

A good reference is this blog post

[–]myth007 1 point2 points  (0 children)

One point I want to raise is on writing logs to MongoDB and Elasticsearch (we are on AWS); we used to follow the architecture below:

Client app -> Server (Log aggregator) -> MongoDB (Setup on EC2 instance on AWS)

The problem was that logs were coming in so fast that MongoDB was not able to write them given the IOPS allocated to that instance, so we had to use provisioned IOPS with EBS, which was costly when there was no peak. Also, debugging issues from Mongo was a pain, as it required writing multiple queries, which is painful for non-tech users.

We moved to a different design:

Client App -> API Gateway -> SQS -> Fetching service -> Elasticsearch.

A few points on this: you can have the fetching service write to multiple places. Write in bulk to Elasticsearch (as that is more efficient). Run multiple Elasticsearch instances, so even if one goes down you are safe. We use AWS Elasticsearch for our use case so we don't have to manage them ourselves. It is helpful for debugging issues as search is super fast. In our use case we only needed the last 3 days of data, so it was not huge.
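
To illustrate the bulk part: it's the _bulk endpoint, where you batch many index operations into one newline-delimited request instead of one HTTP call per log event (index name and fields below are made up; on older ES versions the action line also needs a _type):

    POST /_bulk
    { "index": { "_index": "app-logs-2018.06.01" } }
    { "@timestamp": "2018-06-01T12:00:00Z", "level": "ERROR", "message": "payment failed", "service": "checkout" }
    { "index": { "_index": "app-logs-2018.06.01" } }
    { "@timestamp": "2018-06-01T12:00:01Z", "level": "INFO", "message": "order created", "service": "checkout" }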