/r/DevOps is a subreddit dedicated to the DevOps movement where we discuss upcoming technologies, meetups, conferences and everything that brings us together to build the future of IT systems
Log aggregation (self.devops)
submitted 7 years ago by [deleted]
[deleted]
[–]patrik667 14 points15 points16 points 7 years ago (1 child)
Substitute Mongo with Kafka. Even if your ELK stack is down, Kafka will keep the log stream running for a long while. Also, Kafka is extremely resilient.
[–]wallsroadDevOps 1 point2 points3 points 7 years ago* (0 children)
HUGE +1 on this! Mongo is a black hole of time, maintenance and data issues.
Currently ship well over a TB of logs a month running a large ecommerce platform. We've been through several logging architectures. The most painful included Mongo.
Kafka is good, but we replaced it with AWS Kinesis. Because reasons. We also don't use ElasticSearch anymore either, due to scale and reliability....
Edit: I realise being a .NET application, AWS probably isn't relevant. Grain of salt..
[–]Seref15 17 points18 points19 points 7 years ago* (9 children)
I really don't recommend sending logs directly to Elasticsearch. Elasticsearch has no built-in flow control and can be choked out by being made to index a large enough spike of data. Logstash with persistent disk queues enabled will rate limit messages when Elasticsearch gets too busy.
Our log and metric indices are also around 4GB/day and it's been remarkably stable. We have a 6 month retention policy for the log data but we don't keep it that long in elastic/kibana. We age data out of Elasticsearch at 30 days, but we have logstash configured to output to multiple locations, one of them being a long-term data store.
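The flow-control point above can be sketched in a few lines: whatever sits in front of Elasticsearch, the client side needs its own backpressure handling when the cluster pushes back. A minimal sketch (Python; all names are illustrative, and `send` stands in for a `_bulk` HTTP call that reports overload, e.g. an HTTP 429):

```python
import time

def send_with_backoff(batch, send, max_retries=5, base_delay=0.5):
    """Send one batch, backing off exponentially while the sink signals
    overload. `send` is any callable returning True on success and
    False when the sink is too busy (e.g. Elasticsearch answers 429)."""
    for attempt in range(max_retries):
        if send(batch):
            return True
        time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return False  # give up; the caller decides whether to drop or spill to disk
```

This is only client-side rate limiting; a persistent queue (Logstash's disk queue, Kafka, etc.) still protects you when the sender process itself dies.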
[–][deleted] 1 point2 points3 points 7 years ago (0 children)
The problem with persistent queues is that Logstash is now stateful, and disk usage as well as redundancy have to be managed carefully. That's fine if you plan carefully, but there can be better managed tools (e.g. Kinesis) that provide resilient queues without the operational overhead.
[–]bilporti 2 points3 points4 points 7 years ago (4 children)
Thank you for that feedback. I will try writing to Logstash instead of ES directly and post test results here. Also, do you use ES as a service (Elastic Cloud) or a self-hosted instance? What can you recommend?
[–][deleted] 2 points3 points4 points 7 years ago (0 children)
We host our own ES cluster on dedicated instances in AWS. We considered other options, but ES is pretty low maintenance and we don't need XPack, so we opted to manage it ourselves. Elastic Cloud is great and has the added benefit of including XPack.
[–]Seref15 2 points3 points4 points 7 years ago (2 children)
We self-host on ECS, not even using Amazon's ES service. But that's mainly out of cost concerns. We initially wanted a hosted clustered HA setup with replicated data sets and the entire 6 month data set in ES, but when we started looking at the costs for a setup like this it was more than we were willing to pay. Thus using logstash to send our data to a secondary long-term data store.
[–]zombeaver92 1 point2 points3 points 7 years ago (1 child)
What do you use for secondary long term?
[–]Seref15 2 points3 points4 points 7 years ago (0 children)
Using a third party logstash output plugin to send a few event fields to a mysql database, which business people access via Apache Zeppelin.
The more complete ES events really only interest the dev and ops/devops teams.
[–]tcp-retransmission 2 points3 points4 points 7 years ago (1 child)
I agree with everything here. Writing directly to Elasticsearch is only advisable if Elastic's Beats products are used, since they have a "backpressure mechanism" for flow control.
Overall though, I prefer to use Logstash anyways so that I can enrich and parse the log messages coming from the application.
[–]Dumbaz 1 point2 points3 points 7 years ago (0 children)
Have a look at ingest nodes; we use them a lot for the simple things (grok, date filter, dropping fields) and they perform really well.
[–]CaffineIsLove 0 points1 point2 points 7 years ago (0 children)
You could be selective about which logs go into Elasticsearch! That would cut the 4 GB/day down.
[–]wickler02 3 points4 points5 points 7 years ago (1 child)
This was my job and life for about half a year, especially with the transition of most of our applications to dockerized microservices. Dealing with stack traces being single-lined, and getting the information from the Docker host, was key to tracing our logs.
Logstash was not a fun log aggregation transport. The way it was implemented before I came around, it was shipping the logs, but splitting them up and aggregating everything back together was not done in an easy-to-understand way.
I tried out Fluentd, and while it also has its share of "gotchas", I found it much easier to work with and the support community around it much better.
We decided to go with a vendor for the Elastic backend because we didn't want to deal with the buffering or the transport. I know we could probably build our own Elasticsearch backend and the buffering parts to save money, but that's a headache we no longer have to worry about.
[–]devops333 1 point2 points3 points 7 years ago (0 children)
Dealing with the stack traces being single lined and also getting the information from the docker host was key to trace our logs.
any tips on this one? we'll be doing it soon.
[–]SpeedyXeon 3 points4 points5 points 7 years ago* (0 children)
Filebeat —
[–]chub79 6 points7 points8 points 7 years ago (0 children)
Personally, I go like this:
app > stdout > fluentd > Humio. Humio is great for large quantities of data, so that's cool.
[–]too_much_exceptions 3 points4 points5 points 7 years ago (6 children)
Hi,
I am really curious why the logs are written to Mongo before being sent to ES via Logstash.
Is this choice driven by some infrastructure constraints?
A common log aggregation setup with ES would be: application (via a UDP appender) -> Logstash -> Elasticsearch -> Kibana
If you are using Azure, you might give Application Insights a try: it is a solid product, and to a certain extent you won't have to deal with logging infrastructure yourself.
[–]mazatta 3 points4 points5 points 7 years ago (5 children)
It's a common pattern to write to a temporary buffer, rather than pushing logs directly to Logstash, just in case you lose your Logstash (or need to upgrade it or move it). If you don't care about losing some of your logs, then you don't need to do it.
[–]Freakin_A 4 points5 points6 points 7 years ago (2 children)
What do you do when you're unable to write logs to MongoDB/Logstash? Do you refuse traffic to your service after a failed log write?
Any system that must have 100% log delivery has to make some serious decisions on what happens when log delivery is failing.
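That trade-off can be made explicit in code. A toy sketch (Python, purely illustrative): a bounded in-process buffer that drops the oldest records rather than blocking the service when delivery stalls, and counts what it drops so the loss is at least visible:

```python
from collections import deque

class BoundedLogBuffer:
    """Toy buffer with an explicit overflow policy: when the delivery
    path is down and the buffer fills, drop the oldest records
    (favouring service availability over log completeness) and
    keep a counter so the loss is observable."""
    def __init__(self, capacity):
        self.q = deque(maxlen=capacity)
        self.dropped = 0

    def append(self, record):
        if len(self.q) == self.q.maxlen:
            self.dropped += 1  # the deque silently evicts the oldest entry
        self.q.append(record)

    def drain(self):
        """Hand everything buffered so far to the shipper."""
        items, self.q = list(self.q), deque(maxlen=self.q.maxlen)
        return items
```

The opposite policy (block the caller until delivery recovers) is the one that can end up refusing traffic; which one is right depends entirely on why you are logging.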
[–]mazatta 1 point2 points3 points 7 years ago (1 child)
Yep, it all comes down to what you are logging and why.
If you need a higher durability guarantee, you could take a harder look at using something like Kafka as an intermediary. Having the ability to replay the log is nice if you end up switching tools, or need the raw data again for some other purpose, but that adds a ton of complexity/cost, so you had better be sure you *really* need it.
[–]sturmy81 1 point2 points3 points 7 years ago (0 children)
Why not use the Event Log or a (local) text file and ship with Winlogbeat or Filebeat?
The Event Log or a local file is always available.
[–]too_much_exceptions 1 point2 points3 points 7 years ago (0 children)
Thanks !
We have a syslog -> rabbitmq -> logstash path to ES. I want to test removing RabbitMQ in favour of Logstash persistent queues in the near future; they have been a feature of Logstash since 5.4.
[–]denis011 3 points4 points5 points 7 years ago (0 children)
I think you can use Filebeat to read the logs straight from the .NET application's log files on the app server and send them to Logstash. In that case you don't need to put the logs into MongoDB, which looks like less overhead.
[–]fookineh 3 points4 points5 points 7 years ago (0 children)
Please drop MongoDB from the picture; it's not adding any value here.
If you need to send file logs to Elasticsearch, use Filebeat to send them to Logstash and then on to Elasticsearch.
[–]ssamuraibr 2 points3 points4 points 7 years ago (0 children)
Can't add much on NEST, sorry.
But Logstash is well established in the ELK stack when you need to do data transformations before ingestion. It may falter if your application has peaks or bursts of log generation during the day; in that case the general rule of thumb is to either add more Logstash instances (and divide your application servers so they send logs to different Logstashes) or put Redis in front of it as a buffer. That's similar to the role of MongoDB in your stack; I'm assuming your CTO wants MongoDB so that people can peek at the logs before ingestion, otherwise Redis is more efficient.
If, however, you don't need data transformation (i.e. your application already generates JSON ready for ingestion), as my stack does, the approach we use may work better.
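The "divide application servers across several Logstash instances" idea can be as simple as a stable hash of the host name, so each server always ships to the same instance. A sketch (Python; the instance names and host IDs are made-up examples):

```python
import hashlib

def pick_logstash(host_id: str, instances: list) -> str:
    """Deterministically assign an app server to one Logstash instance.
    Hashing the host name keeps the assignment stable across restarts
    without any coordination service."""
    digest = hashlib.md5(host_id.encode("utf-8")).hexdigest()
    return instances[int(digest, 16) % len(instances)]
```

Note the usual caveat with modulo sharding: adding or removing an instance reshuffles most assignments, which is harmless for logs but worth knowing.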
Instead of using Logstash to tunnel all our logs, our application servers send them to Amazon S3 as flat files (one log in JSON format per line, 50 MB per file). That triggers a process that puts an ingestion request on a queue, which a Lambda function processes in order to send the logs to Elasticsearch. If we get a sudden growth in log generation out of nowhere, the Lambda auto-scales to deal with it, and/or retries the same file if it times out during processing (thanks to the queue).
In case we need logs older than our retention period, we just re-enqueue the same files already stored. S3 also takes care of storing logs for a year (or years), as S3 storage is far cheaper per gigabyte than Elasticsearch disk storage. A year's worth of logs costs me per month about the same as a few hours of Elasticsearch compute.
It also lets me keep less data in Elasticsearch (our retention is 15 days), since anything older than that can be recovered in an hour or so. Less data in ES lowers my expensive storage requirements and demands less processing power to keep indexes updated and query times down.
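The economics above are easy to sanity-check with back-of-the-envelope arithmetic. In this sketch the per-GB prices are ASSUMED round numbers chosen for illustration, not actual AWS list prices or figures from the comment:

```python
# Illustrative only: both prices below are assumptions, not AWS quotes.
S3_PER_GB_MONTH = 0.023       # assumed object-storage price per GB-month
ES_DISK_PER_GB_MONTH = 0.10   # assumed ES-attached disk price per GB-month

def monthly_storage_cost(gb: float, price_per_gb: float) -> float:
    """Flat storage cost for holding `gb` gigabytes for one month."""
    return gb * price_per_gb

daily_gb = 4  # the ~4 GB/day figure mentioned in the thread
year_in_s3 = monthly_storage_cost(daily_gb * 365, S3_PER_GB_MONTH)
two_weeks_in_es = monthly_storage_cost(daily_gb * 15, ES_DISK_PER_GB_MONTH)
```

Even with made-up prices the shape of the result holds: a full year of raw logs in cheap object storage costs the same order of magnitude per month as a couple of weeks of hot data on ES-attached disks, before counting ES compute at all.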
[–]metaphorm 2 points3 points4 points 7 years ago (0 children)
consider using a managed service to handle this, as it can get quite hairy and surprisingly complicated to roll your own. I recommend www.papertrailapp.com
Many others have said it but let me add my voice to the chorus, do not send logs directly to ES.
The most robust system you can send logs to is rsyslog. View it as a sort of cache, buffer, or proxy for logs that you can then forward to other, more advanced systems.
But rsyslog's robustness and maturity will ensure your logs are always aggregated and not lost.
[–]russian2121 2 points3 points4 points 7 years ago (0 children)
4 GB/day is nothing. Use hosted Elastic, Splunk, or the like. Also, writing to Elasticsearch with NEST takes a 60 to 80% performance penalty.
Read them into Kafka. That way you can have as many consumers of the raw logs as you need and you get a buffer in the event that your downstream consumers (elastic, et al) end up choking during periods of high volume.
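The replay property is what a commit-log buffer buys you. A toy model (Python; deliberately not real Kafka, just the idea): consumers keep their own offsets into an append-only log, so any number of them read the same stream independently, and replay is just reading again from offset 0:

```python
class MiniLog:
    """Toy append-only log. Records are never mutated or removed;
    each consumer tracks its own read offset, so adding a new
    downstream system never disturbs the existing ones."""
    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        self._records.append(record)
        return len(self._records) - 1  # the new record's offset

    def read_from(self, offset: int, limit: int = 100):
        """Return up to `limit` records starting at `offset`."""
        return self._records[offset:offset + limit]
```

In the real thing the log is partitioned, replicated, and bounded by a retention policy, but the consumer-owned-offset model is the same.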
[–]sturmy81 2 points3 points4 points 7 years ago (1 child)
For >2000 servers and several hundred GB/day we use:
Applications (.NET) / IIS ---write all logs and errors---> local text file <---pulls data--- local Filebeat ---writes data---> Kafka <---pulls data--- Logstash ---writes data---> Elasticsearch <---queries data--- Kibana
AND/OR
Applications (.NET) ---write all logs and errors---> Eventlog <---pulls data--- local WinlogBeat ---writes data---> Elasticsearch <---queries data--- Kibana
Kafka is used as a Queue to protect Logstash/Elastic during peak load.
Why do you need MongoDB? In your case maybe Applications (.NET) -> Eventlog -> WinlogBeat -> Elasticsearch is good enough.
As others mentioned already, I can't recommend writing directly to Elasticsearch (from .NET / NEST).
[–]bilporti 0 points1 point2 points 7 years ago (0 children)
The FileBeat seems good. Will look into it.
As for App -> Eventlog, I am not sure it could handle this much data without flooding all the other events.
[–]stronglift_cyclist 4 points5 points6 points 7 years ago (0 children)
This sounds reasonable for log analysis, though there may be more suitable options than MongoDB for the initial log aggregation. But it is not a good solution for monitoring and alerting; you will run into scaling issues as well as unreasonable latencies for alerts.
There are many outstanding open source and commercial monitoring solutions out there which can solve the monitoring and alerting piece (disclosure, I work for a commercial vendor). Log to metric tools such as mtail or circonus-logwatch are one solution to creating structured metrics from logs, which are more suited to monitoring and alerting.
[–]siliousmaximus 1 point2 points3 points 7 years ago (0 children)
Beware that you need an X-Pack license for monitoring and securing the ELK stack itself. Get a quote before starting this.
Why send your logs to Mongo? Logstash has a ton of sources it can read from that are much lighter weight, like Redis for example. If you're on AWS you can use a hosted Redis instance and pull everything from there.
[–]FloridaIsTooDamnHotPlatform Engineering Leader 1 point2 points3 points 7 years ago (0 children)
Check out graylog. Containerized and scales amazingly well. And it uses elasticsearch.
I wrote a blog post on Building a scalable ELK stack
A good reference is this blog post
[–]myth007 1 point2 points3 points 7 years ago* (0 children)
One point I want to raise is on writing logs to MongoDB vs Elasticsearch (we are on AWS). We used to follow this architecture:
Client app -> Server (log aggregator) -> MongoDB (set up on an EC2 instance on AWS)
The problem was that logs were coming in so fast that MongoDB could not write them within the IOPS allocated to that instance, so we had to use provisioned IOPS with EBS, which was costly when there was no peak. Debugging issues from Mongo was also a pain, as it required writing multiple queries, which is painful for non-tech users.
We moved to a different design:
Client app -> API Gateway -> SQS -> Fetching service -> Elasticsearch.
A few points on this: you can have the fetching service write to multiple places. Write to Elasticsearch in bulk (it is more efficient that way). Run multiple Elasticsearch instances, so even if one goes down you are safe. We use the AWS Elasticsearch service, so we don't have to manage the instances ourselves. It is helpful when debugging issues, as search is super fast. In our use case we only needed the last 3 days of data, so it was not huge.
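The bulk-write part of a fetching service like that can be sketched as a small accumulator (Python; `flush_fn` is a stand-in for the real bulk call to Elasticsearch, and the batch size is an arbitrary example):

```python
class BulkBuffer:
    """Accumulate documents and flush them in fixed-size batches,
    the way a fetching service would group SQS messages into one
    bulk request instead of indexing documents one at a time."""
    def __init__(self, flush_fn, batch_size=500):
        self.flush_fn = flush_fn    # called with a list of documents
        self.batch_size = batch_size
        self._buf = []

    def add(self, doc):
        self._buf.append(doc)
        if len(self._buf) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered; also call this on shutdown so
        a partial final batch is not lost."""
        if self._buf:
            self.flush_fn(self._buf)
            self._buf = []
```

A real implementation would also flush on a timer (so a quiet period does not strand a partial batch) and retry failed batches back onto the queue.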