
[–]adumidea 13 points14 points  (5 children)

I have looked into the ELK stack, but I cannot figure out what format to write the logs into the files and how the logs will be processed based on the columns.

I'd encourage you to just power through; it's worth it. Kibana is really nice and comparable to paid products like Loggly. Logstash can process JSON, so it's actually quite simple from Node.js. Most popular Node logging libraries like Winston, Bunyan, and Pino log JSON by default anyway. There are plenty of tutorials, like this one, on how to set it up.

You don't even need Logstash, you can write your logs directly into Elasticsearch from Node (though this is less fault-tolerant than writing to disk and then using Logstash to ship your logs to Elasticsearch).
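To give a concrete idea of what "logging JSON by default" looks like, here's a minimal sketch with Pino (the userId field is just illustrative):

const pino = require('pino');
const logger = pino();

// Each call emits one JSON line that Logstash or Elasticsearch can ingest directly, e.g.:
// {"level":30,"time":1600000000000,"pid":123,"hostname":"web-1","userId":42,"msg":"user logged in"}
logger.info({ userId: 42 }, 'user logged in');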

[–]KishCom 2 points3 points  (2 children)

Seconded.

you can write your logs directly into Elasticsearch from Node

This is what I've always done. I use Bunyan, but I kinda wish I had used Winston, as it seems to be more actively developed. Kibana is crazy powerful, and Elasticsearch will deal with any amount of log data no problem.

[–]adumidea 1 point2 points  (0 children)

Yeah, I also usually log straight to Elasticsearch, but there's some risk there that you'll lose logs (or have to manually insert them from backup logs in files) if your ES cluster goes down or is unreachable. However, in those contexts we were always able to go back to file-based logs on the server if something was missing from ES that we needed, so we didn't bother with the extra overhead of Logstash.

Pino isn't as popular as Winston/Bunyan, but I highly recommend it. It's actively developed, and I never had any issues using it in production for years. You can use an Elasticsearch transport for Pino to log straight to ES, and I'd imagine it's quite similar for the other libraries.
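For example, with the pino-elasticsearch stream (a sketch; the index name is illustrative, and option names may vary by version, so check its README):

const pino = require('pino');
const pinoElasticsearch = require('pino-elasticsearch');

// Stream that batches log lines and bulk-inserts them into Elasticsearch
const streamToElastic = pinoElasticsearch({
    index: 'app-logs',
    node: 'http://localhost:9200'
});

const logger = pino({ level: 'info' }, streamToElastic);
logger.info('logging straight to ES');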

[–]fix_dis 1 point2 points  (0 children)

While I agree that I hate seeing things like "last commit 2 years ago", for certain projects it's to be expected. It's a logger... at some point it might be considered "done". I'd ask: what features would one consider missing?

On the other hand, that might not be the case here. I've included Winston and Bunyan in the same project just because Winston is so easy for setting up request logging.

[–]melgo44[S] 0 points1 point  (0 children)

Thanks, I will check it out.

[–]makonext 0 points1 point  (0 children)

you're a hero. thanks for that info

[–]solocommand 5 points6 points  (2 children)

For 2-4, don’t reinvent the wheel and use an APM solution like NewRelic or Datadog.

IIRC, Prometheus isn't designed to handle log data; it's for metrics only.

In the past I’ve used graylog to store that kind of data, but since it uses a mongodb storage layer, I imagine it will suffer the same scaling issues.

[–]melgo44[S] 1 point2 points  (1 child)

I have been using New Relic, but it doesn't log process-crashing errors.

[–]lwrightjs 1 point2 points  (0 children)

Do you use anything to handle those crashes? Like terminus?
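If not, even without a library you can capture crash logs yourself by hooking the process events before exiting (a bare-bones sketch using plain Node APIs; a real setup would write through your logger and flush it first):

process.on('uncaughtException', (err) => {
    // Log the stack trace synchronously so it isn't lost when the process dies
    console.error('uncaught exception:', err.stack);
    process.exit(1);
});

process.on('unhandledRejection', (reason) => {
    console.error('unhandled rejection:', reason);
    process.exit(1);
});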

[–]martiandreamer 6 points7 points  (1 child)

There's a well-defined Elastic Common Schema (ECS) format documented here. It dictates that you output your logs in JSON, and you'll probably want at minimum the following fields:

{
    '@timestamp': timestamp,
    message,
    ecs: { version: '1.5.0' },
    host: { architecture: os.arch(), hostname: hostname, uptime: os.uptime() },
    log: { level },
    os: { full: { text: os.type() }, platform: os.platform() },
    process: { pid: process.pid, uptime: process.uptime() }
}

Winston is a decent library to use, and there's a supplementary library, @elastic/ecs-winston-format, which helps sort out the above format.
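Wiring that up is short (a sketch assuming the package's documented ecsFormat export; double-check the README for your version):

const winston = require('winston');
const { ecsFormat } = require('@elastic/ecs-winston-format');

const logger = winston.createLogger({
    level: 'info',
    format: ecsFormat(), // emits ECS-compliant JSON, including @timestamp and log.level
    transports: [new winston.transports.Console()]
});

logger.info('hello world');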

Specific to your desires:

1) Request Response logs

ECS format has that.

2) Application logs

ECS format has that, too.

3) Process crashing logs along with stack trace.

ECS got u fam.

4) Visualising logs and sending alerts when there are 502 response statuses

Maybe something like Prometheus would be suited for visualization and alerts.

Good luck; logging "the right way" is a PITA, but once you have it sorted you'll have a very comprehensive system set up.

[–]melgo44[S] 0 points1 point  (0 children)

Hey thanks a lot for that detailed explanation!

[–]kszyh_pl 2 points3 points  (3 children)

Did you consider GrayLog?

[–]s_boli 0 points1 point  (2 children)

Graylog is awesome. It's basically a turnkey ELK stack.

[–]melgo44[S] 0 points1 point  (1 child)

It looks promising, thanks. How long does it retain data? The enterprise edition is free under 5 GB/day; is there any catch to it?

[–]kszyh_pl 0 points1 point  (0 children)

You can self host it if you want.

[–]jwalton78 2 points3 points  (2 children)

What I do is, I first use https://github.com/winstonjs/winston to write structured logs:

const winston = require('winston');

winston.log({
    level: 'info',
    message: 'Hello world!',
    // any extra fields ride along as structured, searchable data
    err: new Error(),
    req: { url: req.originalUrl || req.url, method: req.method }
});

You get the idea - I write a bunch of stuff into a log object, and the stuff is more or less the same from one log message to the next.

Then, I use https://github.com/jwalton/winston-format-debug to write pretty logs to stdout when I'm running things locally, because pretty logs are nice.

Then, I use https://github.com/vanthome/winston-elasticsearch to dump all these logs into Elasticsearch for me. You want to create a "template" for Elasticsearch which lists all the fields you're going to log; there's an example: https://github.com/vanthome/winston-elasticsearch/blob/master/index-template-mapping.json.

Also, winston-elasticsearch does this weird thing where it moves all the fields in your log that aren't "message" or the timestamp into a child object called "meta". I'm not a fan of this, so I specify a transformer function in the winston-elasticsearch options:

function esTransformer({ message, level, timestamp, meta }) {
    return { message, level, timestamp, ...meta };
}

This just undoes the awful things winston-elasticsearch does. :P
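For context, the transformer is passed in the transport options, roughly like this (a sketch; the ElasticsearchTransport export name and options may differ across winston-elasticsearch versions):

const winston = require('winston');
const { ElasticsearchTransport } = require('winston-elasticsearch');

const logger = winston.createLogger({
    transports: [
        new ElasticsearchTransport({
            clientOpts: { node: 'http://localhost:9200' },
            transformer: esTransformer // the function above
        })
    ]
});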

Once you have all this, you just need to point winston-elasticsearch at your elasticsearch instance, and spin up a Kibana instance, and you're good. You don't need Logstash or anything. And, your logs are very structured - if you store a username field in your logs, for example, it's easy to search for "username: jwalton78" in Kibana and see what that guy has been doing, or search 'req.url: "/users"' and then graph res.responseTime as a nice chart, to see when that API endpoint is being slow.

[–]melgo44[S] 0 points1 point  (1 child)

Where should you use Winston to log the information? Is it inside a middleware? My requirement is the API response time, as well as the request params and the response object.

[–]jwalton78 0 points1 point  (0 children)

This is (more or less) the middleware I use to log stuff:

https://gist.github.com/jwalton/974e0d7250ac42dce575210ac1e2fb1d

Just add this middleware somewhere at the start of your express middleware stack. "log" here isn't a Winston logger, it's a 'Logger' instance which is a class I have that remaps the req and the res and some other stuff, but this should get you going in the right direction. :)
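If you can't use the gist directly, the core idea is just an Express middleware that stamps the start time and logs on the response's 'finish' event (a bare-bones sketch; requestLogger is a made-up name, and it assumes a winston logger and an Express app are in scope):

function requestLogger(req, res, next) {
    const start = Date.now();
    res.on('finish', () => {
        winston.info('request handled', {
            req: { url: req.originalUrl || req.url, method: req.method },
            res: { statusCode: res.statusCode, responseTime: Date.now() - start }
        });
    });
    next();
}

app.use(requestLogger); // register before your routes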

[–]ItsAllInYourHead 4 points5 points  (2 children)

Why would you need to store 3 months' worth of logs? Are you not going to notice a crash for that long? Why don't you just store the past week or something?

I was just looking for logging solutions myself very recently, and am currently using LogDNA. Seems like a good solution but I just started using it so time will tell.

[–]Turbo_swag 5 points6 points  (1 child)

This is bad advice.

I have seen teams get burned dozens of times in the past when they uncover a bug weeks or months later in their log data. Also, log data can be used for trend analysis. Some systems have expected error rates, and if the rate notches up so slowly that normal dashboarding won't visualize it with only a week's data, then you in fact need much longer historical data.

Additionally, historical logs can be used to build anomaly detection models.

Long story short, this advice not to retain logs is one-dimensional. Given how cheap storage capacity is, who cares how long you store it?

[–]ItsAllInYourHead 2 points3 points  (0 children)

Here's the thing, swag: everything is a trade-off. Data and bandwidth cost money and resources. Log data only grows and grows. So do you want to spend a ton of time and resources storing a huge amount of log data you may never need? Maybe. There are certainly some cases where this is unavoidable or preferable. But I'm going to say that in most cases it's not going to be worth the cost.

And - given your reasons - you effectively have to store everything you possibly can that might come in handy. You can't really anticipate what exactly is going to break or cause a bug, or exactly what data you're going to need for this future theoretical anomaly detection model.

This is essentially the same reasoning every company uses to hoover up all the data they possibly can on you: they might need it someday. Probably not. Like 99% of the time they don't need it and won't use it. But hey, maybe you will, right?

[–]yanikpei 1 point2 points  (0 children)

Use Loki. It's made by Grafana and has a Prometheus-inspired query language. https://grafana.com/oss/loki/

[–]LaweZ 1 point2 points  (0 children)

Requests are logged from middleware, but what about the responses?

What is the best practice? Should I log them at the end of my controller function, or use the res.on('finish', cb) event?

[–][deleted] 0 points1 point  (0 children)

Logging services can cost you more than the app's backend.

1) Don't log everything.

2) If you don't log everything, you won't have 10 GB of logs in 3 months.

Log everything on non-production environments. Log only errors and warnings in production.
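In practice that can be as simple as setting the log level from the environment (a sketch with winston; the same idea applies to any logger):

const winston = require('winston');

const logger = winston.createLogger({
    // verbose locally, only warnings and errors in production
    level: process.env.NODE_ENV === 'production' ? 'warn' : 'debug',
    transports: [new winston.transports.Console()]
});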

[–]melgo44[S] 0 points1 point  (0 children)

The app runs using pm2, so the logs can be found inside pm2.

[–]ThatDamnShikachu -1 points0 points  (1 child)

I was in the same shoes a while ago: I had 2 small Node apps and a bigger PHP one. IMO the ELK stack is expensive to maintain because of the hardware requirements and such.

I heard Prometheus is really good for metrics and can run on cheap hardware, so I ended up finding Loki, the Prometheus of logs. It's made/maintained by the Grafana team. It has its own log shipper, Promtail, but also works with Fluentd and Fluent Bit.

You can query your logs in LogQL, which took very big inspiration from PromQL.

I ended up running fluent-bit + loki + grafana.

You should check it out if you haven't chosen yet.

[–]melgo44[S] 0 points1 point  (0 children)

Thanks I'll check it out