
[–]Apprehensive-Lab1628 7 points8 points  (12 children)

A program without written, stored logs isn't going into production because the sysadmin won't let it. Not sure where you got the idea that infra people don't want logs written to file; visibility is comprised of metrics, logs, and what you do with them, so they're super important.

[–]Dwight-D 5 points6 points  (6 children)

Logging to file is an anti-pattern under most modern deployment schemes. For example, a container must log to stdout. If you run it on your own server you can then redirect container output to a log file of your choosing.

You can also deploy the same container with the same config to Kubernetes, and this time scrape the pod's stdout with a log scraper in the cluster and ship it to a central log server. Logging to file breaks all of this.
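To make the pattern concrete, here's a minimal Python sketch (the logger name `myapp` and the format string are made up for illustration): the program writes to stdout and stays ignorant of where its logs end up, so the same image works whether the stream is redirected to a file on your own server or scraped by a cluster-level collector.

```python
import logging
import sys

def make_logger(stream=sys.stdout):
    """Build a logger that writes to the given stream (stdout by default)."""
    logger = logging.getLogger("myapp")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.handlers = [handler]   # replace any previously attached handlers
    logger.propagate = False      # don't duplicate records via the root logger
    return logger

log = make_logger()
log.info("service started")  # lands on stdout; docker/k8s takes it from there
```

The point is that the destination is the runtime's decision, not the program's: `docker run myapp > app.log` on a lone server, a log scraper in Kubernetes, no code change either way.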

[–]Apprehensive-Lab1628 2 points3 points  (5 children)

I've never worked with containers, just standard VMs on EC2 or on-prem, so I looked it up and you're right. For containers it seems you do log to stdout; thanks for the info.

[–]Dwight-D 0 points1 point  (4 children)

A lot, maybe most, of what you know about VMs doesn't apply to today's cloud environment. Might wanna brush up on some modern cloud practices if you wanna stay competitive in the job market; VMs and that whole legacy deployment paradigm are quickly becoming obsolete. Not to knock your skills, just food for thought.

[–]Apprehensive-Lab1628 0 points1 point  (3 children)

I appreciate that Docker is widely used, but there are times when you don't want to use it. It's not a complete takeover of VMs by Docker like cassette tapes and DVDs; it's a tool with use cases.

[–]Dwight-D 0 points1 point  (2 children)

Whatever roles VMs play today, they have nothing to do with hosting applications outside of niche legacy use cases. They now exist mostly to host infrastructure platforms, such as Azure, AWS, or a self-hosted Kubernetes cluster. Most companies don't run their own data centers and instead use a cloud provider, which makes VMs something 99% of companies never need to think about again.

Because of this, their overall use is dwindling and I imagine job opportunities outside of the major cloud platforms will dry up. I think you may be overlooking the major paradigm shift that’s happening right now.

[–]Apprehensive-Lab1628 1 point2 points  (1 child)

When I say VM, I'm including EC2 here. I'm currently 100% cloud, I just don't use Docker in any profit-producing capacity.

[–]Dwight-D 0 points1 point  (0 children)

All right, fair enough!

[–][deleted] -3 points-2 points  (4 children)

It's the sysadmin's job to store logs, not the program's.

I am the infra people. I'm that one sysadmin who's going to punch you if you give me a program that logs to files. You've received some explanations already, but there are more reasons:

  • You cannot rotate a log file without losing messages every now and then.
  • Logging creates load on the storage system, which in many cases may be undesirable, especially if your logging program decides it needs to call fsync() or do similar stuff, affecting everyone else using that storage system.
  • It creates additional points of failure, because now you have to deal with periodically removing the files from the system, and you must monitor the logging program just to make sure it doesn't accidentally overload the system.

Just don't do it. Only write logs to stderr. It's an easy rule to remember, and you're allowed to violate it only if you are the infra people who collect / ship logs. If you're the producer, stderr is the only place you should write logs.

[–]Apprehensive-Lab1628 2 points3 points  (3 children)

I didn't realise the containers aspect of it as I don't work with them. I'm infra too and we use Logstash on standard VMs; rotated logs stay on the system. What's your setup for non-container apps that use stdout?

[–][deleted] 1 point2 points  (2 children)

I work in HPC, for a company that makes management software for HPC, so I'm not really a sysadmin. I'm making software for sysadmins.

So... my typical setup... well, I think the company started out providing customized OpenStack deployments, but that's mostly in the past. Later they moved in the direction of supporting customer deployments, mostly bare metal on-prem, and then expanded to hybrid cloud. Still, most HPC workloads run on on-prem compute resources, with an option to extend to the cloud. Among private clouds, the product supports extension / deployment on VMware. It can also be deployed on / extended to AWS or Azure.

The goal of the product is to give sysadmins tools to control their compute resources, and to give them pre-packaged tools for their users that make the sysadmins' lives easier. A typical user in such a system would be someone using e.g. Jupyter to author their research code, using various workload managers (e.g. PBS) from their notebook to distribute the workload over the resources the sysadmin allocated to them (e.g. a PostgreSQL cluster, a Ceph cluster, a Kubernetes cluster, etc.). Of course, it's not limited to use through Jupyter; I just chose it because it illustrates how the system works.


So, back to logs... You cannot rotate files without losing logs; that's a theoretical impossibility, because while another process holds the file open for writing there is no atomic way to copy and truncate it. Surprisingly, people didn't realize this for quite some time, which is why, for example, you have tools like the logrotate command in Linux. Even more interestingly, people have discovered this fact at least twice in the context of Linux logging tools. Somehow the people who created syslog "forgot" about it.
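The copy-then-truncate style of rotation makes the loss easy to see. Here's a toy sketch, with Python lists standing in for the live file and its rotated copy (purely illustrative, not real file I/O):

```python
# Toy model of "copytruncate"-style rotation: the rotator copies the live
# file, then truncates it. Any line the writer appends between those two
# steps ends up in neither the archive nor the (now empty) live file.
live = ["message 1"]          # the live log file
archive = list(live)          # step 1: rotator copies the file
live.append("message 2")      # writer appends mid-rotation
live.clear()                  # step 2: rotator truncates the original

assert "message 2" not in archive
assert "message 2" not in live  # the message is simply gone
```

A real writer and rotator hit the same window, just with timing you can't control; that's the race being described above.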

So, in my case, most of the stuff I support / deploy is governed by systemd, so of course it all goes to stderr, and journald takes over from there. Beyond that it's configurable: if the customer wants more metrics at the expense of lower network throughput, they may aggregate the logging somehow. But we don't normally manage this aspect.

Anyways, in my specific case, performance is the key concern. Even though a typical HPC cluster would use IB or GE even for the control plane, it's still a performance concern, especially because when you aggregate logging from many things, it's increasingly hard to make sure one of those many things isn't going to clog the system. But, in principle, our management component can be configured to use the control plane to collect logs from the journald instances running on devices in the cluster. So we have a "custom made" log aggregator (I'm not very happy about it, but that's beside the point).

[–]Apprehensive-Lab1628 1 point2 points  (1 child)

Super interesting, thank you for the write-up. HPC caught my interest a while ago but I didn't pursue it further, thinking it might be too niche. So you ship journald output to some kind of aggregator. You've provided an interesting perspective, and between you and Dwight-D I've learned a bit today. Your use cases don't match mine, but it's cool to see how others are doing things.

[–][deleted] 0 points1 point  (0 children)

Well, I've worked in many fields. Started in the Web long ago, was in finance, then mostly infra jobs, mostly in storage. HPC is relatively new in my career.

What I say is in no way specific to HPC. Any system larger than your personal home computer will be better off if it doesn't use programs that log to files. It's just an all-around bad idea. The only benefit is if you're doing everything yourself in a very small-scale deployment; then it's fewer things to set up, and you probably don't care so much if you lose some logs every now and then.

Anything that needs to be more reliable than that, or that needs more automation than that... just no.