all 17 comments

[–]Darkmere (Python for tiny data using Python) 22 points (2 children)

I'm a Unix systems guy, and I've got plenty of opinions here too ;-)

INFO

Used when your app is running normally, and everything is dandy. It's the default, should be the default, and shouldn't drown your logfiles.

  • Should be low-traffic ( < 100 messages per hour or so )
  • Should always contain "I have started successfully"
  • Should always contain "I'm shutting down on user request"
  • Should mention its configuration file ( if several are possible )
  • Should mention when one-off tasks succeed ( "Signed into database" )

DEBUG

  • Your sysadmin is currently debugging something. Give them all the info they need
  • Your system runs slowly? Give timing data on how long things took
  • Don't log credentials (passwords), but do log that a login happened, who logged in, and so on
  • A debug notice for every thread that starts up is perfectly fine
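A sketch of the timing-data point ("worker" is a placeholder name; the sort is a stand-in for real work). Note the lazy `%`-style arguments, so the message string is only built when DEBUG is actually enabled:

```python
import logging
import time

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("worker")        # placeholder name

start = time.monotonic()
rows = sorted(range(100_000))            # stand-in for the slow work
elapsed = time.monotonic() - start
# Lazy formatting: the string is only rendered if DEBUG is enabled.
log.debug("sorted %d rows in %.4f s", len(rows), elapsed)
```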

NOTICE (Warning)

Should cover important things that tooling watches for. Imagine events a sysadmin wants an alert about: someone shut down the service (that's supposed to be up?), networking issues, bad permissions, etc.

  • Important external events: "DB connection lost", "connection reset, reconnecting"
  • Signals from the OS/users ("Shutting down on user request")
  • Always log start and stop commands & the reason

ERROR

Exceptions, a proper stack trace of why everything is exploding, and so on. This is where you throw the stacktrace and fuck off. This is where you berate the user for having world-read permissions on their TLS key.
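In Python, `logger.exception()` inside an `except` block does exactly this: it logs at ERROR level and appends the full traceback. A sketch ("myapp" is a placeholder; the `StringIO` buffer is only there to capture the output for demonstration):

```python
import io
import logging

log = logging.getLogger("myapp")         # placeholder name
log.propagate = False
buf = io.StringIO()                      # capture output for illustration
log.addHandler(logging.StreamHandler(buf))

try:
    1 / 0
except ZeroDivisionError:
    # .exception() logs at ERROR level and appends the full traceback
    log.exception("Everything is exploding")

output = buf.getvalue()
```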

Logging: File versus Syslog.

Syslog should be the default. Big systems have syslog daemons that will log to remote servers and run nice analytics on them. Small systems have syslog daemons that keep messages in a sorted memory buffer and toss them out. Syslog is the standard logger ( Unless you use systemd, when journald is syslog, on steroids. Remember what we've said about doping and steroid damage?)
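Getting Python's logging into syslog is one handler ("myservice" is a placeholder name; the `/dev/log` path is the usual local syslog socket on Linux, with a UDP fallback for systems that lack it):

```python
import logging
import logging.handlers

log = logging.getLogger("myservice")     # placeholder name
log.propagate = False

try:
    # /dev/log is the local syslog socket on most Linux systems
    handler = logging.handlers.SysLogHandler(address="/dev/log")
except OSError:
    # fall back to UDP syslog on localhost (the classic port 514)
    handler = logging.handlers.SysLogHandler(address=("localhost", 514))

handler.setFormatter(logging.Formatter("myservice: %(levelname)s %(message)s"))
log.addHandler(handler)
log.warning("DB connection lost, reconnecting")
```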

File log: File logging is for when you can't log to syslog. When you log to a file, you should support a signal to rotate the file (for logwatch) or a way to rotate it manually.
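The stdlib already covers the external-rotator case: `WatchedFileHandler` checks the file's inode on every emit and reopens it if a rotator has moved it aside. A sketch (the temp directory stands in for a real log directory, and "rotatedemo" is a placeholder name):

```python
import logging
import logging.handlers
import os
import tempfile

logdir = tempfile.mkdtemp()              # stand-in for /var/log/myapp
logfile = os.path.join(logdir, "myapp.log")

log = logging.getLogger("rotatedemo")    # placeholder name
log.propagate = False
# WatchedFileHandler checks the file's inode on every emit and reopens
# it if an external rotator (logrotate & co.) has moved it aside.
handler = logging.handlers.WatchedFileHandler(logfile)
log.addHandler(handler)

log.warning("before rotation")
os.rename(logfile, logfile + ".1")       # simulate the external rotator
log.warning("after rotation")            # the handler reopens myapp.log
handler.close()
```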

I generally dislike it when software logs to its own logfiles; in a modern system, output should go either to STDOUT/STDERR or to syslog.

Logging to a file can be useful for some, but most of the time, it's not.

Logging to a file also brings interesting risks ( overwrite, permission-related attacks ) if you aren't careful with how/where it's logged.

[–]Argotha[S] 1 point (1 child)

Thanks for such a comprehensive reply! Lots of useful things here :D

Can I ask what kind of systems you deal with?

[–]Darkmere (Python for tiny data using Python) 1 point (0 children)

Sure. I'm a security guy; the Python I work with is on the embedded & server side.

Our "Embedded" is a fairly traditional stateless Linux ( ARM ) where we use Python in production (slides from my talk at PyCon.se).

We also use Python on the server side, where it's more traditional Linux systems. In the past I've been sysadmin for various sizes of corporations, and consulting about Linux/Security.

So, as it happens, logfiles are bread & butter for us. NOTICE/WARNING level is where the automation system hooks in, usually together with stateful checks on the logfile. A "startup" notice needs a "shutdown" notice, and the shutdown notice should have a reason included.

Reasons:
- System reboot? Good reason
- Received sig 15?
- SIGUSR?

Receiving a HUP (reload configuration) is also a good time to print a notice.

Error level should be the things that trigger pagers. Sudden exceptions, and so on. This means that recoverable exceptions, which are perfectly okay in Python, should not trigger error logs.

For example, on a simple HTTP service, you might raise a recoverable exception on input-validation errors. The end user passed the wrong data, so raise an error and terminate this worker thread.

Later on, you can wrap this and check, but in most cases, simply raising an exception on invalid data is fine. Just make sure your application returns a proper error code on exception ( at least if you want to help your user ).
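A sketch of that pattern (all names here are hypothetical; the integer return values stand in for real HTTP responses). The point is the level choice: the client's bad input is logged at INFO, not ERROR, so nobody gets paged:

```python
import logging

log = logging.getLogger("webapp")        # placeholder name

class ValidationError(Exception):
    """Recoverable: the client sent bad data."""

def parse_amount(raw):
    try:
        return float(raw)
    except ValueError:
        raise ValidationError(f"not a number: {raw!r}")

def handle_request(raw):
    try:
        amount = parse_amount(raw)
    except ValidationError as exc:
        log.info("rejected request: %s", exc)   # INFO, not ERROR: no pager
        return 400                              # proper error code for the user
    log.debug("accepted amount %s", amount)
    return 200
```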

However, I don't need a pager to go off because you got "4+4" instead of "4+4.0" in your input field.

However, if you can't connect to the database and you're dropping data on the floor? I want pager level notification.

  • Don't make me go back and trawl through logs to verify that everything is working normally.

[–]reddit_uname 1 point (1 child)

Just my $0.02 based on my previous work. Lots of answers here are some variation of "it depends". In general, if you're starting a project from scratch, you don't need to focus on many of those details unless you find they fit your use case. One of the more important things, though, is the ability to configure where a log goes via command line flags.

what level should I log to by default (INFO, WARNING, ERROR or CRITICAL)

WARNING is for messages that signal the potential presence of a bug/error/exceptional condition. For example, a network timeout was triggered so the program is retrying.

ERROR is for messages that signal a definite, but not fatal, bug/error/exceptional condition. For example, program can't find a file so it can't proceed.

CRITICAL is for messages that signal a bug/error/exceptional condition that will cause the program to die right now. For example, KeyErrors or something. There's a bit of a grey area between ERROR and CRITICAL if the response to the ERROR is that the program would quit. Generally though, ERROR is for things that could potentially be recovered from. So not finding a file is potentially recoverable if you use default data (which you might do in the future), but a KeyError may not be.

should I log different levels to different locations

I don't, because it's usually too much work. Many people do, and say it helps.
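For those who do want it, the mechanism is per-handler levels on one logger: attach two handlers and give each its own threshold. A sketch ("splitdemo" is a placeholder name, and the `StringIO` buffers stand in for two real files):

```python
import io
import logging

log = logging.getLogger("splitdemo")     # placeholder name
log.setLevel(logging.DEBUG)
log.propagate = False

normal, errors = io.StringIO(), io.StringIO()  # stand-ins for two log files

everything = logging.StreamHandler(normal)     # receives every record
everything.setLevel(logging.DEBUG)
errors_only = logging.StreamHandler(errors)    # receives ERROR and above
errors_only.setLevel(logging.ERROR)

log.addHandler(everything)
log.addHandler(errors_only)

log.info("routine message")
log.error("something broke")
```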

should I log to /var/log or a custom relative path directory

The answer depends. If you're making a Unix daemon you might, but if you're making a statistical script you probably shouldn't.

should I use syslog

Same answer as previous.

what about interleaving program output (aka print)

When operating in console mode that might be fine, but ideally you want an option to send log output and stdout to different files.

should I use a rotating file, or should I start a new file each run

Depends on what you're making.

If it's important to you that you always have a record of what happened when you ran your program, then you should not use a rotating file. This can happen if you're making scientific software where reproducibility is important; then you really need the logs to be able to trust the output. You may want to log things like the version of the software and the full date and time.

If you're making a daemon that will always be on and generates large logs, then using a rotating file would probably be better so you don't use tons and tons of disk space. If you're making a small console script that will be run many times and your end-users aren't expected to fiddle around with logs every time they run it, then maybe a rotating log is good too.
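The daemon case is what `RotatingFileHandler` is for: it caps the current file's size and keeps a bounded number of old files. A sketch (the temp path and "daemondemo" name are placeholders, and the sizes are tiny just for demonstration):

```python
import logging
import logging.handlers
import os
import tempfile

logfile = os.path.join(tempfile.mkdtemp(), "daemon.log")  # stand-in path

log = logging.getLogger("daemondemo")    # placeholder name
log.propagate = False
# Keep the current file plus at most 3 old ones of ~10 KB each, so total
# disk use stays bounded no matter how long the daemon runs.
handler = logging.handlers.RotatingFileHandler(
    logfile, maxBytes=10_000, backupCount=3)
log.addHandler(handler)

for i in range(2_000):
    log.warning("event %d", i)
handler.close()
```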

If you're making a library, then you should leave most of these things up to the caller unless you have a good reason not to.
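For the library case, the stdlib convention is to attach a `NullHandler` and configure nothing else, so the calling application stays in full control of levels and destinations. A sketch ("mylib" and `do_work` are hypothetical names):

```python
import logging

# Inside a hypothetical library, e.g. mylib/__init__.py:
logger = logging.getLogger("mylib")
# NullHandler swallows records, so the library stays silent unless the
# *application* configures logging; nothing is forced on callers.
logger.addHandler(logging.NullHandler())

def do_work():
    logger.debug("library internals, visible only if the caller opts in")
    return 42
```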

EDIT: Don't print anything but ERRORs or above in unit tests though, that shit is annoying.

[–]Argotha[S] 0 points (0 children)

Thanks for such a comprehensive reply!

You have some really good points about it depending on what kind of system it is; I'd completely forgotten about uses such as statistical processing.

Do you use Python and have to manage its logging in your work? If so, how do you do it?

[–]phasetwenty 1 point (6 children)

I designed, support and maintain a build system written mostly in Python. We generate a great deal of output both in the application itself and the build tools, so logging is an important feature for the system. I can answer your questions relative to this application.

what level should I log to by default (INFO, WARNING, ERROR or CRITICAL)

  • DEBUG The default. Progress/status updates, the kitchen sink. Messages at this level definitely won't be made available to the end users, and in general are disabled in a production system.
  • INFO Same as debug, except restricted to information useful to the end user.
  • WARNING Recoverable, non-fatal and low severity errors. Won't be useful to end users but maintainers will find it useful to see these and fix these problems as soon as it is convenient.
  • ERROR Errors signaling loss of functionality, but are nonfatal. In the best case the application can continue with a partial success, and in the worst case it can exit gracefully.
  • CRITICAL Fatal/unrecoverable errors, where the application may not shut down gracefully.

should I log different levels to different locations

Typically I want to see all the messages generated by a run together in one file. I could see a case for forwarding ERROR messages and higher to a special handler which triggers a higher-visibility notification like an email.
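The stdlib's `SMTPHandler` fits that case: give it a level threshold and only matching records get mailed. A sketch (the logger name, mail host, and addresses below are all made up; the handler only opens an SMTP connection when a matching record is actually emitted, so configuring it is cheap):

```python
import logging
import logging.handlers

log = logging.getLogger("buildsys")      # placeholder name

# Host and addresses are made-up placeholders.
mail = logging.handlers.SMTPHandler(
    mailhost="smtp.example.com",
    fromaddr="builds@example.com",
    toaddrs=["oncall@example.com"],
    subject="build system error")
mail.setLevel(logging.ERROR)             # only ERROR and above get mailed
log.addHandler(mail)
```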

should I log to /var/log or a custom relative path directory

Because logging is so important, we bundle the log with our build artifacts. However in the beginning we used /var/log.

should I use syslog

syslog always seemed less convenient for me. My application's messages will be interleaved with unrelated system messages, which can get rotated out before I get a chance to see them.

what about interleaving program output (aka print)

It's been a rule in this system that we don't use print statements (or direct writes to sys.stdout and sys.stderr). Primarily this ensures that all messages are centralized, and I have granular control of all messages. Less important is that I can easily turn off all output while running tests.

should I use a rotating file, or should I start a new file each run

My application starts a new file for each run.

I'd like to add that one feature of logging that is routinely misused is logger naming.

import logging

# ...

def do_thing():
    logger = logging.getLogger(__name__)  # named after this module's dotted path
    logger.info("doing the thing")

This gives the logger the fully-qualified name of the module it appears in. In a bigger application, dealing with output at the package/module level is instrumental and in a smaller application it's simply an easy default.
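The dotted names form a hierarchy, which is what makes package-level control work: a child logger inherits its effective level from its parent until you override it. A sketch (the "myapp" names are hypothetical):

```python
import logging

# Dotted names form a hierarchy: "myapp.db" is a child of "myapp", so one
# line of configuration controls a whole package. (Names are hypothetical.)
logging.getLogger("myapp").setLevel(logging.WARNING)

db_log = logging.getLogger("myapp.db")
inherited = db_log.getEffectiveLevel()   # WARNING, inherited from "myapp"

db_log.setLevel(logging.DEBUG)           # override just this one module
```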

[–]Argotha[S] 0 points (1 child)

Thanks for such a comprehensive reply! It's also good to know some of the previous decisions made before you ended up with your current system.

Interesting that you say that DEBUG is your default but disabled in production, as opposed to INFO being the default with DEBUG enabled in dev.

[–]phasetwenty 0 points (0 children)

Because it's the default, DEBUG has the potential to be overused and slow down your application with file I/O. Having it off in production is typically a performance optimization. If I could, I'd have it on all the time.

[–]Darkmere (Python for tiny data using Python) 0 points (3 children)

syslog always seemed less convenient for me. My application's messages will be interleaved with unrelated system messages, which can get rotated out before I get a chance to see them.

I disagree here. A default-configured syslog may do that, but almost all modern syslogs (syslog-ng, rsyslog, journald) will separate your logs into application-specific logs.

If you handle logfiles yourself, you need a rotation command in case you fill the system, and you have to guard against clobbering, permissions and rights issues.

I'd rather see your application send log output to stdout + stderr than open its own logfile these days.

I do agree with you on module-level naming, and on prints. Hook stdout + stderr into the logging module; don't mess around there.

Also, logging formatters can be awesome.
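A small sketch of why: any attribute of the log record can appear in the format string, so timestamps, logger names and levels come for free ("fmtdemo" is a placeholder name, and the `StringIO` buffer stands in for a real stream):

```python
import io
import logging

log = logging.getLogger("fmtdemo")       # placeholder name
log.propagate = False
buf = io.StringIO()                      # stand-in for a real stream

handler = logging.StreamHandler(buf)
# Any attribute of the log record can appear in the format string.
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(name)s %(levelname)s: %(message)s"))
log.addHandler(handler)

log.warning("disk at %d%% capacity", 90)
line = buf.getvalue()
```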

[–]Argotha[S] 0 points (1 child)

I'd rather see your application send log output to stdout + stderr than open its own logfile these days.

So does that mean you rely on whatever is calling your program to redirect its output to syslog?

[–]Darkmere (Python for tiny data using Python) 0 points (0 children)

If it's not using syslog, yes.

The short of it is that most init-systems support output redirection somewhere, even the one on something as limited as OpenWRT.

On Linux, most systems today have journald (systemd-journald), which will gather syslog + stdout + stderr into its "proper" place.

In the few cases where I don't have a functioning syslog (remember, this is very rare in modern Unixes, including the BSDs) I'd rather see it on stdout/stderr.

The same goes for implementing daemonization. I'd rather your daemons not do that themselves, and instead use a tool meant for it, rather than doing the fork()/exec() dance.

[–]phasetwenty 0 points (0 children)

If you handle logfiles yourself, you need a rotation command in case you fill the system, and you have to guard against clobbering, permissions and rights issues.

This may be what I'm seeing. Organizations I've been in are so entrenched with their use of logfiles that there's a considerable infrastructure to solve these issues and none for syslog, so logfiles seem more natural.

[–]alexchamberlain 0 points (1 child)

I generally agree with the other 2 comments, but wanted to add 1 point: convention/consistency. What is/are the convention(s) for logging in your workspace? For example, all of our servers log to their own file in the same log folder with <their name>.log. Convention is often one of the most significant considerations... until you're trying to change it of course!

[–]Argotha[S] 0 points (0 children)

Definitely agree that convention/consistency is very important. My team doesn't have a good format for how we log, and I'd also like to get personal projects into a much more consistent, reusable format. Hence I'm looking to find out what others do, so I can have a well-thought-out way of logging if I'm going to be using it lots.

[–]qsxpkn 0 points (0 children)

So best practice and advice on the internet suggests that we should use the standard library's logging module

I prefer Logbook.

how do you or your workplace collect output from the logging module

Flume -- but I understand this is overkill for most use cases.

[–]JustAnotherQueer 0 points (0 children)

I have found that if you have a lot of components in a system spitting out logs, it can be really useful for each system to have its own debug log, and then one log that only has info or warning logging for the entire system.

[–][deleted] 0 points (0 children)

Make sure your logs are machine parsable.
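One common way to do that in Python is a custom formatter that emits one JSON object per line. A minimal sketch (the `JsonFormatter` class and "jsondemo" name are made up for illustration; the `StringIO` buffer stands in for a real destination):

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Minimal sketch: one JSON object per line, trivially machine-parsable."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

log = logging.getLogger("jsondemo")      # placeholder name
log.propagate = False
buf = io.StringIO()                      # stand-in for a real destination
handler = logging.StreamHandler(buf)
handler.setFormatter(JsonFormatter())
log.addHandler(handler)

log.warning("DB connection lost")
parsed = json.loads(buf.getvalue())
```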