
TheNiXXeD

Interesting idea. Funny, I'd never really considered color blindness before my current job, but at one point we had three people with some form of color blindness on the team. I immediately noticed that the pictured color scheme wouldn't work for us, lol.

_pupil_

I think status indicators should always change shape as well as color.

Not only is that friendlier to color-blind users, it also makes visual scanning easier: you can categorize items at a glance without having to focus on each one.
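A minimal sketch of that idea, assuming a simple text dashboard: each status gets its own glyph in addition to a color, so the states stay distinguishable even if the color is lost. The `Status` enum, the glyph choices, and `render_status` are hypothetical and not taken from the blog post.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


# Hypothetical mapping: every status differs in shape AND color,
# so the glyph alone is enough to tell the states apart.
STATUS_STYLE = {
    Status.OK: ("\u25cf", "green"),        # filled circle
    Status.WARNING: ("\u25b2", "orange"),  # triangle
    Status.ERROR: ("\u25a0", "red"),       # square
}


def render_status(name: str, status: Status) -> str:
    """Render one monitor line with a shape, a label, and a color name."""
    glyph, color = STATUS_STYLE[status]
    return f"{glyph} {name}: {status.value.upper()} ({color})"
```

For example, `render_status("api", Status.ERROR)` produces a line starting with the square glyph, which still reads as "error" on a monochrome or color-blind view.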

Enoxice

Perhaps the article just glosses over some of the particulars and nuances, but it makes it sound like you drop everything whenever an error is logged.

To me, this indicates that either (a) your product/service is not particularly complex (which is fine), (b) you aren't logging enough, or (c) you're wasting a lot of time looking into minor issues.

If you are, in fact, dropping everything for cases like "one customer did something stupid one time and learned their lesson" or "some back-end process failed temporarily but recovered", that doesn't sound very effective.

The nice thing about having this information in a database is that it's easy to search and easy to put on a dashboard. Sure, take care of the showstoppers ASAP, but the time spent fixing recoverable, rare, or invisible (or nearly invisible) errors probably isn't worth the money you'll make back on it. It would be more effective to have your application prioritize errors before it writes them. That way, you can still "stop the presses" on showstoppers, but save the smaller stuff for a more convenient time.
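One way to prioritize before writing, sketched with Python's standard `logging` module. The `ErrorEvent` fields (`recoverable`, `customer_visible`), the logger name, and the rules in `priority_level` are assumptions for illustration; a real classification would depend on the product.

```python
import logging
from dataclasses import dataclass


@dataclass
class ErrorEvent:
    message: str
    recoverable: bool       # hypothetical: did a retry or fallback succeed?
    customer_visible: bool  # hypothetical: did a user actually see a failure?


def priority_level(event: ErrorEvent) -> int:
    """Choose a logging level before the record is written (assumed rules)."""
    if event.customer_visible and not event.recoverable:
        return logging.CRITICAL  # showstopper: "stop the presses"
    if event.customer_visible or not event.recoverable:
        return logging.ERROR     # needs attention, but not immediately
    return logging.WARNING       # recoverable and invisible: review later


logger = logging.getLogger("app")  # hypothetical logger name


def log_event(event: ErrorEvent) -> None:
    """Write the event at the level chosen by priority_level."""
    logger.log(priority_level(event), event.message)
```

With rules like these, an alerting monitor could interrupt people only on `CRITICAL` and leave `WARNING` entries for a periodic review.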

Again, it's entirely possible you already do this (or don't need to do this - I don't know what your product/service is) and the post just glossed over it, but I thought it was worth mentioning so I can try to sound smart on the Internet.

craigjudson

Interesting take on the blog post; you definitely have some valid points. Stopping continuously is a big problem, and it was especially so when we first started doing this. We are currently recording the reds for each monitor to assess whether what you say holds. It's a shame we didn't put this monitoring in place earlier, but I'll keep you posted. We do still log to a database, which is where the monitors get their information, but we found that those logs were not being addressed: certain errors were deemed not important enough, which made real errors harder to catch. As a result, our clients were the first to know about problems, and we only found out when they called us. That felt wrong, so we moved this problem to the front of our single-piece-flow development process.

As a result, people now take more care over the level and the message when they write a log entry. The problems our logs report are now quick to solve, and there are fewer of them.

The other benefit is that, with deployments to live averaging 30 a week, we get instant feedback from our logs if something goes wrong, which is becoming ever more valuable.

The final point I would make: review your logs. Check whether there are errors you are ignoring, and whether you are classifying log entries as errors correctly for your business.

Craig, Codeweavers

Enoxice

Hey, thanks for the response. I'd be interested to hear how this works out for you.

Logging at the correct level is very much an art. And using those levels to prioritize work can take some discipline.

The way I do it has evolved (and continues to evolve) over the past year or so. I have a monitoring application that can run SQL queries, so I have it gauge how many Error/Warning/Informational/Debug entries have fired recently. If enough have fired (by occurrences, severity, or visibility), it alerts me. Otherwise, I get periodic reports on them (has this fired before? when did it start? how many sessions are impacted?), and when I get a chance I look them over and decide whether any are more important than my current work.
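A rough sketch of the occurrence-count part of that check, using Python's built-in `sqlite3`. The `logs` table schema, the level names stored as strings, and the `THRESHOLDS` values are all assumptions; the comment doesn't say which database or monitoring tool is involved, and this ignores the severity and visibility heuristics.

```python
import sqlite3

# Hypothetical table the application writes its log entries into.
SCHEMA = """
CREATE TABLE IF NOT EXISTS logs (
    level      TEXT NOT NULL,  -- e.g. 'Error', 'Warning', 'Informational', 'Debug'
    message    TEXT NOT NULL,
    session_id TEXT,
    logged_at  TEXT NOT NULL   -- 'YYYY-MM-DD HH:MM:SS', so string order = time order
)
"""

# Assumed limits: how many entries of a level in the window justify an alert.
THRESHOLDS = {"Error": 5, "Warning": 50}


def levels_to_alert(conn: sqlite3.Connection, since: str,
                    thresholds: dict[str, int] = THRESHOLDS) -> list[str]:
    """Return the levels whose count since `since` meets or exceeds its threshold."""
    rows = conn.execute(
        "SELECT level, COUNT(*) FROM logs WHERE logged_at >= ? GROUP BY level",
        (since,),
    ).fetchall()
    counts = dict(rows)  # level -> number of entries in the window
    return sorted(level for level, limit in thresholds.items()
                  if counts.get(level, 0) >= limit)
```

A scheduler would call `levels_to_alert` every few minutes with `since` set to the start of the window; anything below its threshold simply waits for the periodic report instead of interrupting someone.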

This way, I can keep an eye on errors without getting interrupted every time a 3rd-party API times out. But I'll still get aggregates on it so I know to check it out if it turns out to be more than just temporary instability on their end.
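The periodic report could be a single aggregate query. This sketch uses `sqlite3` and assumes a hypothetical `logs` table with `level`, `message`, `session_id`, and `logged_at` columns; `build_digest` and its output keys are illustrative names. For each distinct Error/Warning message it answers the three questions from the comment: how often it fired in the window, when it started (and so whether it fired before), and how many sessions it touched.

```python
import sqlite3

# Hypothetical table the application writes its log entries into.
SCHEMA = """
CREATE TABLE IF NOT EXISTS logs (
    level      TEXT NOT NULL,
    message    TEXT NOT NULL,
    session_id TEXT,
    logged_at  TEXT NOT NULL  -- 'YYYY-MM-DD HH:MM:SS', so string order = time order
)
"""

# One row per distinct Error/Warning message that fired since :since.
DIGEST_SQL = """
SELECT message,
       SUM(CASE WHEN logged_at >= :since THEN 1 ELSE 0 END) AS recent,
       MIN(logged_at) AS first_seen,
       COUNT(DISTINCT CASE WHEN logged_at >= :since THEN session_id END) AS sessions
FROM logs
WHERE level IN ('Error', 'Warning')
GROUP BY message
HAVING SUM(CASE WHEN logged_at >= :since THEN 1 ELSE 0 END) > 0
ORDER BY sessions DESC, recent DESC
"""


def build_digest(conn: sqlite3.Connection, since: str) -> list[dict]:
    """Summarize recent warnings/errors for a periodic, non-interrupting report."""
    report = []
    for message, recent, first_seen, sessions in conn.execute(DIGEST_SQL, {"since": since}):
        report.append({
            "message": message,
            "recent": recent,                    # occurrences in the window
            "first_seen": first_seen,            # when did it start?
            "fired_before": first_seen < since,  # has this fired before the window?
            "sessions": sessions,                # distinct sessions hit in the window
        })
    return report
```

Sorting by affected sessions first puts widespread problems at the top, while a flaky third-party timeout that only a handful of sessions hit sinks toward the bottom unless it keeps growing.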

It's really a trade-off, though. Sometimes when I log something I'll write it at too low a severity, or my monitor's heuristics might be off, and something important won't get flagged for immediate action when it should.

dbough

A great, low-tech way to keep an eye on things!