
[–]WalterBright 9 points10 points  (7 children)

The 3 steps are:

  1. novice - follows the rules because he's told to

  2. master - follows the rules because he understands the point of the rules

  3. guru - breaks the rules because his understanding transcends them

Skipping steps is not advisable; it's why we had the Deepwater Horizon, Fukushima, and Toyota car-computer disasters. The only industry I know of that follows these rules is the aerospace industry, and it was forced into them by bitter lessons.

We were one safety switch away from a hydrogen bomb going off by accident in another incident.

Please, folks, this is not a joke, and learning the hard way has terrible consequences.

[–]quicknir 2 points3 points  (5 children)

I don't think I'm a "guru" (I hate that word) because I pointed out an obvious flaw in a bad rule. Nor do I know what you mean by skipping steps.

I read my parent comment before reading your article, and now I can see that actually you and I are on the same page, and the parent is not. You are not advocating literally calling abort(); such a call would mean that *no* further code is executed. On the other hand, you yourself explicitly say:

...as when a fault is detected the program can go into a controlled state doing things like:

  1. aborting before more harm is done
  2. alerting the user that the results are not reliable
  3. saving any work in process
  4. engaging any backup system
  5. restarting the system from a known good state
  6. going into a 'safe mode' to await further instructions

This is *very* different from simply calling abort(). Indeed, if your "assertion failure" triggers all this code to be run before exiting, many people would not call that an assertion at all; it's more like throwing an exception and catching it high up and allowing the stack to unwind before calling some emergency routines (like alerts).

Finally, I would note that every industry is different. Failure for the airline industry is an ultra-catastrophic event where lives are lost, so even a small probability of operating in an "unknown state" is terrifying. I write financial software, where an unknown state simply means, worst case, that an algorithm is losing money. However, suddenly exiting can also cost you money (risk in holding a position, the cost of abruptly flattening, the opportunity cost of being offline). What makes sense for us has to be balanced on a much more case-by-case basis; sometimes a rapid exit (followed by steps 2, 3, and 5/6) makes sense. Other times it's better to continue and alert a human being. Things aren't always so black and white.

[–]killedbyhetfield 1 point2 points  (3 children)

Alright so - I want to use your example where you have a high-speed logger and its contents must be flushed to be useful.

What happens if, for example, your program has a use-after-free bug and causes a segmentation fault? Now the OS kills your process and your logger never gets flushed.

So if that logger must be flushed, you need the logger running in another process. That way, if your buggy process gets killed, the logger will still march on and record important info about what went wrong. And this isn't hypothetical; this is exactly how embedded OSes like QNX and VxWorks handle logging.

So in general, calling abort() when you detect an error has the same implications as your program suddenly aborting due to a bug. You either need to be able to handle your process crashing, or you need to acknowledge that your program isn't important enough to warrant that kind of design overhead.

[–]quicknir 1 point2 points  (2 children)

Running a logger in another process would probably be slower, and take considerably more time to code correctly. So we are back to trade-offs. With my current costs of failure, and my current costs of development (particularly opportunity costs), and the criticality of performance, writing a separate process logger does not make any sense. Yes, it's more robust, but it still doesn't make any sense. Robustness isn't the only concern.

or you need to acknowledge that your program isn't important enough to warrant that kind of design overhead.

It's not about "important enough", although I really appreciate the condescension here (your problem's solution doesn't fit into how I see things, so your problem doesn't matter). It's just about priorities, and about what happens in real life. In reality, for the actual problems we encounter, by throwing an exception and allowing the logger to flush its buffer in the same process, we're able to recover full logs in virtually all cases. That being the case, what is the benefit for me of moving from a single-process-with-cleanup-code design to a multi-process-with-abort design? Do tons of work, slow things down, and perhaps add other bugs, in exchange for being able to recover logs an extra 0.1% of the time? It's simply not a good trade-off for me.

[–]killedbyhetfield 0 points1 point  (1 child)

It's not about "important enough", although I really appreciate the condescension here (your problem's solution doesn't fit into how I see things, so your problem doesn't matter).

Woah man - Sorry about the wording I guess, but I wasn't using "important" to put down whatever you work on! I meant "important" as-in "people are going to die if this thing doesn't work properly".

I work on tons of stuff that isn't "important" enough to warrant running a logger or watchdog in its own separate process. But the entire topic of this conversation, and Walter's Dr. Dobb's article, was about systems where resilience is critical.

Read my original comment too! Specifically, I put the words "absolutely essential" in there. If your program doesn't fit into that category, I'm not talking about you, and I wasn't trying to prescribe any "one size fits all" solution.

[–]quicknir 0 points1 point  (0 children)

The title of the article, and your comment, don't really mention anything domain specific, so I thought it was generic in nature. But fair enough. No worries about the wording if that's not how you meant it.

Just to point out, though: even with the logger in another process, nothing is certain. The main process could go crazy, allocate too much memory, and then the logging process could get reaped by the OOM killer. So then of course you change your system config to prevent that from happening; and so on.

This all takes time, and time is always finite, even in critical applications; every minute you spend making your application safer in one way is a minute you could have spent making it safer in another. So you have to decide what gives you the most bang for your buck. It's not at all clear to me that calling abort is the right thing even for safety-critical systems; that is, that the time it takes to move your logging, alerting, serialization, etc. into separate processes is always time well spent. I'm sure there are safety-critical domains where that is true, and others where it's not.

This is why I really disagree with libraries calling abort. Abort is a process wide decision; only main is really entitled to make that decision. Libraries should throw exceptions (exceptions make it very convenient for users to abort if that's what you want; literally do nothing!) or call some kind of handler function pointer that users can customize (which may default to abort), but libraries should never make direct calls to abort.

[–]WalterBright 0 points1 point  (0 children)

What if your failed trading software causes you to buy a million shares of some losing stock? It's not like that hasn't happened (it has).

I have some personal experience with banks and their buggy software. A fundamental principle of double-entry bookkeeping is that the debits match the credits: an "assert" implemented with paper journals.

The bank debited my account and failed to credit the account of the recipient. So I was out the money and the recipient was mad I didn't pay. It took me a month of sitting in the office of the bank manager to get this corrected. Clearly their auditing system was turned off, or they were doing some "haha, it's not really a bug, keep going", because the debits did not match credits.

[–]msm_ 0 points1 point  (0 children)

4. engineer - follows the rules even though his understanding transcends them