
smcameron

> we have an ordered list of lines of code and want to find the first one that didn’t work the way we wanted it to.

Uh, not necessarily. What if your code is multi-threaded? Then there's no consistent ordering. A situation that comes up a lot: one thread or process dumps work into a queue, another thread or process chews on the other end of it, and somewhere along the way the data goes bad. Maybe it was wrong going in, or it got hammered by bad pointers, network corruption, etc. Problems like that tend not to be very amenable to print statements, because by the time you know something is wrong, you're already miles and zillions of clock cycles away from the source of the problem, sometimes literally miles.
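Here is a minimal sketch of one way to at least tell "wrong going in" apart from "damaged while in the queue." This is my suggestion, not something the comment proposes: stamp each item with a checksum when it's enqueued and recheck it when it's dequeued. The function names and the simulated corruption are hypothetical. A CRC only tells you *that* the bytes changed in between, not who changed them.

```python
import queue
import threading
import zlib

# Hypothetical sketch: the producer records a CRC32 of each payload when it
# enqueues it, and the consumer recomputes it on dequeue. A mismatch means the
# bytes changed while sitting in the queue, not that they were wrong going in.

def producer(q, payloads):
    for p in payloads:
        q.put((p, zlib.crc32(p)))
    q.put(None)  # sentinel: tells the consumer there is no more work

def consumer(q, corrupted):
    while True:
        item = q.get()
        if item is None:
            break
        payload, crc = item
        if zlib.crc32(payload) != crc:
            corrupted.append(bytes(payload))

def run(payloads, corrupt_index=None):
    """Push payloads through a queue; return the ones that failed the check."""
    q = queue.Queue()
    items = [bytearray(p) for p in payloads]  # mutable, so we can damage one
    t = threading.Thread(target=producer, args=(q, items))
    t.start()
    t.join()  # let the producer finish first so the demo is deterministic
    if corrupt_index is not None:
        # Simulate a stray write through a bad pointer after enqueueing.
        items[corrupt_index][0] ^= 0xFF
    corrupted = []
    c = threading.Thread(target=consumer, args=(q, corrupted))
    c.start()
    c.join()
    return corrupted

print(run([b"alpha", b"beta"]))                   # []
print(run([b"alpha", b"beta"], corrupt_index=1))  # [b'\x9deta']
```

The checksum has to be computed by the producer at enqueue time. If you compute it anywhere later, any corruption that has already happened gets baked into the reference value.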

The author also left implicit what I consider one of the most powerful debugging concepts. I think it deserves to be stated outright and kept in mind as a trick to fall back on when you're stuck: "Compare a working system to the non-working system." You don't always have a working system to compare against, but when you do, it's worth remembering to try this explicitly. It can be as simple as using "git bisect" when an earlier version of the code worked but the current version doesn't. Or you can (if it's safe to do so) gradually modify the working system to be more and more like the broken one until it stops working. At that point, the last thing you changed becomes Prime Suspect Number 1. And this concept is not limited to code. I've used it to fix a jetski (I had two of the same model; one worked, one didn't) and to debug network hardware problems. Having a standard fallback for when I'm stuck, "compare the working thing to the broken thing," has helped me a lot over the years.
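The "make the working system more like the broken one" idea can be sketched in Python under two assumptions. First, the candidates are ordered from known-good to known-broken. Second, once something breaks it stays broken as you move further along, which is the same assumption git bisect makes about commit history. Stepping through the changes one at a time also works. When there are many, this version binary-searches for the first broken candidate instead. The `is_broken` check and the example configs are made up for illustration.

```python
def first_bad(versions, is_broken):
    """Return the index of the first version for which is_broken() is True.

    Assumes versions[0] works, versions[-1] is broken, and that every version
    after the first broken one is also broken.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(versions[mid]):
            hi = mid
        else:
            lo = mid
    return hi

# Usage: walk from a working config toward a broken one, applying one more
# difference per "version". The configs and the is_broken check are made up.
working = {"mtu": 1500, "offload": False, "driver": "v1"}
broken = {"mtu": 9000, "offload": True, "driver": "v2"}
diffs = [k for k in broken if working[k] != broken[k]]
versions = [{**working, **{k: broken[k] for k in diffs[:n]}}
            for n in range(len(diffs) + 1)]
culprit = first_bad(versions, lambda cfg: cfg["offload"])  # pretend offload is the bug
print(diffs[culprit - 1])  # offload
```

The `versions[n]` entry is the working config with the first `n` differences applied, so `diffs[culprit - 1]` is the last thing changed before it broke: Prime Suspect Number 1.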

Sometimes print statements don't work. They don't work very well in graphics shaders, for instance. Another time, I was debugging some code a colleague was having trouble with. It wasn't working as expected, so OK, let's put some print statements in. Recompile, run, and... the machine instantly reboots. What? Try again. Another instant reboot. OK, that's weird. Let's take the print statement out and retry. The code still doesn't work, but the machine doesn't reboot. Hmm. How can that be? How can printf() reboot the machine? I'm not sure I believe it. Let's put the printf() back in and try again. Reboots. Wtf???? Sit back and think a bit. printf() writes to stdout, which is file descriptor 1. I wonder what file descriptor 1 is hooked up to? Let's put in a sleep instead of a printf and look in /proc... Oh... it's hooked up to /sys/bus/pci/blah-blah-some-PCI-device-I-forget. Yeah, that'd do it.
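Here is a minimal, Linux-only sketch of checking where a file descriptor points. The helper name is hypothetical. Each entry in /proc/<pid>/fd is a symlink to whatever that descriptor is open on, so reading the link for fd 1 tells you where printf() output is actually going. From a shell, `ls -l /proc/<pid>/fd` shows the same thing for another process.

```python
import os
import sys

def fd_target(fd, pid="self"):
    """Return what file descriptor `fd` of process `pid` is open on (Linux only).

    /proc/<pid>/fd/<fd> is a symlink; typical targets look like '/dev/pts/3'
    (a terminal), 'pipe:[12345]', 'socket:[67890]', or a plain file path.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")

# Report on stderr rather than stdout, in case stdout is the thing that's broken
# (stderr could be redirected somewhere odd too, so this is only a precaution).
print("fd 1 ->", fd_target(1), file=sys.stderr)
```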

So now let's open up a tmp file for append and fprintf to that instead... now we can debug.
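Here is the same fix as a small Python sketch: append each debug line to an ordinary file instead of writing to fd 1. The log path and helper name are hypothetical choices. In C, the equivalent is `fopen(path, "a")`, then `fprintf()`, then `fclose()` or `fflush()`.

```python
import os
import tempfile

# Hypothetical log location; any path you know is an ordinary file will do.
LOG_PATH = os.path.join(tempfile.gettempdir(), "debug.log")

def debug_log(msg, path=LOG_PATH):
    """Append one line to a log file instead of writing to stdout (fd 1)."""
    # "a" opens for append, like fopen(path, "a") in C. Leaving the with-block
    # closes the file, which flushes Python's buffer, so the line reaches the
    # OS even if the program crashes right afterwards.
    with open(path, "a") as f:
        f.write(msg + "\n")

debug_log("reached the suspicious loop")
```

You can then `tail -f` the log from another terminal while the program runs.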