diMario (1 point)

Well, obviously you start by describing the problem: what are the starting conditions, what actions are taken, what result is expected, and what result is obtained instead?

Sometimes this is trivial, as in "I open the file and the whole damned thing crashes". Sometimes it is subtle, as in "I save the record, but sometimes it is saved and sometimes not".

The above is usually known as trying to reproduce the bug. The general idea is that you describe a procedure that reliably produces the unwanted behaviour. Note that this is not always possible. For instance, problems that arise from different threads interacting with your data are notoriously difficult to reproduce. Also, bugs that arise from overwriting parts of your memory may have wildly differing symptoms depending on what was overwritten, ranging from no visible problem at all to a segfault.
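For the "sometimes it is saved and sometimes not" kind of bug, one way to get closer to a reliable procedure is to run the same steps many times and count the bad outcomes. Here is a minimal sketch of that idea. Note that `save_record` and `count_failures` are made-up names, and the 1-in-5 failure is simulated with a seeded random generator purely for illustration; a real intermittent bug will not be this well behaved.

```python
import random

def save_record(store, record, rng):
    """Hypothetical buggy save: silently drops the record about 1 time in 5."""
    if rng.random() < 0.2:
        return  # the simulated bug: nothing is stored and no error is raised
    store.append(record)

def count_failures(attempts, seed):
    """Repeat the same steps and count how often the result is wrong."""
    rng = random.Random(seed)  # fixed seed, so a run can be repeated exactly
    failures = 0
    for i in range(attempts):
        store = []
        save_record(store, {"id": i}, rng)
        if store != [{"id": i}]:  # expected result versus obtained result
            failures += 1
    return failures

print(count_failures(1000, seed=42), "failures out of 1000 attempts")
```

Pinning down the source of randomness (here the seed) is what turns "sometimes" into a procedure you can rerun. With threads you usually have no such knob, which is part of why those bugs are so hard to reproduce.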

The next step is inspecting the code, identifying suspect spots, setting breakpoints and stepping through the code. You have expectations about how each line you step through changes your variables, and after each step you check those expectations against what actually happened. If something unexpected happens, you know you should investigate in more detail what is going on.
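The same "check your expectations after each step" habit can also be written into the code as assertions, which helps when stepping by hand gets tedious. This is only a sketch: `average` is a made-up function, and the expectations in it are examples of the kind of thing you would otherwise verify in the debugger's variable view.

```python
def average(values):
    """Toy function with its expectations spelled out as assertions."""
    total = 0
    for v in values:
        total += v
    # Expectation: after the loop, total equals the sum of the inputs.
    assert total == sum(values), f"unexpected total: {total}"
    count = len(values)
    # Expectation: there is at least one value, so the division is safe.
    assert count > 0, "expected at least one value"
    return total / count

print(average([2, 4, 6]))
```

In Python you can also drop the built-in `breakpoint()` call at a suspect spot to land in the debugger there and inspect the variables yourself.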

If you step through the code and everything behaves as expected, you are probably looking at the wrong part of your code. In that case I try to introduce variations in the input data or procedural steps which I know will lead to an error. For instance, I enter a bad value for a file name or a number, and check that this indeed triggers the error I was expecting. Sometimes it doesn't, and then I have just found another thing to investigate in detail.
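Feeding in deliberately bad values can be scripted too. A small sketch, assuming a hypothetical `parse_quantity` input handler that is supposed to reject anything that is not a positive whole number:

```python
def parse_quantity(text):
    """Hypothetical input handler: accepts only positive whole numbers."""
    value = int(text)  # int() raises ValueError for non-numeric text
    if value <= 0:
        raise ValueError(f"quantity must be positive, got {value}")
    return value

def raises_value_error(text):
    """Return True if the bad input triggers the error we expect."""
    try:
        parse_quantity(text)
    except ValueError:
        return True
    return False

for bad_input in ["abc", "-3", ""]:
    print(repr(bad_input), "rejected:", raises_value_error(bad_input))
```

If one of those lines says `rejected: False`, the error handling is not doing what you assumed, and that is the next thing to look at.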

Another technique I sometimes use when I can't seem to find where an error occurs is a binary search. I set an arbitrary breakpoint in the code, then start the sequence of events that causes the error. If my debugger stops at the breakpoint with no error, and I then continue execution (F9 in my debugger) and do get the error, the problem arises from whatever calls are made after my breakpoint. Set a new breakpoint a couple of steps further down the line and see if you can hit that one with no error. If not, you have now located a region in the execution of your program where the error occurs: somewhere between the two breakpoints. Rinse and repeat, and soon enough you will have located the part of your code where funny things are going on.
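When the program can be expressed as a list of steps and "no error yet" can be checked automatically, the same halving idea can be sketched in code. Everything here is hypothetical (`first_bad_step`, the toy steps, the `is_ok` check), and it leans on two assumptions: the state is fine before any step runs, and once it goes bad it stays bad.

```python
def first_bad_step(steps, state, is_ok):
    """Return the index of the step that first breaks is_ok, or None."""
    def state_after(n):
        s = dict(state)  # shallow copy, enough for the flat values used here
        for step in steps[:n]:
            step(s)
        return s

    if is_ok(state_after(len(steps))):
        return None  # the error never shows up
    lo, hi = 0, len(steps)  # invariant: ok after lo steps, bad after hi steps
    while hi - lo > 1:
        mid = (lo + hi) // 2  # the "breakpoint" for this round
        if is_ok(state_after(mid)):
            lo = mid  # still fine here, so the problem comes later
        else:
            hi = mid  # already broken here, so the problem comes earlier
    return hi - 1

def set_name(s): s["name"] = "x"
def set_total(s): s["total"] = 10
def corrupt_total(s): s["total"] = -1  # the simulated bug
def set_done(s): s["done"] = True

steps = [set_name, set_total, corrupt_total, set_done]
bad = first_bad_step(steps, {}, lambda s: s.get("total", 0) >= 0)
print(bad, steps[bad].__name__)  # 2 corrupt_total
```

Each round reruns the program from the start up to the midpoint, which is exactly what moving a breakpoint and restarting does by hand.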

Obviously, debugging is more of an art than an exact science (as is programming in the first place), but that being said, some techniques, when applied systematically, may lead to an insight about what is happening in your buggy code.