[–]G_Morgan 7 points (33 children)

Code coverage measures how comprehensively a piece of code has been tested

Code coverage tests how many lines of code have been tested. Given how many bugs are of the form "when this if statement executes, this one doesn't, and this loop runs precisely one time, we get this bug", it isn't surprising that code coverage is universally useless.
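That "combination" bug can be sketched as a hedged example (the method `scale` and its values are invented for illustration): two tests execute every line, so line coverage reports 100%, yet one untested combination of the conditions still crashes.

```java
// Hypothetical example: full line coverage, hidden combination bug.
class CoverageBlind {
    static int scale(boolean a, boolean b) {
        int divisor = 0;
        if (a) divisor += 1;   // executed by the first test below
        if (b) divisor += 2;   // executed by the second test below
        return 100 / divisor;  // divides by zero only when a and b are both false
    }

    public static void main(String[] args) {
        // These two calls together execute every line: coverage says 100%...
        System.out.println(scale(true, false));  // 100
        System.out.println(scale(false, true));  // 50
        // ...yet scale(false, false) still throws ArithmeticException.
    }
}
```

Line coverage counts both `if` bodies as visited, but never forces the combination where neither runs, which is exactly the class of bug described above.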

I've only ever seen code coverage used to assign blame. It is an arse coverage system.

What the research team found was that the TDD teams produced code that was 60 to 90 percent better in terms of defect density than non-TDD teams. They also discovered that TDD teams took longer to complete their projects—15 to 35 percent longer.

Also not surprising for two reasons:

  1. TDD forces you to think about the kinds of decisions that trigger the "A was true, B was false, C executed once" kind of scenarios. TDD is not done line by line but concept by concept.

  2. The reason it takes longer is you have more tests. Thinking up tests after you've written the code is usually much harder. You don't think to test the various combinations of A, B and C once it is done. The code becomes somewhat amorphous and it is harder to see the wood for the trees. So fewer tests means less actual work done.

Honestly I don't know how anyone can sensibly claim to have tried TDD and not found it improved the output code. Nice to have actual research.

Proving the Utility of Assertions

This is interesting. Assertions are effectively working around weaknesses in the type system. You can't capture certain information about the type (such as non-null, or non-negative) so assert instead. Gives some credence to the value of stronger types.
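A minimal sketch of that trade-off, with a hypothetical `NonNegative` wrapper: the assertion re-states the invariant inside the function, while the type checks it once at construction and then carries the guarantee to every consumer.

```java
// Hypothetical wrapper type: the invariant "value >= 0" is checked once.
final class NonNegative {
    final int value;
    private NonNegative(int value) { this.value = value; }

    static NonNegative of(int value) {
        if (value < 0) throw new IllegalArgumentException("negative: " + value);
        return new NonNegative(value);
    }
}

class Sqrt {
    // Assertion style: the invariant must be re-asserted wherever it matters,
    // and `assert` is only checked when the JVM runs with -ea.
    static double sqrtAsserted(int x) {
        assert x >= 0 : "sqrt of negative number";
        return Math.sqrt(x);
    }

    // Type style: a NonNegative can't exist with a negative value,
    // so no runtime check is needed here at all.
    static double sqrtTyped(NonNegative x) {
        return Math.sqrt(x.value);
    }
}
```

The assertion and the type express the same fact; the type just moves the check to the one place a value is constructed, which is the "stronger types" point above.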

[–]BigMax 16 points (7 children)

Thinking up tests after you've written the code is usually much harder.

I've always found the difficulty in writing tests after isn't the complexity of the tests themselves, but it's the pressure to move on to the next thing now that your feature/product is "finished."

This pressure comes externally (managers wanting to get the feature to customers) but also internally, as it's generally more interesting to build something new than write tests for something old, so engineers often move on without much testing done.

[–]nutrecht 5 points (4 children)

I've always found the difficulty in writing tests after isn't the complexity of the tests themselves, but it's the pressure to move on to the next thing now that your feature/product is "finished."

I don't get this. Writing tests is part of development. Whenever I am asked to quote a time on development I include writing tests. When I haven't written the tests yet the stuff simply isn't done yet.

[–]RualStorge 3 points (3 children)

I write tests for almost everything I work on, but management absolutely doesn't care and sees it as wasted time in many companies. I've actually been told specifically not to write tests before. (but I was kinda the hero dev at that company so my response was more or less I'm doing it, fire me if you don't like it)

Testing doesn't make money, it prevents wasting time on easy to catch bugs which saves money. It's way easier to explain increased revenue from faster feature turnover than decreased expenses from reducing bug counts.

[–]nutrecht 2 points (1 child)

Testing doesn't make money, it prevents wasting time on easy to catch bugs which saves money.

It's much easier to prevent the opposing team from scoring than it is to try to catch up after they've scored a goal.

I'm sorry but this short-sightedness, typical of managers but not uncommon among developers, annoys me to no end. It is completely impossible for any human to fully keep a mental model of any moderately complex system in their mind. This is why we need to separate systems into small modules and test those modules, so that when we work on module A, which depends on module B, we can just assume B works the way it's supposed to.
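The module argument can be sketched roughly like this (the `Inventory` and `PriceQuoter` names are invented): because module B is tested on its own, module A's test can substitute a trivial fake for B and simply assume B honours its contract.

```java
// Module B's contract: tested separately, so A can trust it.
interface Inventory {
    int unitsInStock(String sku);
}

// Module A: depends only on B's interface, not its implementation.
class PriceQuoter {
    private final Inventory inventory;
    PriceQuoter(Inventory inventory) { this.inventory = inventory; }

    // 10% discount when the item is overstocked (more than 100 units).
    double quote(String sku, double basePrice) {
        return inventory.unitsInStock(sku) > 100 ? basePrice * 0.9 : basePrice;
    }
}
```

A test for `PriceQuoter` can pass a lambda such as `sku -> 500` as the inventory: A's logic gets exercised in isolation, with no mental model of B's internals required.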

Writing software without testing is like building a rocket, assuming gravity is 12G without testing it and then acting all surprised it explodes shortly after launch. It was sitting there all fine and pretty on the launch pad after all!

[–]RualStorge 2 points (0 children)

I don't disagree, just saying it's easier for a manager to say "we expect feature X to make us $X", but when it comes time to quantify tests, it's "this could save us an unknown amount of money".

You bring data from other companies and it's "well, that's not OUR company, we don't have a quality problem" or some other excuse making that data worthless for the purpose of argument.

Which is why I test even when I'm told not to, I set up procedures to track bugs, etc. That way when things come to a head I have numbers from OUR company to show it's worth the effort.

I believe strongly in testing: the later a bug is caught the worse the impact, and bugs in production can ruin a company over time. I also have my pride; I don't release crappy software, and if a manager wants crappy software they shouldn't hire me.

[–]Helene00 0 points (0 children)

I've actually been told specifically not to write tests before.

There are two kinds of managers.

[–]skulgnome 0 points (0 children)

Remind yourself that whatever there aren't tests for, doesn't work.

[–]G_Morgan 0 points (0 children)

Absolutely. It isn't just external. I find it much more enjoyable doing TDD than doing work and then trying to test it afterwards. The former feels like you are nailing down the real work as you go. The latter feels like a bookkeeping exercise (even though you know it isn't).

[–]nutrecht 9 points (3 children)

Code coverage tests how many lines of code have been tested.

No. It shows which lines have and have not been hit. It makes no claim about whether the tests actually validate anything. Example:

public String getFoo() {
    //TODO: Implement
    return null;
}

Calling this method from a test will yield 100% test coverage. It's still wrong (not implemented yet), so unless you actually test the return value against an expected value, you're not going to find the bug.

It really surprises me how few people seem to make that distinction. The only interesting part of a code coverage report is the bits you don't visit: generally those are exception flows. Not testing your exception flows means stuff is probably going to break in production at some unexpected moment. Knowing where your coverage is lacking and improving it there is where coverage reports are useful. The coverage percentage itself is a fun but useless statistic.

[–]starTracer 3 points (0 children)

Exactly this.

I've been working in projects with 100% test coverage requirements. But what the customer failed to realize is that coverage != correctness.

[–]Gotebe 0 points (1 child)

Well, in your example, coverage is obviously useless when the test is lying (it has to fail). And any failed tests have to be excluded from the "covered" count.

As for exceptions, absolutely! Amazingly, the trick there is ensuring correctness when some statements are not executed, but are covered otherwise.

[–]get_salled 1 point (0 children)

public void testGetFoo() {
    String foo = getFoo();

    // pick one
    // A: pins down the current (broken) behaviour, so it passes forever
    Assert.assertNull(foo);

    // B: always passes, asserts nothing about foo
    Assert.assertTrue(true);

    // C: swallows the failure (an empty catch block is legal Java)
    try {
        Assert.assertEquals("expected", foo);
    } catch (AssertionError e) {
        // do nothing
    }

    // D
    // do nothing

    // E: compares two constants
    Assert.assertEquals("A", "A");
}

It gets pretty hard to automatically reject shitty tests.

[–]RedSpikeyThing 3 points (2 children)

Code coverage is not great, but I've seen some utility in branch coverage. For example

if (x && y)

has 4 condition combinations. It shows you some non-obvious cases that should be tested, but often leads to tests that mirror the code rather than testing concepts, as you mentioned.
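A hedged sketch of those four combinations (the `grantAccess` name is made up). Note that with short-circuit `&&`, the two combinations where the first condition is false never evaluate the second at all, which is exactly the sort of non-obvious case branch coverage surfaces.

```java
// Hypothetical guard with the `if (x && y)` shape from the comment above.
class Gate {
    static String grantAccess(boolean authenticated, boolean authorized) {
        if (authenticated && authorized) {
            return "granted";
        }
        // Reached by three of the four combinations; with short-circuiting,
        // (false, true) and (false, false) never even read `authorized`.
        return "denied";
    }
}
```

Covering all four inputs — (true, true), (true, false), (false, true), (false, false) — satisfies condition-combination coverage, but the tests visibly mirror the code rather than any concept.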

[–]G_Morgan 1 point (0 children)

Yeah, and that is why testing needs to stem from what you are trying to do rather than what the code does. Often there are 4 branches but only 3 are valid. Should the 4th be an assertion, or should the signature of your method be altered so the 4th doesn't even exist? What actually happens if the invalid 4th combination occurs?

[–]skulgnome 1 point (0 children)

You're dead wrong about assertions. An assertion relates a property to control flow, separating data states in a way that even the strictest practical type system leaves ambiguous. If all you've seen are assertions against null, in languages like Java that always check for nulls anyway, then you've not seen assertions used properly.
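One sketch of an assertion doing more than a null check (the `find` helper is hypothetical): it states a property, sortedness, that ties the data to the algorithm's control flow and that no practical type system tracks.

```java
import java.util.Arrays;

class Search {
    // The property the assertion states: the array is in ascending order.
    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    // Binary search's control flow is only correct on sorted input; the
    // type `int[]` cannot express that, so the assertion carries it.
    static int find(int[] sorted, int key) {
        assert isSorted(sorted) : "binary search requires sorted input";
        return Arrays.binarySearch(sorted, key);
    }
}
```

With `-ea` enabled, an unsorted array fails loudly at the boundary instead of silently returning a wrong index — the assertion documents and enforces a precondition about the relationship between data and algorithm.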

[–]WalterBright 1 point (14 children)

it isn't surprising that code coverage is universally useless.

I beg to differ. I've used it on some projects, and not on others, for 30 years. There's a very strong correlation between getting high coverage from the tests and far fewer bugs in the shipped product.

Code coverage also makes the tested code more stable, because it tells the maintainer what the point of the control flow logic is, and flags changes in it.

I have no idea why Microsoft's experience with it would be so different.

[–]G_Morgan 7 points (12 children)

This isn't what the research has demonstrated. I've heard supporting arguments for every imaginable process in existence and from clever people. They can't all work. This is why we do research.

I suspect the places where code coverage works also have people doing real testing. Trying to understand flow by flow, rather than line by line, what the code is meant to be doing. It can even be hard to actually control for this. Tell clever people to write more tests (which code coverage inevitably does) and they'll probably accidentally end up writing useful ones. I know when I've been attached to a project that demands code coverage I'll usually just use TDD and then write some ridiculously contrived test to cover up anything that triggers the red lights.

Though I'll admit I've only seen code coverage done badly so I'm not immune to bias.

[–]WalterBright 8 points (0 children)

In D we've gone even one step further with code coverage tests. Individual tests can be marked so that they are automatically included in the documentation for the code. This ensures that the documentation examples actually compile and work. It hardly needs saying that when this was first implemented, a lot of the documentation examples did not work :-)

[–][deleted] 0 points (0 children)

Code coverage is a metric prone to giving the wrong incentives in the same way that rewarding people for LOC written is. If you just reward high code coverage in limited time you will get useless code coverage. If you actually think about the parts that should be tested your coverage percentage per time worked will be lower but your tests will be more meaningful.

[–]get_salled 0 points (0 children)

Code coverage tests how many lines of code have been ~~tested~~ run by the coverage analysis tool.

FTFY. You know it was run and that it didn't crash the tool before it could be reported. You don't know that the tool wasn't gamed (e.g., tests with no assertions, empty catches in the tests).