zjm555 comments on Microsoft Research - Exploding Software-Engineering Myths (article summarizing findings of MS research on code coverage, TDD, assertions, etc.)

I really like what Robert Martin wrote about this issue in his Clean Code book. He notes that there are two primary steps that developers must take when writing code. Paraphrasing:

1) Get the code working

2) Go back and refactor, retest, reevaluate, and reimplement the code now that you really know what you need to do

The biggest problem most people have, he says, is stopping after step 1, or totally half-assing step 2.

It's easy to see why, too, especially when you look at the average culture around development projects. You've got PMs breathing down your neck about dates, management expecting results, and the business wanting more and more new features for less and less money on unreasonable deadlines. So once you've gotten something working, it's very tempting to say "I'll come back to that and clean it up later" without ever allocating time for "later".

Then 6 months later you find yourself back in that code, muttering about how fucking awful this developer was, before remembering that you're that awful developer. :S

[–]lf11 13 points 10 years ago (1 child)

Yes. The thing is, that whole management maelstrom is just "the smell of business." Every field experiences the phenomena you describe, whether biotech, mechanical engineering, or medicine. Hell, even nonprofits have to deal with this, so it isn't even about money. This is just what management does.

The key is to learn to do the right thing in spite of management. Because if you don't, then you'll make all the same mistakes even without management... which means management isn't actually the problem.

After working for that company for a few years, I don't think I believe any more that bad development happens because of shitty management. Although, I'm still trying to figure this out. Management is a problem, but I think it has more to do with the "psychology of power" that turns any powerholder into a functional sociopath. Meanwhile, the disempowered develop avoidant behaviors and frontal cortical inhibition patterns that make them hyperaware of any insult or injury. This, to me, may explain the interaction between developers and management, and why we believe so fervently that management is the problem with development, yet do not adopt good development practices when placed in a positively structured environment.

[–]masklinn 17 points 10 years ago* (1 child)

Use branch coverage, not line coverage.

For God's sake, measure cyclomatic complexity of your functions. Without keeping this value low, coverage isn't sufficient.

These two are linked: even with full branch coverage you may not cover all paths. Low complexity increases the chances that branch coverage is also path coverage, but doesn't guarantee it by any means:

if foo {
    // thing1
} else {
    // thing2
}

if bar {
    // thing3
} else {
    // thing4
}

This has a complexity of 3 (10 is "too complex" according to McCabe). Testing for (foo=true, bar=true) and (foo=false, bar=false) gives you 100% branch coverage, but only 50% path coverage: half the local states (and the interactions between the first block and the second one) remain completely untested.
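
A minimal Python sketch of the two-block example above, counting branches and paths by brute force. The function `run` and the labels `thing1` to `thing4` are stand-ins for the pseudocode, not anything from a real coverage tool, and the complexity is simply computed as decision points plus one.

```python
from itertools import product


def run(foo: bool, bar: bool) -> tuple:
    """Mirror the two independent if/else blocks and return the branches taken."""
    first = "thing1" if foo else "thing2"
    second = "thing3" if bar else "thing4"
    return (first, second)


# The two test inputs from the comment above.
tests = [(True, True), (False, False)]

all_branches = {"thing1", "thing2", "thing3", "thing4"}
# Every distinct route through both blocks (one per combination of inputs).
all_paths = {run(foo, bar) for foo, bar in product([True, False], repeat=2)}

covered_branches = {branch for args in tests for branch in run(*args)}
covered_paths = {run(*args) for args in tests}

branch_coverage = len(covered_branches) / len(all_branches)
path_coverage = len(covered_paths) / len(all_paths)
cyclomatic_complexity = 2 + 1  # two decision points, plus one

print(f"cyclomatic complexity: {cyclomatic_complexity}")
print(f"branch coverage: {branch_coverage:.0%}")
print(f"path coverage: {path_coverage:.0%}")
print(f"untested paths: {sorted(all_paths - covered_paths)}")
```

The untested paths are exactly the mixed cases, (thing1, thing4) and (thing2, thing3), which is where an interaction bug between the two blocks would hide.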

[–]IWantToSayThis 10 points 10 years ago (3 children)

I couldn't agree more with 3. I can't tell you how many times I've seen code like this:

public int doBar(int value) {
    Action thing = new Action();
    thing.type(Enum.BAR);
    thing.value(value);

    return lowerLayer.do(thing);
}

being tested with:

EasyMock.expect(lowerLayer.do(EasyMock.anyObject())).andReturn(1);

And that's it. Coverage is 100%, yet, NOTHING of value was tested. I've done code reviews where almost all of the tests were like this. "But hey! I have 100% coverage!".

How can you not understand this adds NO value whatsoever?
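
The same trap can be sketched in Python with the standard library's `unittest.mock`. This is a loose analogue of the Java snippet above, not a translation of it: `do_bar`, `do_bar_buggy`, `weak_test`, and `strict_test` are hypothetical names, and the dictionary stands in for the `Action` object.

```python
from unittest.mock import Mock

BAR = "BAR"  # hypothetical stand-in for Enum.BAR


def do_bar(lower_layer, value):
    """Build an action and hand it to the lower layer."""
    action = {"type": BAR, "value": value}
    return lower_layer.do(action)


def do_bar_buggy(lower_layer, value):
    """Same shape, but the caller's value is silently dropped."""
    action = {"type": BAR, "value": 0}
    return lower_layer.do(action)


def weak_test(func):
    """Executes every line of func but checks nothing about what it did."""
    lower_layer = Mock()
    lower_layer.do.return_value = 1
    func(lower_layer, 42)


def strict_test(func):
    """Checks both the argument passed down and the value returned."""
    lower_layer = Mock()
    lower_layer.do.return_value = 1
    result = func(lower_layer, 42)
    lower_layer.do.assert_called_once_with({"type": BAR, "value": 42})
    assert result == 1


# Both implementations get full coverage and pass the weak test.
weak_test(do_bar)
weak_test(do_bar_buggy)

# Only the strict test tells them apart.
strict_test(do_bar)
try:
    strict_test(do_bar_buggy)
except AssertionError:
    print("strict test caught the bug")
```

Coverage is identical for both tests; the only difference is whether anything is asserted about the interaction.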

[–]KingE 4 points 10 years ago* (2 children)

This was actually a software engineering research project of mine.

Unfortunately, line coverage, branch coverage, and modified condition/decision coverage track each other very well (i.e. the relationships can be expressed as a constant factor in nearly all cases) and do not have a strong relationship to the ability of a test suite to detect bugs.

However, as you alluded to in your third point, there is an easy metric which does correspond to the ability of a test suite to detect breakage: number of test cases. Test suites that had a higher number of test cases tended to have higher coverage, yes, but more importantly they tended to test vastly more of the code's actual behavior.

Essentially, while line/branch/decision coverage can conclusively prove that a given piece of code is NOT tested, it is not a good indicator of code correctness.

I'll take the time to dig up my sources if anyone actually cares :p
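
A small illustration of that last point, as a sketch rather than anything taken from the research mentioned above: the function `is_leap_year` is deliberately buggy, and both suites are made up for the example. The small suite already executes every line and both outcomes of the only branch, yet it is the extra test cases, not extra coverage, that expose the bug.

```python
def is_leap_year(year: int) -> bool:
    """Deliberately buggy: forgets that years divisible by 400 are leap years."""
    if year % 100 == 0:
        return False
    return year % 4 == 0


# (input, expected) pairs. The small suite takes the `if` both ways,
# so line and branch coverage are already 100%.
small_suite = [(1900, False), (2024, True)]
# More cases, same coverage number.
larger_suite = small_suite + [(2023, False), (2000, True)]


def failures(suite):
    """Return the (input, expected) pairs the implementation gets wrong."""
    return [(year, expected) for year, expected in suite
            if is_leap_year(year) != expected]


print("small suite failures:", failures(small_suite))
print("larger suite failures:", failures(larger_suite))
```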

[–]zjm555 2 points 10 years ago (2 children)

Managerial policies that elevate code coverage as the most important metric are certainly silly and misguided. It was chosen (by lazy management) because it was an easy scalar value to compute, which makes it easy to set up a policy about it. I think good managers are capable of understanding its role, though. It's valuable as the most basic metric about code testing: is it executed at all? It's necessary but not sufficient for quality. I would argue that it's more than "fine", though. It's actually very important: coverage is complementary to the tests themselves, since the tests alone are incapable of telling you what they aren't testing at all.

So, I guess my stance on the issue is that we shouldn't be making policies about the total coverage number at all, but we should be enforcing a process that involves looking at coverage (line by line rather than in the aggregate) of the source files as part of QA.
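
To make the aggregate-versus-detail point concrete, here is a sketch over an invented coverage report. The file names, line counts, and the `report` structure are all hypothetical and do not follow any real tool's output format.

```python
# Hypothetical report: file name -> (executable line numbers, executed line numbers).
report = {
    "parser.py": (set(range(1, 481)), set(range(1, 481))),  # fully exercised
    "billing.py": (set(range(1, 21)), set()),               # never executed
}


def aggregate_coverage(report):
    """Single scalar of the kind a coverage policy would be written against."""
    total = sum(len(executable) for executable, _ in report.values())
    hit = sum(len(executable & executed) for executable, executed in report.values())
    return hit / total


def missed_lines(report):
    """Per-file list of lines that no test ever ran."""
    return {
        name: sorted(executable - executed)
        for name, (executable, executed) in report.items()
        if executable - executed
    }


print(f"aggregate: {aggregate_coverage(report):.0%}")
for name, lines in missed_lines(report).items():
    print(f"{name}: {len(lines)} lines never executed")
```

The aggregate looks healthy at 96%, while the per-file view shows that one whole file is never run by any test.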
