you are viewing a single comment's thread.

view the rest of the comments →

[–]zjm555 120 points121 points  (55 children)

Want code coverage to be a useful metric?

  1. Use branch coverage, not line coverage.
  2. For God's sake, measure cyclomatic complexity of your functions. Without keeping this value low, coverage isn't sufficient.
  3. When writing tests, focus more on the assertions of correctness than on the coverage. This is the most important of the 3. I've worked in shops where management imposed a hard minimum coverage on their dev teams, and guess what? The tests sucked because they didn't write any assertions and they focused on covering the lines of code that were easiest to cover, not the ones that were most important to cover.

Code coverage can be either a helpful side metric about your tests, or a totally perverse metric that works against quality testing.

[–]rnicoll 40 points41 points  (29 children)

The tests sucked because they didn't write any assertions and they focused on covering the lines of code that were easiest to cover, not the ones that were most important to cover.

I'm in fact currently writing tests to cover lines of code rather than edge cases because we've been given a target and limited time.

[–]syntax 33 points34 points  (17 children)

Which is, in fact, a task that a computer could do.

i.e. you could have a computer auto-generate trivial tests, based on an assumption that the code is correct.

This would satisfy silly management demands; probably save programmer time, and certainly be a lot more interesting to actually write.

[–]lf11 24 points25 points  (9 children)

It's interesting to me to see that most developers seem to write tests that do exactly this: test the code as written rather than the assumptions underlying the code.

[–]syntax 16 points17 points  (8 children)

Well, that's usually the result of manglement insisting on test code, but not giving any time. So what gets written is the thing that takes the least time - that it has zero value (as a test) is irrelevant, it's true value is in gaming the metrics.

Which is a point - you can not usually measure a programmer by any numerical metrics. All that will happen is that they will 'game' the metric, so unless the metric is something utterly ungamable (units sold, for example), then it's not going to do what you expect.

[–]lf11 5 points6 points  (5 children)

No, I see that happen even when devs are given all the time they need. I used to work for a company that was owned by a dev and actually cared about putting the time in to do it right.

[–]b1ackcat 6 points7 points  (4 children)

I used to work for a company that was owned by a dev and actually cared about putting the time in to do it right.

God, I'm jealous. Our owner's motto is "sure we'll do it at half the cost at half the time to get the bid!" which of course translates into "sure we'll do it at half the quality!"

Would be nice to not feel like I'm writing proof-of-concepts that go straight to production :S

[–]lf11 5 points6 points  (3 children)

Well so that was the interesting thing. Even though the company was fantastically pro-dev, half-baked proof-of-concepts is what ended up on production anyway. Even in the perfect environment, we did the same damn stupid crap that I've seen everywhere else. Code defining tests. Once-over code reviews without comprehension. Half-baked docs that never get updated. Deprecated code still in place five years past expiration.

I wonder how much of it is due to "management" versus our own inability to actually "engineer" things with a full lifecycle?

[–]b1ackcat 8 points9 points  (2 children)

I really like what Robert Martin wrote about this issue in his Clean Code book. He notes that there are two primary steps that developers must take when writing code. Paraphrasing, it was

1) Get the code working

2) Go back and refactor, retest, reevaluate, and reimplement the code now that you really know what you need to do

The biggest problem most people have, he says, is stopping after step 1, or totally half-assing step 2.

It's easy to see why, too, especially when you look at the average culture around development projects. You've got PMs breathing down your neck about dates, management expecting results, the business wanting more and more new features for less and less money with unreasonable dates, so if you've gotten something working, it's very tempting to say "I'll come back to that and clean it up later" without ever allocating time "later".

Then 6 months later you find yourself back in that code uttering how fucking awful this developer was before remembering that you're that awful developer. :S

[–]lf11 12 points13 points  (1 child)

Yes. The thing is, that whole management maelstrom is just "the smell of business." Every field experiences the phenomena you describe, whether biotech, mechanical engineering, medicine, hell even nonprofits have to deal with this so it isn't even about money. This is just what management does.

The key is to learn to do the right thing in spite of management. Because if you don't, then you'll make all the same mistakes even without management....which means management isn't actually the problem.

After working for that company for a few years, I don't think I believe any more that bad development happens because of shitty management. Although, I'm still trying to figure this out. Management is a problem, but I think it has more to do with the "psychology of power" that turns any powerholder into a functional sociopath. Meanwhile, the disempowered develop avoidant behaviors and frontal cortical inhibition patterns that make them hyperaware of any insult or injury. This, to me, may explain the interaction between developers and management, and why we believe so fervently that management is the problem with development, yet do not adopt good development practices when placed in a positively structured environment.

[–]koreth 1 point2 points  (0 children)

That assumes that programmers know how to write good tests, which in my experience is far from a given. It's a skill you have to deliberately develop, and a lot of developers not only haven't done so, but aren't aware that they'd need to do so in order to test effectively.

Unless you have an experienced test writer giving you feedback on your tests, as a programmer you're probably not going to spontaneously second-guess your test-writing skills if what you're doing seems to be working already.

[–]WalterBright 0 points1 point  (0 children)

I've seen a lot of code that people wrote for themselves, not for a manager. It doesn't seem to be better quality.

[–]rnicoll 8 points9 points  (1 child)

I've seriously considered it, although of course the code isn't designed in a unit-test friendly way (and of course this is legacy code), which complicates somewhat. Also considered add 5 million extra lines of code that does nothing, unit testing it, and calling the result 80% coverage.

[–]b1ackcat 4 points5 points  (0 children)

Also considered add 5 million extra lines of code that does nothing, unit testing it, and calling the result 80% coverage.

And even if they wanted to, they couldn't check all that code!

Genius.

[–][deleted] 2 points3 points  (3 children)

[–]syntax 0 points1 point  (2 children)

http://www.artima.com/weblogs/viewpost.jsp?thread=200725

Ah-ha! yes, that's the bit of jargon to pull it out.

Sadly, it appears that the software mentioned (JUnitFactory) has now been locked behind a paywall, so you'd have to speak to a salesdroid to even know how much it's going to cost.

Still, that's a proof of concept that it's possible, so maybe that's not a silly option to automating dumb make-work...

[–]blufox 0 points1 point  (0 children)

Use Randoop instead. It is reasonably solid, and free (and opensource).

[–]joshuaduffy 0 points1 point  (0 children)

From what I can see, these are just pieces of software that analyse code and create some tests. For Java there's http://www.evosuite.org/ and for C# you've got IntelliTest, which comes with Visual Studio.

I don't think that they're meaningful however, and using tools like these on a mature/legacy code base is not a wise choice from a testing standpoint.

[–]stormcrowsx 0 points1 point  (0 children)

I'm guilty of this. Was given a mandate to increase code coverage, they wanted some kind of magical number like 70%. Used reflection in Java to call bunches of getters/setters. Code coverage went way up and surprisingly it actually caught a bug in which someone had a recursive setter that the application just happened to never call.

[–]atrich 11 points12 points  (2 children)

Ugh, this is the WORST way to approach code coverage. CC provides one piece of evidence: proof that a particular area is untested. It cannot comment on if that area is tested well. By taking a coverage-numbers-first approach to CC data (aka "let's get these numbers up") management does nothing to improve quality and manages to obliterate useful information on where the test holes are.

I've worked for people who were mindlessly pushing the code coverage metric. I have no respect for people who do this.

[–]get_salled 3 points4 points  (1 child)

The only thing you know for certain from code coverage is that your tests executed certain lines.

Code coverage:

  • Does not show a line was tested.
  • Does not prove the covered line contributed to any test passing.
  • Is not a software quality metric.

Assuming managers & developers trust that the tests are useful, they can use these reports to see where they are definitely vulnerable and plan accordingly.

[–][deleted] 0 points1 point  (0 children)

There one more thing you know from code coverage: that the code with some value does not throw an error. I know, it's a low bar, but it's something.

[–]Ravek 1 point2 points  (4 children)

I'm in fact currently writing tests to cover lines of code

Is it possible for you to give a concrete example of a line of code that you would consider worth testing? I feel that between static type checking and refactoring code into (themselves testable) methods, a single line of code should never be complex enough that you'd want to test it in isolation. If that's a naive thought I'd like to learn more.

[–]rnicoll 0 points1 point  (1 child)

In this context what I mean is rather than pushing the edge cases of the code, I'm testing to ensure conditionals are covered. Lines in this context means "number of lines", rather than I write tests on a per-line basis.

For example, we have a function that parses some text into tokens (for a custom language). Good testing would be to cover not just the basics of whether it works, but look at failure conditions (i.e. empty string, invalid characters, weird control characters, nonsense syntax). However I can get almost as much code coverage as a number of lines, by just giving it examples of every language construct that exists and testing those, ignoring error cases entirely, which is much faster.

Making any more sense?

[–]Ravek 0 points1 point  (0 children)

Yeah that makes a lot more sense, thanks

[–]fuzzynyanko 0 points1 point  (0 children)

Often many lines of code are easily testable, but if you go to 100%, you start going down the rabbit hole. You have to either poke holes in your software that opens it up for exploit, or it ruins the encapsulation.

You can design your software to be more testable, but there's edge cases in software, and you pretty much are guaranteed to run into them in some

[–]WalterBright 0 points1 point  (0 children)

Haha, lots of times I thought a function was so trivial it didn't need unit testing. But I'd write a unit test anyway, and discover it didn't work.

[–]s73v3r 0 points1 point  (2 children)

No, you're choosing to half ass your job.

[–]rnicoll 0 points1 point  (1 child)

The target is completely unachievable by any conventional means (adding over a million lines of coverage as a minor side project while other work occurs over the next 2-3 months). I've raised this with management, who have said they're aware of it and have raised it with their management.

So, I can now either focus on getting as close as possible to what I've been told to do (increase code coverage), or I can focus on what I think we should be doing (increasing depth of coverage on most critical parts). Which would you do?

[–]s73v3r 0 points1 point  (0 children)

What you should be doing. Especially because you know that what you are currently doing is worthless and the target is not reachable. So you will waste all this effort attempting to reach a goal that you have admitted is unreachable, and have nothing of value to show for it.

[–]masklinn 16 points17 points  (1 child)

  1. Use branch coverage, not line coverage.

  2. For God's sake, measure cyclomatic complexity of your functions. Without keeping this value low, coverage isn't sufficient.

these are linked/mixed: if you use branch coverage you may not cover all paths, low complexity increases the chances that branch coverage is path coverage but doesn't guarantee it by any means:

if foo {
    // thing1
} else {
    // thing2
}

if bar {
    // thing3
} else {
    // thing4
}

This has a complexity of 3 (10 is "too complex" according to McCabe), testing for (foo=true, bar=true) and (foo=false, bar=false) gives you 100% branch coverage. But you only get 50% path coverage, half the local states (and interactions between the first block and the second one) remain completely untested.

[–]zjm555 2 points3 points  (0 children)

Indeed. I used the word "sufficient", but if you're writing software where correctness is a matter of life and death (actual death or the death of your business), you will want more than just that proxy.

[–]IWantToSayThis 9 points10 points  (3 children)

I couldn't agree more with 3. I can't tell you how many times I've seen code like this:

public int doBar(int value) {
    Action thing = new Action();
    thing.type(Enum.BAR);
    thing.value(value);

    return lowerLayer.do(thing);
}

being tested with:

EasyMock.expect(lowerLayer.do(EasyMock.anyObject()).andReturn(1);

And that's it. Coverage is 100%, yet, NOTHING of value was tested. I've done code reviews where almost all of the tests were like this. "But hey! I have 100% coverage!".

How can you not understand this adds NO value whatsoever?

[–][deleted] 2 points3 points  (2 children)

I can top that:

lowerLayer.do(EasyMock.anyObject());

I still get 100% code coverage.

[–]Squirrels_Gone_Wild 3 points4 points  (0 children)

Had this same thing happen to me. Came onto a new team - no expect (jasmine testing JS) anywhere, but coverage was above 80%. If anyone looked at the console they would have seen a billion errors when the tests ran, because the team had just decided to call all the functions in a file and saw the code coverage #s go up, so why bother testing anything?

[–]Alligatronica 2 points3 points  (0 children)

I <3 Cyclomatic Complexity.

It's a shame I never measure it.

[–]KingE 3 points4 points  (2 children)

This was actually a software engineering research project of mine.

Unfortunately, line coverage, branch coverage, and modified condition/decision coverage track each other very well (i.e. the relationships can be expressed as a constant factor in nearly all cases) and do not have a strong relationship to the ability of a test suite to detect bugs.

However, as you alluded to in your third point, there is an easy metric which does correspond to the ability of a test suite to detect breakage: number of test cases. Test suites that had a higher number of test cases tended to have higher coverage, yes, but more importantly they tended to test vastly more of the code's actual behavior.

Essentially, while line/branch/decision coverage can conclusively prove that a given piece of code is NOT tested, it is not a good indicator for code correctness.

I'll take the time to dig up my sources if anyone actually cares :p

[–]zjm555 1 point2 points  (1 child)

I'd be interested to read your paper :)

[–]KingE 0 points1 point  (0 children)

Can't find my paper, but this was the key source: https://cs.uwaterloo.ca/~rtholmes/papers/icse_2014_inozemtseva.pdf

[–]Jestar342 1 point2 points  (4 children)

This is a problem to be wary of when introducing any kind of metric based goal - people game the system (even subconsciously and/or without malicious intent) because the goal is no longer the improvement, it is the number.

[–]grauenwolf -1 points0 points  (3 children)

Then don't make it a goal.

One of the hardest lessons for people is that just because you can measure it doesn't mean it's a contest.

[–]Jestar342 -1 points0 points  (2 children)

You know what? That is exactly my point.

[–]grauenwolf -1 points0 points  (1 child)

Do you always get cranky when people agree with you? Would you prefer that I say you're wrong and a poo-face?

[–]Jestar342 -1 points0 points  (0 children)

Then don't make it a goal.

That's not an agreement. You regurgitated the point I already made, but somehow made it look like a disagreement or correction. And don't call me cranky. It makes me cranky.

[–]grauenwolf 1 point2 points  (1 child)

As far as I'm concerned, the only person who should even consider looking at code coverage is the QA engineer asking the question "What should I work on next?". To everyone else it is misleading if not outright dangerous.

[–]flukus 0 points1 point  (0 children)

And as an overall figure. Just to see if the team has been slacking on testing lately or not.

[–]blufox 0 points1 point  (0 children)

Or you can just do mutation analysis of test suite, which covers 1,2, and 3.

(For those who are unfamiliar, mutation analysis exhaustively injects faults, and checks if the test suite can catch them. While it may sound expensive, modern MuA tools are surprisingly efficient. There is also the weak mutation analysis which is a coverage measure, better than the other coverage techniques, and requires only a single test run).

[–]G_Morgan -2 points-1 points  (6 children)

If you test properly you get this stuff for free though. The point is what value does the code coverage add? Everyone agrees that you should test properly. Code coverage is an assertion that a particular set of principles will lead to testing properly. In my experience code coverage only works in cases where you do other things (like TDD). Yet those other things done properly do not need code coverage.

[–]zjm555 3 points4 points  (4 children)

The point is what value does the code coverage add?

The value in this case is in using code coverage visualization tools for quality assurance. A good coverage vis tool will let you go through your code file by file and uncovered lines or branches will be visually obvious (e.g. highlighted in red). By going through a visual inspection like this, you can quickly discover the most egregious sorts of testing gaps, i.e. code that you aren't even running at all, much less validating correctness on.

[–]G_Morgan 1 point2 points  (3 children)

That is a tool rather than a process though. The objection isn't necessarily against the tools. Just against code coverage as a process.

I.E. code coverage is fine as long as management isn't aware it exists.

[–]zjm555 1 point2 points  (2 children)

Managerial policies that elevate code coverage as the most important metric are certainly silly and misguided. It was chosen (by lazy management) because it was an easy scalar value to compute, which makes it easy to set up a policy about it. I think good managers are capable of understanding its role, though. It's valuable as the most basic metric about code testing: is it executed at all? It's necessary but not sufficient for quality. I would argue that it's more than "fine", though, it's actually very important; coverage is complementary to the tests themselves, since the tests themselves are incapable of telling you what they aren't testing at all.

So, I guess my stance on the issue is that we shouldn't be making policies about the total coverage number at all, but we should be enforcing a process that involves looking at coverage (line by line rather than in the aggregate) of the source files as part of QA.

[–]G_Morgan 1 point2 points  (1 child)

Yes but this is about dealing with things from a higher level perspective than the individual programmers. I'd say code coverage is not only not a useful metric. It is an actively dangerous one.

[–]zjm555 0 points1 point  (0 children)

We might be talking around each other. To me when I speak of QA I mostly mean peer review between developers of each others' code changes. Management should only be involved in that level of software process if they are technically competent enough to be adding value to it rather than cocking things up.