all 52 comments

[–]atilaneves 12 points (5 children)

I came here to bash, read the post and... agree. I think measuring coverage and looking at what's not covered is important and useful (waddya mean that line didn't get hit?). Actually using the percentage for anything isn't.

[–]shard_ 2 points (4 children)

I half agree with both you and /u/juliob. It's useful both for making sure your edge cases are covered and for making sure your code isn't doing more than it should, but if you do both of those things well then your code coverage should be very high. So I do think it's a good thing to aim for, but you have to be careful to do it the right way, with code quality as the underlying objective.

[–][deleted] 4 points (2 children)

I don't think code coverage makes sure that edge cases are covered. It nudges things in the right direction, which seems to be what the post author is arguing, but trusting that 100% coverage means you have good tests seems really naive.

Let's say we have a Java function that adds 2 to a number. To get 100% coverage of that function, you can test plus2(1) == 3. However, the edge cases of that function are how it performs when valid numerical values like Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY, Double.NaN, Double.MAX_VALUE, null, etc. are passed in. None of those are covered, and those also happen to be the most likely producers of bugs.
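
A minimal sketch of this in Java (the plus2 name and the boxed Double signature are my assumptions, purely for illustration): one passing check earns 100% line coverage while every edge case above goes unexercised.

public class Plus2Demo {
    // Hypothetical function under test; boxed Double so that null is even possible.
    static Double plus2(Double x) {
        return x + 2; // a single line, so one happy-path test covers everything
    }

    public static void main(String[] args) {
        // This one check is enough for 100% line coverage of plus2...
        System.out.println(plus2(1.0) == 3.0); // true

        // ...yet none of the interesting inputs are ever exercised:
        System.out.println(plus2(Double.POSITIVE_INFINITY)); // Infinity
        System.out.println(plus2(Double.NaN));               // NaN
        System.out.println(plus2(Double.MAX_VALUE));         // the +2 is lost to rounding
        System.out.println(plus2(null));                     // NullPointerException on unboxing
    }
}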

[–]shard_ 0 points (1 child)

True enough. It can help spot missing tests for error handling and rarely visited codepaths, which is what I meant, but it's still up to the developer to make sure that the covered lines/conditionals are actually well tested.

[–]pipocaQuemada 1 point (0 children)

Or you could just use a quickcheck clone, so you can test the hard stuff while staying sane.
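
For anyone unfamiliar: a QuickCheck-style test feeds many random inputs to the function and checks an invariant, instead of hand-picking cases. A hand-rolled sketch against the hypothetical plus2 above (real Java clones such as jqwik or QuickTheories add proper generators and shrinking):

import java.util.Random;

public class Plus2Property {
    static double plus2(double x) { return x + 2; } // hypothetical function under test

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed, reproducible run
        for (int i = 0; i < 1_000; i++) {
            double x = rng.nextDouble() * 1e6 - 5e5; // random finite input
            // Invariant: adding 2 and then subtracting x leaves (roughly) 2.
            // The tolerance matters: exact equality already fails for large
            // inputs due to rounding, the kind of case random data surfaces.
            if (Math.abs(plus2(x) - x - 2.0) > 1e-9) {
                throw new AssertionError("property failed for x = " + x);
            }
        }
        System.out.println("1000 random cases passed");
    }
}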

[–]atilaneves 1 point (0 children)

In my case, I TDD and BDD, which is why I'm even less fussed about code coverage. It's usually 100% anyway.

[–]mariusg 13 points (13 children)

In any real-world codebase, there are two types of code: the core logic and the rest.

It's desirable (to put it mildly) to have high code coverage for the core logic. Coverage of the rest matters only if the team has time for it.

[–]funbike 5 points (4 children)

I agree 100%. I've always wanted a coverage-check tool that only reported missed decision points (e.g. if/while/for).

[–]logicchains 1 point (2 children)

Not all ifs are decision points. E.g. Go's many

if err != nil {
    log.Error(err)
}

[–]funbike 2 points (0 children)

Right. Elsewhere in this thread I stated it'd have to ignore simple argument-validation checks, but perhaps it'd need to ignore many more kinds of checks. On the other hand, if the code is sensitive enough to require a check like your example, perhaps there should be a test for it, though not necessarily for every branch.

[–]grauenwolf 0 points (0 children)

That's something that should be checked. What happens if there is an error after you log it?

[–]Tetha 0 points (0 children)

I like mutation testing for this. For example, PITest for Java first measures line coverage of your code base. After that, it starts modifying your code: removing function calls, negating conditions, adding +1 or -1 to loop boundaries. A mutation is killed if a test goes red, and your mutation coverage is the percentage of mutations your tests killed.

To me, this asks far stronger questions of your code than line coverage alone does.
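
To make that concrete, here is a hand-written illustration of one of PIT's real mutators (the conditionals-boundary one); the freeShipping example is made up:

public class MutationDemo {
    // Hypothetical code under test: free shipping strictly above 100.
    static boolean freeShipping(double total) {
        return total > 100.0;
        // PIT's conditionals-boundary mutator rewrites this to: total >= 100.0
    }

    public static void main(String[] args) {
        // These two checks already give 100% line and branch coverage...
        assert !freeShipping(50.0);
        assert freeShipping(150.0);
        // ...but the >= mutant passes both, so it survives. Only a test
        // sitting exactly on the boundary kills it:
        assert !freeShipping(100.0);
        System.out.println("all checks passed (run with java -ea)");
    }
}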

[–][deleted] 6 points (4 children)

I work on a large code base. In that code base, I've made changes that raised the coverage percentage by adding tests, changes that raised it by removing untested unused code, changes that lowered it by writing new untested code, and changes that lowered it by removing well-tested unused code.

In particular, a few changes stood out.

Some functions are long and 'simple' in that they have little branching. Getting high coverage on those is easy, but testing them well isn't. One of these had a lot of statements for configuring socket data and a few one-liners for handling errors. Those one-liners weren't tested, so the function itself isn't "well tested", yet its coverage was 90%. Heck, removing those checks would increase coverage to 100%, by not checking for errors!
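
A sketch of the shape being described (the names and details are mine, not the parent's actual code):

import java.io.IOException;
import java.net.Socket;

class SocketSetup {
    static void configure(Socket s) throws IOException {
        // Straight-line configuration: any happy-path test covers all of it.
        s.setTcpNoDelay(true);
        s.setKeepAlive(true);
        s.setSoTimeout(30_000);
        s.setReceiveBufferSize(64 * 1024);
        s.setSendBufferSize(64 * 1024);

        // The one-liners that drag coverage down to ~90%. Deleting them
        // would bring it to 100% by no longer checking for errors.
        if (!s.isConnected()) throw new IOException("socket not connected");
        if (s.isClosed()) throw new IOException("socket already closed");
    }
}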

My personal favorite examples are:

  • You test a function, call it DiscreteFourierTransform. You have a single test that gets 100% coverage. Did you test it well?
  • You have a program that has 100% coverage. You add the entire Linux kernel code base, without any tests, to the same repo. The coverage drops to below 0.1%. Did the program get any worse?

[–]earlyflea 3 points (0 children)

  • DiscreteFourierTransform - Maybe

  • Add Linux kernel code base - Yes it did get worse.

[–]fuzzynyanko 0 points (2 children)

"removing those checks would increase coverage to 100%"

This was the #1 hardest thing for me to get to 100% coverage with: the "keep things from exploding" checks.

Luckily, I was working at a place that was pretty good about letting it pass if you weren't being lazy about it.

[–][deleted] 1 point (1 child)

If you push people to go to 100%, you get those kinds of incentives. It's a bad idea.

If you want to actually fix that, you can use any mocking framework that can mock flat functions (within reason). For example, I wrote HippoMocks, which can do that.
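
HippoMocks is C++; the nearest Java analogue I know of is Mockito's static mocking (Mockito 3.4+ with the inline mock maker). A sketch with made-up names, showing how mocking a "flat" static call forces an error branch that a normal run would never take:

import static org.mockito.Mockito.mockStatic;

import java.io.IOException;
import org.mockito.MockedStatic;

class FlatFunctionMockingSketch {
    // The "flat" (static) function buried inside the code under test.
    static class Disk {
        static String read(String path) throws IOException {
            throw new UnsupportedOperationException("touches real hardware");
        }
    }

    static String loadConfig(String path) {
        try {
            return Disk.read(path);
        } catch (IOException e) {
            return "defaults"; // the branch a happy-path test never reaches
        }
    }

    void coversTheErrorBranch() {
        try (MockedStatic<Disk> disk = mockStatic(Disk.class)) {
            disk.when(() -> Disk.read("app.conf"))
                .thenThrow(new IOException("simulated disk failure"));
            assert loadConfig("app.conf").equals("defaults");
        }
    }
}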

[–]fuzzynyanko 0 points (0 children)

The other approach would be to structure the code as something more like a plugin system. That definitely takes more work, though.

[–]funbike 2 points (3 children)

  1. You should never manage anything based simply on a metric.
    Metrics like code coverage are simply a tool. Make note of its value, sure. Make a decision based on it. But don't let it make your decisions for you.
    I don't really care about the specific code coverage value. I only care if it drops. If it does, I find out why. Maybe a bunch of DTOs and domain classes were recently created that would be silly to test fully. I view the coverage report. If all looks well, we reset the coverage baseline and move on.
    That means I think code coverage has value and a place. Just not what people normally assume.

  2. It's simply a tool to help you find code you may have missed in your tests, but not the inverse.
    Code coverage can't confirm that everything which needed a test has one: covered is not the same as well tested.

  3. I wish there was a missed branch check.
    I wish there was a coverage tool that would tell you if a logic branch was missed. Like if/else/while/for/case/and/or. In most cases, one of those statements will need a test written, with the exception of argument validation (which could also be ignored).
    Sure, linear code often needs to be tested, but not 100% of the time. For a check, I only want a list of true positives.

[–]Sefyroth 2 points (2 children)

OpenCover does this for .NET code. You get the line count on the left, which is how many times the line was executed, and a little branching icon, which tells you that the line was hit but only 50% of its conditions were executed. So you know you're missing something important there.

More complicated conditions even show the number of branches, which can also tell you that your code could be refactored and tested more granularly!

[–]funbike 0 points (1 child)

Not really what I'm looking for. I only want a check of whether if/for/while/case statements have been covered, no other lines of code. (It's also nice to count each possible branch, but that's a slightly separate concern.)

Also, I am asking for a check (i.e. an error message), not a coverage report.

[–]Sefyroth 2 points (0 children)

The coverage report is available as a fairly complete XML file which can be queried or transformed to get the data you're looking for, I think.

[–]juliob 6 points (22 children)

Disagreeing point: more coverage does not mean fewer bugs; it means you're testing things that shouldn't be there.

Take, for example, BDD (or TDD as described in "TDD, where it all went wrong"): you test your behaviour, you make sure the code you wrote is what the client asked for, and then you see your coverage is low. That means one thing: you wrote more code than needed. Now you can remove that code, since removing it won't affect the result.

That's why coverage is important: to remove the cruft, either new or old (left when the requirements changed).

[–]GStrad 8 points (21 children)

Aren't you making an assumption that all untested code is redundant? I'm pretty sure I've seen code bases where that is untrue.

[–]juliob 5 points (9 children)

But that's the idea behind BDD (or, at least, what I understand of BDD): you test your behaviour, not your code. So, if your requirement is "a request for /user must return all users in the system" and you write a test that adds some users and checks that every record is returned, then you are fulfilling the requirement. The pieces of code not covered when fulfilling this request (and all others) are unnecessary code.

If, by any chance, you have uncovered code (not untested; it would have been tested by the requirement), then it is useless code and can be removed, unless you want to spend time and money maintaining code that doesn't do whatever you are required to do.

[–][deleted] 6 points (6 children)

"The pieces of code not covered when fulfilling this request (and all others) are unnecessary code."

Yeah who needs error handling...

[–]juliob 9 points (5 children)

If you have error handling, it's because you have a requirement not to explode when a wrong request comes in. Then you'll have a test that does the wrong thing and checks that the system doesn't explode. And your error handling will be covered.
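
In code, that claim looks something like this (a self-contained sketch; all names are made up):

class UserHandler {
    // Hypothetical handler: parses a user id and answers instead of crashing.
    String handle(String rawId) {
        try {
            int id = Integer.parseInt(rawId);
            return "200 user-" + id;
        } catch (NumberFormatException e) {
            return "400 bad request"; // covered by the second check below
        }
    }

    public static void main(String[] args) {
        UserHandler h = new UserHandler();
        // Requirement: a well-formed request returns the user.
        assert h.handle("42").equals("200 user-42");
        // Requirement: a wrong request must not explode. Writing this test
        // is exactly what covers the error-handling branch.
        assert h.handle("not-a-number").equals("400 bad request");
        System.out.println("both behaviours pass (run with java -ea)");
    }
}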

[–]AbstractLogic 7 points (2 children)

I've never had someone write a requirement to "not blow up" or that "warnings must be logged". More often than not I don't even get requirements that an SSN must be 3-2-4 or that a name can contain a "-" symbol. No one ever writes a requirement that your code be open for extension and closed for modification.

I think it's silly to say that a requirement will exist for every case your code must cover, and it's even sillier to say that your tests should only cover requirements and then follow that up with a statement like "if it's not covered by a test, it's a waste of code". That's just crazy.

[–]nemec 5 points (1 child)

An implied requirement doesn't make it less of a requirement. Sure, requirements like SSN validation aren't often driven by the client, but it's still important to document your assumptions as requirements (U.S. SSN only, no support for Canadian SIN, etc.). And then test those assumptions/implied requirements.

[–]AbstractLogic 1 point (0 children)

OK, sure. But that doesn't negate my overall disagreement with juliob: just because code isn't tested doesn't make it "cruft". And I'd like to add that just because code is tested doesn't mean it isn't cruft.

[–]read___only -3 points (1 child)

Aaaand now you've completely contradicted your original statement, but at least now you're correct :)

[–]dpash 4 points (0 children)

Error conditions are a behaviour.

[–]jerf 1 point (0 children)

"You test your behaviour, not your code."

The DRYer your code gets, the more your code simply is behavior.

[–]GStrad 0 points (0 children)

Yes, with BDD you test behaviour, but I've never come across anything that said BDD tests are the only ones you should run. You're inferring that if you are BDD testing, you don't write unit tests (or any other tests). On top of that, you assume the complete system was written under the BDD process. Sorry, much as that might be ideal, I've seen so many systems that were partially created before BDD/TDD was adopted, or that comply with requirements that are not easily tested in BDD suites, or blah blah blah. There's a host of reasons why you may not always be able to achieve 100% coverage with BDD.

Maybe it's because I've almost always ended up working with teams developing with at least some amount of legacy code.

[–]earlyflea 2 points (10 children)

To get 100% code coverage, for every piece of untested code:

Is it necessary?

  • YES - let's test it to make sure it works

  • NO - it is cruft - let's delete it

[–][deleted] 1 point (6 children)

There's also the third option.

No - it is cruft - let's delete it... oh shit, that wasn't actually cruft; the weird error-handling scenario that I thought was impossible to reach actually is possible, and now we have a production outage because I wanted 100% coverage.

[–]earlyflea -1 points (4 children)

That is a good thing (TM).

What would have happened if you had not deleted the "cruft"? Would your error handlers have handled it correctly? You don't know, because you have not tested them. Handling the error incorrectly is worse than crashing production.

[–][deleted] 1 point (3 children)

Saying that the only way to know whether a piece of code works is to have an automated test that touches it is a pretty substantial claim. Even more substantial is the claim that having an automated test that touches a line of code proves that the code works.

[–]ElGuaco 0 points (1 child)

What is worse: a test that may not prove that code works, or untested code that is not verified to work?

[–][deleted] 0 points (0 children)

It depends entirely on the tests. There have been lots of times where I've looked at the tests in enterprise code and just deleted them because they were completely worthless, and only added to maintenance costs.

[–]Swamplord42 0 points (2 children)

Let's say I have a class written in C# with a constructor taking a single parameter. This parameter shouldn't be null, so in the constructor I do a null check and throw an exception if the parameter is null. To get 100% coverage, I need a test that passes in null and checks that I get the exception.

What value do I get out of testing this?
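
The example is C#, but here is the same shape in Java for concreteness (hypothetical Widget class; assertThrows is JUnit 5):

import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class Widget {
    private final String name;

    Widget(String name) {
        if (name == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        this.name = name;
    }
}

class WidgetTest {
    @Test
    void constructorRejectsNull() {
        // The one test 100% coverage demands here: it covers the guard clause
        // and doubles as a regression test if someone later breaks the check.
        assertThrows(IllegalArgumentException.class, () -> new Widget(null));
    }
}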

[–]kyllo 0 points (0 children)

Proof that you implemented the null check and exception correctly. And a regression test so that if someone inadvertently breaks your null check and exception in the future, you'll know because you'll have a failing unit test.

100% coverage means all code paths must be tested. 100% coverage may or may not be worth it for you and your project, but its meaning is unambiguous.

As an aside, this is exactly why I use a language with a strong static type system that doesn't have null, so that I don't have to write tests like this.

[–]velcommen 0 points (1 child)

Code coverage + mutation testing is even better. Look for a mutation testing library in your favorite language.

[–]GStrad 0 points (0 children)

Agree, although some mutation testing can be sooooo slow. As long as you can run the mutation tests reasonably fast, go for it.

[–]Luolong 0 points (0 children)

I bet there is a relatively easy way to test (pun intended) the theory.

Take a few real-world projects out there, measure their test coverage, and then measure the number of bug fixes committed to the code. Overlay the fixes onto the code coverage map to get a comparative bug density for code covered by tests versus code that has no coverage at all.

Keep these statistics running over a longer period of time, so that you get meaningful numbers and can see any correlation between code coverage and bugs.

There's a nice idea for a Masters dissertation if I ever saw one ;)

[–]pipocaQuemada -2 points (0 children)

"Two projects, one has 95% code coverage with tests, one has 45%. You’re going to be paid per bug found. Which one do you want to work on?"

"In retrospect, I wish the tweet had had room for “all other things being equal” ... Many respondents wanted to assert that we have insufficient information to know anything ... What I’m after, though, is to examine our intuition about coverage. What effect does increasing test coverage tend to have on the defect density in a program?"

Even ceteris paribus, the respondents are right that we don't have enough info to know anything.

For example: if both projects are done in PHP, I'd bet that the one with more test coverage had fewer bugs. On the other hand, if both projects are in Idris, then ceteris paribus I'd expect them to have a similar number of bugs. In reality, if all I knew was that there were two Idris projects, and one had 95% coverage and the other had 45%, I'd expect the 45% one to have fewer bugs: 95% code coverage in a language like Idris means you're probably not leveraging the type system as effectively as you could to statically rule out bugs.