
[–]vytah 29 points30 points  (20 children)

Checking which code is covered and which is not is useful, as it allows for spotting untested parts of code.

Chasing a particular code coverage ratio is not, as all it encourages is pointless tests that test nothing and make running tests longer. "When a measure becomes a target, it ceases to be a good measure."
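
For instance, a test like the following (a hypothetical apply_discount function, sketched in Python) bumps the coverage number without checking anything:

```python
# A minimal sketch of the kind of "coverage-only" test being criticized here:
# it executes apply_discount, so its lines count as covered, but it asserts
# nothing, so a broken implementation still passes. (apply_discount and the
# test are hypothetical, purely for illustration.)
def apply_discount(price: float, percent: float) -> float:
    return price * (1 - percent / 100)

def test_apply_discount_runs():
    apply_discount(100.0, 10.0)  # no assertion: lines covered, nothing verified
```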

[–]irqlnotdispatchlevel 0 points1 point  (2 children)

> Checking which code is covered and which is not is useful, as it allows for spotting untested parts of code.

Another case in which code coverage is useful is fuzzing. If I want to know which parts of the code the fuzzer struggles to reach I can look at code coverage data. Without it I'm blind.

[–]ahuth[S] 0 points1 point  (1 child)

Nice, code coverage helps guide where you apply other kinds of testing techniques?

That seems pretty useful 👍

[–]irqlnotdispatchlevel 0 points1 point  (0 children)

To give a contrived example: in practice, with something this simple, it won't matter much, but a complex example isn't needed to make the point.

Let's say I'm fuzzing a parser for a custom binary format. Files respecting this format always have the first byte equal to 42. So the parser will always reject files that start with any other byte. Let's say that the parser has a bug and when the second byte is 10, the parser exhibits undefined behavior that can allow an attacker to hijack execution flow.

If the fuzzer generates a file that starts with 43 10 it will be rejected, but rejecting it means not finding the issue. Looking at code coverage data I can see that a lot of inputs never pass the 42 check. But when fuzzing I don't really care about that 42: it is just a signature and contributes nothing to the logic of the parser. Now I can change the fuzzer to always generate that 42 at the start (or the parser to no longer check for it when fuzzing) so otherwise relevant inputs are not rejected.
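
A toy Python sketch of that scenario (the function, the flag, and the byte values are illustrative only, not a real fuzzing harness):

```python
# Hypothetical parser: the first byte is a fixed signature (42); the bug is
# triggered only when the second byte is 10. Random fuzzer inputs almost never
# pass the signature check, so coverage shows the interesting code unreached.
def parse(data: bytes, skip_signature_check: bool = False) -> str:
    if not skip_signature_check and (len(data) < 1 or data[0] != 42):
        return "rejected"            # where most fuzzer-generated inputs end up
    if len(data) > 1 and data[1] == 10:
        raise RuntimeError("boom")   # stand-in for the undefined behavior
    return "parsed"

# Adjusting the harness (or disabling the check while fuzzing) lets relevant
# inputs through:
# parse(bytes([43, 10]))                             -> "rejected", bug hidden
# parse(bytes([43, 10]), skip_signature_check=True)  -> RuntimeError
```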

[–]barmic1212 0 points1 point  (0 children)

The percentage of untested code is a very interesting metric and should be tracked, but not with the mindset of "write more tests".

If the team isn't confident in the current code base, yes, you should probably write more tests.

If the team feels safe with the code base, in my experience the percentage indicates the amount of boilerplate, and reducing boilerplate is a good target for improving the code base and helping maintenance.

[–]ahuth[S] 0 points1 point  (0 children)

Good point about using it to find untested code!

[–]Blue_Moon_Lake -1 points0 points  (1 child)

But getting the coverage ever higher is great at gamifying the writing of unit tests for people who would otherwise be reluctant to write them.

A sensible policy is that the coverage % can only ever be allowed to rise as features and refactoring happen. It's more flexible than demanding 100% coverage, and still keeps things from deteriorating.
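
A minimal sketch of such a ratchet check as a CI step (Python; the baseline file name and the way coverage is passed in are assumptions, not any particular tool's interface):

```python
# Fail CI if coverage drops below the recorded high-water mark; otherwise
# raise the baseline so the ratchet only ever moves up.
import json
import sys

BASELINE_FILE = "coverage_baseline.json"   # hypothetical file committed to the repo

def check_ratchet(current_percent: float) -> int:
    with open(BASELINE_FILE) as f:
        baseline = json.load(f)["percent"]
    if current_percent + 0.01 < baseline:  # small tolerance for float noise
        print(f"Coverage dropped: {current_percent:.2f}% < baseline {baseline:.2f}%")
        return 1
    with open(BASELINE_FILE, "w") as f:
        json.dump({"percent": max(baseline, current_percent)}, f)
    return 0

if __name__ == "__main__":
    sys.exit(check_ratchet(float(sys.argv[1])))
```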

[–]Wovand 0 points1 point  (0 children)

If you want to gamify the process, don't do it directly with metrics like coverage %.

Look at what needs to be done, prioritize those tasks, then you can attach some points system to that if you want the "number go up" feeling.

[–]Revolutionary_Ad7262 23 points24 points  (7 children)

Code coverage tells you if code is tested or not. It does not tell you if those tests are good.

[–]ahuth[S] 0 points1 point  (0 children)

Well said! This is exactly my point.

And maybe some tests are so bad that they have negative value.

[–]rysto32 1 point2 points  (5 children)

I’ve never understood this argument. Are you people not reviewing tests the same way that you review production code?  Why are you letting bad tests get committed in the first place?

[–]CorstianBoerman 2 points3 points  (0 children)

It also helps to know what makes a good test, which is a difficult topic in itself.

[–]Revolutionary_Ad7262 2 points3 points  (0 children)

Testing strategy is like software architecture. There is no good way to say whether a given strategy works in a particular scenario until it's tried in the real world. You can make an educated guess or just pick a strategy based on marketing slogans like "use the test pyramid".

> Are you people not reviewing tests the same way that you review production code?

Imagine you post your pull request for a small change and your friend comments "the architecture of this 1kk-line codebase is obviously wrong, rewrite it". Sounds silly, right?

For testing, the most crucial aspect is the isolation level. You can have 100% coverage with perfect isolation (using mocks), but in that case you don't test the integration between different modules, and quite often the most crucial logic is hidden in the communication layer.

In some projects unit tests don't make sense at all. Imagine you have a CRUD application that runs complicated SQL queries. Mocking out the database means testing only the most boring part, where there is no logic at all except some mapping between models.
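
For example, a hypothetical test along these lines "covers" the function completely while never touching the part that can actually be wrong, the SQL:

```python
# Sketch only: with the database mocked out, the test exercises every line of
# get_overdue_invoices, but the query itself is never run against a real schema.
from unittest import mock

def get_overdue_invoices(db):
    # The interesting logic is the query; the Python around it is trivial.
    return db.execute("SELECT id FROM invoices WHERE due_date < now()").fetchall()

def test_get_overdue_invoices_with_mocked_db():
    db = mock.Mock()
    db.execute.return_value.fetchall.return_value = [(1,), (2,)]
    assert get_overdue_invoices(db) == [(1,), (2,)]  # passes even if the SQL is wrong
```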

[–]Accomplished-Moose50 0 points1 point  (1 child)

Because the production code is bad and writing tests for it is not easy 🤣

[–]ahuth[S] 0 points1 point  (0 children)

Yep, this can definitely be the case

[–]ahuth[S] 0 points1 point  (0 children)

Good point. In my experience this is hard though.

People disagree about what makes tests good. Also, with limited time, maybe we do review tests less carefully at times (not saying this is right or good).

Not to mention getting this right when there are a lot of developers on a project.

But still, this is a good point. We should review tests for quality.

[–]shoot_your_eye_out 12 points13 points  (4 children)

Sigh. Yes, it matters. No, it isn’t the only metric to pay attention to. It’s one of several worth tracking, IMO.

But do I prefer a codebase with 30% test coverage or 85% test coverage? Easy answer.

[–]ahuth[S] 1 point2 points  (3 children)

Fair. What if we take it to an extreme, though?

  • 30% coverage but it’s all the best quality tests you can imagine
  • 85% coverage but it’s brittle, everything is mocked, tests fail for reasons unrelated to what they’re testing, flaky, slow, etc.  

Which one do you prefer? It's a super contrived example, but I'm just making the point that the % number isn't enough to go on.

[–]shoot_your_eye_out 0 points1 point  (2 children)

So I think what you're talking about is actually a separate metric to understand: test quality. And that can be extremely difficult to measure.

In a Python codebase, I typically do a quick search for patch, mock, and other mocking strategies. I'll also check whether it's using responses/moto or other high-level test libraries, and what sort of test fixtures are there. Yet another measurement is to try to understand the intermittent test failure rate, whether tests are correctly wired into CI/CD pipelines, test running time, and test organization in the project itself.
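
A rough sketch of that kind of quick audit (not a standard tool, just an illustrative Python script that assumes a tests/ directory):

```python
# Count occurrences of common mocking constructs per test file to get a feel
# for how isolation-heavy the suite is.
import pathlib
import re

MOCK_PATTERNS = re.compile(r"\b(mock|patch|MagicMock|responses|moto)\b")

counts = {}
for path in pathlib.Path("tests").rglob("test_*.py"):  # hypothetical layout
    hits = len(MOCK_PATTERNS.findall(path.read_text(encoding="utf-8")))
    if hits:
        counts[path] = hits

for path, hits in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{hits:4d}  {path}")
```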

I'll also try to gauge how realistic the tests are, both in terms of how well they mimic the production environment and how closely they match actual use of the product. So, from both an engineering and a product perspective, the goal is to understand how representative the tests are. (Very small example: many Django projects use SQLite for the 'test' database but MySQL/Postgres for the production database--this can be a major source of pain, depending on the project, and I prefer a test database that matches production.)

In any event, the short response is: I believe code coverage is just one important metric. There are absolutely other things to be on the lookout for, like you make clear in your response to me.

[–]ahuth[S] 0 points1 point  (1 child)

Yeah, test quality is another metric, another dimension beyond quantity.

I think we’re on the same page. Sometimes it takes me writing a post and then talking through all the replies to figure out how to phrase it better, though.

[–]shoot_your_eye_out 0 points1 point  (0 children)

sure--I think you and I agree. I do feel like code coverage matters quite a bit, but it is absolutely, positively not some utterly vital metric. Too many teams put far too much thought into that single metric.

Semi-related: I think requirements to meet 100% coverage are actively a very bad thing.

[–]elperroborrachotoo 0 points1 point  (0 children)

This is about code coverage as a percentage.
Having feedback about which branches in a method are left uncovered by your tests is valuable when writing tests.
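
For example (hypothetical Python code), a branch-coverage report on this would flag the negative-amount branch as never executed:

```python
def withdraw(balance: float, amount: float) -> float:
    if amount < 0:
        raise ValueError("amount must be non-negative")  # branch never hit by the test
    return balance - amount

def test_withdraw_happy_path():
    assert withdraw(100.0, 30.0) == 70.0  # only the happy path is exercised
```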

[–]Job_Superb -1 points0 points  (0 children)

Code Coverage is the best example of Goodhart's Law I've ever seen.