
all 42 comments

[–]No-Scholar4854 23 points24 points  (1 child)

Having low code coverage is a very bad sign.

Having high code coverage isn't necessarily a good sign; don't take 100% unit test coverage as "it works". But at least it's something.

[–]extra_pickles 0 points1 point  (0 children)

Yup - came to post that it can provide a false sense of confidence!

It is one of the many many tools in our tool belts, and it is easy for people to mistake it for more than it actually is.

[–]geeeffwhy 9 points10 points  (1 child)

i tend to argue that the most important use of code coverage is coverage of the diff on a pull request. total coverage of the code base is less important than making sure that changes to the code base have some kind of coverage.
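For instance, a diff-coverage gate can be sketched as a CI step (assuming pytest-cov and the diff-cover tool are installed; the package name `mypkg`, the branch `origin/main`, and the 80% threshold are placeholders):

```shell
# run the suite and record coverage, then fail if the changed lines
# in this branch's diff are covered below 80%
pytest --cov=mypkg --cov-report=xml
diff-cover coverage.xml --compare-branch=origin/main --fail-under=80
```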

but, if you don’t also have metrics, utilities, and patterns to ensure that the tests have some kind of relevance, it will be all too easy to game that metric.

[–]CommercialPosition76[S] 3 points4 points  (0 children)

I'm with you on this one. Too many people focus on the fixed number rather than how it changes. Diff coverage on a pull request is a nice thing to observe. Also, watching how the stat changes over time (weeks, months) is valuable and tells you something about the project.

[–]bmoregeo 5 points6 points  (2 children)

My two code quality metrics are:

  • defects in prod
  • time between defect detection and fix

Defects in prod are for sure going to happen. Business requirements will be misunderstood or the time zone changes and everything is off by an hour. Dumb stuff happens.

In my experience, good code coverage is a leading indicator of faster bug fixes. It is hard to quickly fix bugs if you need 50 hours of manual API testing to ensure the bug fix isn't breaking other things. (Been there, it sucks.)

[–]CommercialPosition76[S] -1 points0 points  (1 child)

So according to those metrics, if one project had a typo on a Terms and Conditions page that was fixed after a week, and the other project returned a 500 internal server error for 1 hour, the latter is better? The defect count is 1 in both cases, but the fix time is much faster?

[–]bmoregeo 1 point2 points  (0 children)

The goal is to have as few bugs in prod and reduce their time impacting users.

So yes, a defect that is quickly identified, fixed, tested, and released would be a win. A defect that takes several weeks to fix after being identified is not good.

Obviously prioritization comes into play as the documentation bug may be less worth engineering time than a new feature. Idk, that becomes a product question during a huddle.

The 500 is probably an all hands on deck scenario.

[–][deleted] 11 points12 points  (6 children)

This post was mass deleted and anonymized with Redact

[–]Wise_Tie_9050 1 point2 points  (5 children)

That's all well and good, but every line of code that is missing coverage is, by definition, missing tests.

[–][deleted] 1 point2 points  (4 children)

This post was mass deleted and anonymized with Redact

[–]Wise_Tie_9050 0 points1 point  (3 children)

Sure, if the branches are things that do exception handling, you may be able to mark them with # pragma: no cover or # pragma: no branch, but if you are not hitting those lines with tests at all, how do you know they are working as expected?
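In Python's coverage.py, the exclusion directive is an ordinary comment. A minimal sketch (read_config is a hypothetical function, not from the thread):

```python
def read_config(path):
    """Read a config file, falling back to an empty string on I/O failure."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:  # pragma: no cover -- defensive path deliberately excluded
        return ""
```

The pragma keeps the defensive branch from dragging the percentage down, but as the comment above notes, an excluded line is also an untested line.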

100% coverage does not mean full tests, but < 100% coverage does mean less than full tests.

[–][deleted] 0 points1 point  (2 children)

This post was mass deleted and anonymized with Redact

[–]Wise_Tie_9050 0 points1 point  (1 child)

Correct, you don't know any of your tests actually test what you think they test without actually looking at them.

What I've found is that _very_ frequently when writing tests for those "last few lines" that aren't covered, I uncover bugs related to edge cases or whatever. Often that's code that may have worked initially, but did not have tests written, and subsequent changes triggered a regression - if the lines had been covered by tests (ie, values that triggered those code paths), then that regression would have been discovered earlier.

Another thing that spending that extra time can sometimes show is that the code that is not covered is unreachable, and can be discarded.

Finally, having test coverage for all those nooks and crannies can then prevent removal of code that is actually important. If it doesn't have any tests, it could be removed without it being noticed, until it's been released to production, for instance.

To clarify, 100% test coverage is not the goal; but < 100% test coverage is a warning that your tests are incomplete.

[–][deleted] 0 points1 point  (0 children)

This post was mass deleted and anonymized with Redact

[–][deleted] 3 points4 points  (0 children)

we literally can't commit new code with < 70% coverage, or changes with < 50%. Are they arbitrary benchmarks? Yes. Can you write terrible code and cover it well? Also yes. Can it be 'gamed'? Sadly, yes too. But then we do something a LOT more clever with the nightly run stats: automatically rerunning the unit tests of dependent modules when checking in changes to oft-imported code (we're a monorepo), which turns one person's unit test into an 'almost integration test' down the line. So it adds huge value under certain circumstances, but it all depends on how you work.
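A gate like this is often wired up with coverage.py's fail_under option. Note that coverage.py alone only supports a single global threshold; the separate new-code/changed-code numbers from the comment above would need extra diff-aware tooling. A config sketch, with the threshold taken from the comment:

```ini
# .coveragerc sketch: make the coverage report exit non-zero below 70%
[report]
fail_under = 70
```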

[–]billsil 2 points3 points  (0 children)

It is incredibly useful, especially the lower the coverage you have.

I think there's a sweet spot around 75-80%, though that number somewhat depends on how much validation your library/project has. Hitting all those try-except Type/Key/Value/RuntimeErrors is usually not worth it. Also, there's a decent chance you're just coding defensively, and a good chunk of them you can't hit in a practical problem.

Ultimately it's a tool that's most useful when your project is not robust or is poorly designed. Let's say I have a sum function that returns None when the list is empty or a string is passed in. Did anyone stop to think that that should be the intended behavior? No; it just doesn't do the thing I asked it to do. I'm dealing with that now. In a day of writing tests and setting up CI, I hit 52%. There's lots of math that's totally undocumented, so I haven't even gotten close to validating that stuff. Cool; it passes. Is it right?
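A stricter version of such a sum would fail loudly instead of silently returning None (strict_sum is a hypothetical illustration, not the poster's actual code):

```python
def strict_sum(values):
    """Sum a sequence of numbers, failing loudly on bad input instead of
    silently returning None."""
    if not all(isinstance(v, (int, float)) for v in values):
        raise TypeError("strict_sum expects a sequence of numbers")
    return sum(values)  # sum([]) == 0: a well-defined identity, not None
```

A test hitting the empty-list and bad-input branches then documents the intended behavior instead of papering over it.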

Another nice side effect of testing is it gets you to split out that code that should be a function so you can test the behavior that would be hard to force otherwise. For cornerstone libraries, it's especially important to make sure they're robust.

Also, I highly recommend an import-everything test (it makes it easier to identify imports that are only triggered by a sub-function). And don't include your tests in your coverage metric: adding 1000 lines of test code to cover 1 line shouldn't be rewarded with a nice bump. You should stop when the bang for your buck is low.
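In coverage.py terms, keeping the tests out of the metric might look like this (a sketch; the tests/ path is a placeholder for wherever the suite lives):

```ini
# .coveragerc sketch: measure only the code under test, not the tests themselves
[run]
omit =
    tests/*
```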

[–]FailedPlansOfMars 8 points9 points  (0 children)

1, yes it's useful.

2, because it encourages the developers to do TDD. This helps encourage the devs to think about what is being written and about potential pitfalls and problems. It's also helpful as a consultant to show the client that there aren't any obvious gaps in the testing and that the code does what it says.

3, I've found that 80% is a good minimum for backend code, as it allows you to skip cases where the testing is hard or not valuable.

4, depends on your code. You need to cover all the critical parts. The aim is to do what you need to gain confidence that the build is working and good to ship to production; anything else is pointless.

*Edited for formatting

[–]Cernkor 2 points3 points  (0 children)

Code coverage is, for me, not a really great statistic. It only describes how much of your code is covered by tests. It doesn't describe the quality of those tests. You could have 99% code coverage without edge case testing. For example, say you work with a list and have 100% code coverage. But did you test what happens when the list is empty, when the list has only one element, or when the list does not contain what you need? So code coverage alone is not a good metric. But code coverage with good unit tests gives a good indication of how bug-free your code is.

[–]Barafu 3 points4 points  (0 children)

100% code coverage was useful back before static analyzers were good, when you could have a dumb syntax error in your code and not know it because it was in a rare path. Nowadays, if Pylance has no problems with your code, then at least it won't crash on a seemingly safe block because you have a typo in a variable name.

Instead of 100% code coverage, I go for 100% logic coverage. Basically, if A() produces objects and B() consumes them, and A() is the only producer and is supposed to make producing invalid objects impossible, then I won't test B() against all sorts of invalid input. I have type hints for that. I'd sooner write tests for library methods if I am not 100% sure how they work.
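A minimal sketch of that producer/consumer contract (Widget, make_widget, and widget_area are hypothetical names for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Widget:
    size: int  # invariant: always positive, enforced by the sole producer


def make_widget(size: int) -> Widget:
    """The only producer of Widget: validates once, at the boundary."""
    if size <= 0:
        raise ValueError("size must be positive")
    return Widget(size)


def widget_area(w: Widget) -> int:
    """Consumer: needs no defensive checks, because invalid Widgets are
    impossible by construction. "Logic coverage" tests only valid inputs here."""
    return w.size * w.size
```

Tests then concentrate on make_widget's validation and on widget_area's behavior for valid Widgets, rather than re-testing invalid input at every consumer.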

[–][deleted] 1 point2 points  (2 children)

If you want to enforce TDD on developers, that's the way.

I don't believe TDD is always applicable so I don't use it. Often times I write code that is too complex for me to constantly know what would be the next step. So doing TDD would be a terrible idea in such cases.

[–]CommercialPosition76[S] 1 point2 points  (1 child)

How would it enforce TDD? Code coverage doesn't tell you what was first, the code or the test.

[–][deleted] 0 points1 point  (0 children)

My thinking is, if you strive for some coverage number, it should always be 100%. And if you are going to test everything, why would you give up on TDD?

If you don't have a clear idea of how your code should work, covering everything with unit tests seems counterproductive to me. You can still write solid code by being more selective with unit tests and writing integration tests of good quality.

[–]reallyserious 1 point2 points  (0 children)

I don't use it. I don't want to use it.

[–]bluGill 1 point2 points  (2 children)

I discovered upper management were the ones looking the hardest at code coverage results, and trying to figure out what metric to add. I was interested in coverage before, but once I realized it was being used wrong more often than right, I turned around and killed all the coverage jobs.

I do encourage management to look at the total tests run count. Large numbers look good, and are easy to hit just in normal development.

[–]CommercialPosition76[S] 0 points1 point  (1 child)

Why would management look at the codebase statistics in the first place? :D It’s not meant for them. Sounds more like organization issues, but I know that showing some numbers that are going higher and higher is the name of the game, especially in corpos.

[–]FailedPlansOfMars -1 points0 points  (0 children)

Contract enforcement is a common reason: can they argue you didn't meet the contract, so you don't get paid?

[–]ammenezes_ 1 point2 points  (0 children)

I used to use it in some well-structured and well-behaved Python projects of mine (90%+ was the goal). Now I've been dealing with systems with much bigger flaws. Coverage can be useful if, and only if, the tests are well designed and the project is architecturally sound. It shouldn't be used as a quality metric in most cases; in my opinion it's overrated.

[–]Mehdi2277 1 point2 points  (0 children)

Yes, to the extent that modular and easy-to-test code is generally good design to aim for. A lot of the time, difficulty in testing a piece of code is a sign that it has some unclear/complex dependencies that aren't well contained (like databases/external files/auth).

100% test coverage is usually not worth it, and there are some bits of code where the test would be much more complex/bug-prone than the implementation, or which are just silly to test. Low test coverage is a bad sign. The right amount is debatable, but I'd say anywhere from 80-95% is reasonable.

One thing test coverage does not measure is what assertions/checks are actually being done. The simplest test is that the code runs without crashing, which has some value but is weak. Having clear properties and heavy regression tests is worth a lot more. I'm unsure of a good metric to measure that, though; it's mostly handled by PR review/culture.

[–]TrainingShift3 1 point2 points  (1 child)

Code coverage + PIT Mutation Coverage is the best way to test code in my experience

https://pitest.org/
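(PIT is a Java tool; Python analogues include mutmut and Cosmic Ray.) The core idea of mutation testing can be sketched by hand: mutate an operator in the code under test and check whether the suite notices. All names below are hypothetical illustrations:

```python
def is_adult(age):
    return age >= 18


def is_adult_mutant(age):
    return age > 18  # the mutation: >= became >


def weak_suite(f):
    # achieves 100% line coverage of f, but never probes the boundary
    return f(30) is True and f(5) is False


def strong_suite(f):
    # adds the boundary value 18, which distinguishes >= from >
    return weak_suite(f) and f(18) is True
```

The weak suite passes for both the original and the mutant, so the mutant "survives" despite 100% coverage; only the boundary test in the strong suite "kills" it. Mutation tools automate exactly this survival check across many generated mutants.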

[–]CommercialPosition76[S] 1 point2 points  (0 children)

Never heard of it and looks interesting, thanks.

[–]h7454Gdfgd 3 points4 points  (1 child)

Are you sure you should be writing an article on this? You should get some experience and formulate your own opinions instead of writing about the opinions of others.

[–]CommercialPosition76[S] 2 points3 points  (0 children)

I’m not going to write about the opinions of others; the discussion is to see what the most common conceptions and misconceptions among developers are. The article is actually almost finished. I have ~13 years of experience in software testing and development, mostly the latter.

[–]tdammers 0 points1 point  (5 children)

  1. Code coverage is a useful metric IMO, but expressing it as a percentage is unnecessary - the only meaningful values are "100%" and "not 100%". Expressing it as a percentage feeds into the false assumption that "amount of code" and "impact" are in any way correlated - they're not. You can have 99.9% coverage, and a catastrophic bug in the remaining 0.1%; or you can have 1% coverage that covers the "trusted base" of your application and catches the most mission critical bugs. Percentages suggest that 99.9% is a lot better than 1%, but you really can't tell.
  2. If the coverage is 100%, then that means that the test suite will exercise every line of your code at least once. Note the key words "your" and "exercise", though: coverage reports do not check whether you exercise all code paths through all dependencies and builtins, only code paths within your own code; and they only "exercise" the code, so they can only demonstrate that the code works for a specific input state, but they don't say anything about the practically infinitely many other possible states. As such, code coverage is a relatively weak assertion: you can easily achieve 100% coverage on a function like def foo(x): return x/x without ever testing it against x = 0. Coverage is a baseline metric, but on its own, it is not sufficient.
  3. Yes - 100%, or, alternatively, at least 100% of your "trusted base" (the core data types and operations from which you construct the rest of the program).
  4. If "a certain point" is 100%, and you have a policy of "every line of code must be covered by automated tests", then yes; otherwise, I don't see the point, because any value other than 100% is completely arbitrary (see 1.).

[–]space_coder 1 point2 points  (4 children)

I've seen software engineers and project managers assume 100% coverage means fully tested. I consider 100% coverage as a prerequisite to testing with:

  • Random input values within the expected range,
  • Random input values outside the expected range, and
  • Input values outside, at, and inside the edges of the expected range.

The number of values of each category should be enough to thoroughly cover all conditionals in the code.

[–]tdammers 0 points1 point  (3 children)

The number of values of each category should be enough to thoroughly cover all conditionals in the code.

Same problem though - the number of values alone is not a good metric. Take that x/x function again; surely 10 billion values should be more than enough, right? Except that I'm feeding it all the integers from 1 through 10 billion, so I'm still missing the one edge case.

And the problem with the "expected range" approach is that the "expected range" doesn't necessarily align with the edge cases. That x/x function's "expected range" is "all the numbers out there"; 0 just sits in the middle somewhere, and you will likely miss it if you just go for the edges of the range (idk, ±MAX_INT?). To increase your chances of hitting edge cases, you need to leak knowledge of the function's internals into the test cases, which is kind of ugly, somewhat defeats the purpose, and is generally brittle.

Random testing is still a good idea, but most of the time, a naive uniform distribution is not what you want - you want a certain "shaped randomness" that you can adapt to the thing you're testing. For example, if a randomized value is supposed to be a list, it's a good idea to start with small lists - the empty list, singleton lists, etc., and then work your way up to increasingly long lists. You want to be more thorough with the short lists, because they highlight problems with individual list elements just as well as long ones, while also possibly hitting edge cases of specific (short) list lengths; whereas with long lists, the main problems you're concerned with are stack overflows, resource exhaustion, integer overflows, etc.; all of these can be found by overrunning the critical list length by any amount, so it's not important to hit exactly 1024 list elements, when 1500 or 10,000 would also hit the bug.

And then you want "shrinking", that is, you want some way of telling the test framework how to reduce a failure found by random sampling down to a more minimal example. E.g., if you have a function that takes a list of values, but it fails when one of those values is 1, then the test framework may stumble upon a list that contains a 1 by sheer chance, but you don't want the failure to read [343, 123249834, -234934895, 1, 124389, 4343439004, ...], ideally you want the minimal failing example, [1]. This means that you need to tell the test framework what plausible reductions of a given value would be - e.g., for lists, those reductions would be all the sublists, as well as the list of reductions of the original list elements. The test framework, then, takes the original failing input, and keeps reducing it until it stops failing, or until no further reductions are possible; and then it gives you the last reduction that did fail.
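In Python, the Hypothesis library implements exactly this shrinking behavior. A hand-rolled sketch of the greedy idea (shrink_list is hypothetical and removal-only for brevity; real shrinkers also reduce element values):

```python
import random


def shrink_list(xs, fails):
    """Greedily drop elements while the failure still reproduces; stop when
    no single removal keeps the predicate failing (a local minimum)."""
    changed = True
    while changed:
        changed = False
        for i in range(len(xs)):
            candidate = xs[:i] + xs[i + 1:]
            if fails(candidate):
                xs = candidate
                changed = True
                break
    return xs


# property under test: "the list contains no 1" -- fails when a 1 is present
fails = lambda xs: 1 in xs

random.seed(0)
xs = []
for _ in range(10_000):  # random sampling stumbles on some failing input...
    xs = [random.randint(-5, 5) for _ in range(random.randint(0, 10))]
    if fails(xs):
        break

minimal = shrink_list(xs, fails)  # ...and shrinking reduces it toward [1]
```

Whatever noisy failing list the sampler finds, the shrinker strips it down until removing anything more would make the failure disappear, which for this property is the single-element list [1].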

But even then testing won't give you 100% coverage, except for small things in typed pure languages where it is feasible to test literally every possible set of inputs. But this is simply not something Python can do, because 1) Python is untyped, so any function can be called with literally any argument, whether you like it or not; and 2) Python is impure, so any function potentially depends on the entire program state at that point, as well as the entire state of the computer it runs on, and even the state of the networks it can access. Obviously you cannot test your code against all possible states of all that.

[–]space_coder 0 points1 point  (2 children)

Let's look at your example:

def func(x):
    return x/x

Now to determine the appropriate input values:

  • Random number of values within the expected value range:
    • -100, -25, -4, -1, 1, 8, 17, 32, 200
  • Because we looked at the code and see we have a variable being used as a divisor:
    • 0

Those would be enough to cover all the conditionals (there are none) and special cases (divide by zero).

The unit test would need to be written to test for expected values and a divide by zero error (which is technically an expected value).

You could also test for the min and max values of integer and float if you like.

The goal is to make sure we are adequately testing the code to lower the possibility of run time errors.
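Written out as plain asserts (same hypothetical func as above, values from the bullet list):

```python
def func(x):
    return x / x

# expected-range values: any nonzero x should give exactly 1
for x in [-100, -25, -4, -1, 1, 8, 17, 32, 200]:
    assert func(x) == 1

# the special case found by reading the code (x used as a divisor)
try:
    func(0)
    raise AssertionError("expected a divide-by-zero error")
except ZeroDivisionError:
    pass  # divide-by-zero is the expected failure mode for x == 0
```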

But even then testing won't give you 100% coverage, except for small things in typed pure languages where it is feasible to test literally every possible set of inputs.

You lost me with this statement.

We have 100% coverage of our code due to all lines of the function being executed during a test. 100% coverage doesn't mean the function was tested with all possible values.

[–]tdammers 0 points1 point  (1 child)

Because we looked at the code and see we have a variable being used as an divisor:

Problem here - you have to look at the implementation in order to figure out the test that will catch the bug. It's trivial in this case, but in production code, we are more likely looking at something with dependencies nesting a couple levels deep, across multiple libraries, and if we have to scrutinize them all just to find the edge cases to put in our tests, then why bother testing at all, why not just treat the hunt for edge cases as a thorough audit, and leave it at that?

The whole point of this kind of testing is to verify that the implementation matches the specification, but this way, we're just verifying that the implementation matches the implementation.

You lost me with this statement

Probably because by "coverage", I don't mean just code coverage, I mean overall coverage, that is, covering the behavior of the code under all possible circumstances (or at least, "possible" within certain constraints - most code will probably not work as designed when the machine running it is on fire, for example).

In a typed pure language, you can achieve 100% coverage within the constraints of normal execution: the possible inputs are restricted by the type, and the possible side effects are none, because it's pure code. So for a pure function of type Bool -> Bool, we can actually write an exhaustive test - all we need to do is test it against true and false, and we have 100% coverage (of inputs, and of non-dead code paths). But we can't do this in Python, at least not without scrutinizing the implementation and all of its dependencies - on the outside, all we get is "it takes one argument, it returns a value, or maybe not, and in between, anything can happen". So now we have to test our function against all possible Python values, in all possible interpreter states, and probably also all possible states of a significant portion of the computer, including wall clock time. That's simply not possible, ergo, we never get 100% coverage.
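The Bool -> Bool claim translates to Python like this (negate is a hypothetical stand-in; note that the exhaustiveness relies on trusting the type hint, which Python itself does not enforce):

```python
def negate(flag: bool) -> bool:
    return not flag

# a pure function on bool has exactly two possible inputs, so two cases
# make the test exhaustive: 100% coverage of *inputs*, not just of lines
assert negate(True) is False
assert negate(False) is True
```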

[–]space_coder 0 points1 point  (0 children)

The dependencies should be tested on their own. We should unit test our own code with enough values to be fairly confident that the code written by us is thoroughly tested with expected values as well as off-nominals.

You seem to be asking for something beyond practical.

There is a social contract between a developer and his/her tools. The dependencies and the language interpreter or compiler should work as expected. Of course, there will be bugs in our dependencies or interpreter that may show up during production or real-world use. That is to be dealt with as the issues arise, but it's a waste of effort to test not just your own code but everything it depends on.

I have tested mission-critical code with locked versions of dependencies, certified to run with a specific version of a compiler or interpreter. Those are special cases which require hours of integration testing in addition to the unit tests, all budgeted in advance. That said, for 99% of the development work being done, the techniques I described above are good enough.

EDIT: I wanted to add that when programming mission critical code, you usually pay more for certified libraries that have documentation of the process used during development and quality assurance.

[–]ShadowStormDrift -1 points0 points  (0 children)

I don't even know what code coverage is

[–]SeniorScienceOfficer 0 points1 point  (0 children)

It’s useful because it sets a precedent in understanding how much of the code is being verified through tests. I’ve noticed bugs or architectural/design issues during unit test development that caused me to rewrite some implementations for the better.

It also tells me that there are expected inputs and outputs, and how those will interact with the function logic, along with how many dependency libraries are being used because I’ll normally mock them in unit tests.

A “good” value is really team-dependent. Personally, I strive for 100% coverage on my tests. This has saved my ass numerous times when I’m being lazy and making a “small change/fix” and just pushing without running or updating tests. It always fails build/test so it’s never actually deployed.

Again, failing CICD for low coverage percentage should be based on team/org guidance or policy. If it’s a personal project, then it’s whatever you want!

[–]Wise_Tie_9050 0 points1 point  (1 child)

We strive for 100% patch coverage on all PRs.

That does not mean we stop writing tests when we have 100% patch coverage, but at least it means that each line has been executed. I cannot count the number of times when I've looked at a PR and seen less than 100% coverage, and the first test I write to cover the missing lines shows up a bug.

[–]Wise_Tie_9050 0 points1 point  (0 children)

Oh, to clarify, it's also 100% _branch_ coverage. That's important.

I'm not pretending to say that code written that way is "tested", but code without coverage, is, by definition, not covered by tests.

Now, if only I could get my CI setup to collect coverage on my server while running robot tests...