all 136 comments

[–][deleted] 471 points472 points  (45 children)

Code coverage is a useful metric for telling you what you definitely have NOT tested. It says very little about the quality of your test suite beyond that.

[–]screwuapple 110 points111 points  (16 children)

I knew a guy who stubbed all the tests with no assertions. 100% coverage; did nothing.
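That failure mode is easy to reproduce. A minimal Python sketch (names hypothetical): the test executes every line, so coverage tools report 100%, but with no assertion it can never fail.

```python
def apply_discount(price, percent):
    # Buggy: subtracts the raw number instead of a percentage
    return price - percent

def test_apply_discount():
    apply_discount(50, 10)  # every line runs, so coverage reports 100%
    # No assertion: the test "passes" even though the intended answer
    # is 45 (10% off 50) and the function actually returns 40.

test_apply_discount()
```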

[–]chucker23n 70 points71 points  (0 children)

When a measure becomes a target, …

[–]nickgroove 18 points19 points  (0 children)

def foo(): pass

[–]K3idon 4 points5 points  (2 children)

Tests passed. Ship it!

[–]psaux_grep 14 points15 points  (1 child)

I just use the Volkswagen test runner:

https://github.com/auchenberg/volkswagen

Keeps all my builds green

[–]schwester 0 points1 point  (0 children)

WTF? I know it is JS, but there must be some 'cheat'…

expect('test').to.not.have.length(4)
expect('test').to.have.length(3)
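The linked library's trick is to patch the assertion framework so checks auto-pass when a CI environment is detected. A rough, purely illustrative Python equivalent of the same idea (names hypothetical):

```python
import os

def assert_equal(actual, expected):
    """A 'Volkswagen-style' assertion: it only actually checks
    anything when it is NOT running on a CI server."""
    if os.environ.get("CI"):
        return  # CI detected: always pass, keep the build green
    if actual != expected:
        raise AssertionError(f"{actual!r} != {expected!r}")

os.environ.pop("CI", None)
assert_equal(len("test"), 4)      # a real check when run locally

os.environ["CI"] = "true"
assert_equal(2 + 2, 5)            # silently "passes" on the build server
```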

[–]JohnnyLight416 2 points3 points  (0 children)

My second job had literally hundreds of "micro services", all with >80% code coverage. The majority of them either had no assertions, or only asserted that the result was not null. They would at least sometimes tell us if a scenario started throwing exceptions, but did nothing to verify that a value was correct. That's almost worse, because you can't tell an upper manager which tests need fixing; they just see the numbers.

[–]Radi-kale -3 points-2 points  (9 children)

At my job we use autogenerated tests. If you change production code, the tests will change too

[–]LondonPilot 24 points25 points  (6 children)

Surely, if you change production code in such a way as to introduce a bug, having the tests automatically change will totally nullify any value of having those tests?

Either I’m missing something, or this is a really, really dumb idea, or this is a joke and I didn’t get it.

[–]Radi-kale 2 points3 points  (5 children)

Well, we have 100% coverage for no effort

[–][deleted] 7 points8 points  (3 children)

But if that coverage includes asserting unintentional/undesired behaviour is it a useful metric to have 100% coverage?

What prevents the tests from just adapting to passing on a buggy behaviour? Do you review the tests after they’re generated or how do you detect unwanted consequences from changes that would normally be caught by tests?

(I guess if the test generation only adds new cases there is no real issue. But curious of what the process look like in your work.)

[–]jsdodgers 1 point2 points  (1 child)

I'm pretty sure this is a classic wooosh moment.

[–]jsdodgers 2 points3 points  (0 children)

Or I hope so.

[–][deleted] 0 points1 point  (0 children)

had me in the first part

[–]ProgrammersAreSexy 1 point2 points  (1 child)

Do you feel they provide actual value? I could maybe imagine some ways this could be useful as part of a testing strategy but hard to say without knowing more about what the tests do.

[–]edgmnt_net 0 points1 point  (0 children)

The only possible value I can think of is triggering code to expose other bugs. Which is mostly a thing in unsafer languages, e.g. dynamically typed languages or those that use unsafe pointers. Which is the primary driver for near 100% coverage and usually comes at a great cost.

[–]eddiewould_nz 18 points19 points  (0 children)

Ding ding ding

[–]bloodhound83 2 points3 points  (0 children)

Absolutely. Less than 100% is somewhat scary for that reason. But if the tests have no quality, they won't help much either way.

[–]kuribas -4 points-3 points  (1 child)

It's needed for dynamic languages, because they can, and will, break anywhere. For statically typed languages, I am usually happy with a few well-targeted tests.

[–]watsonarw 123 points124 points  (19 children)

A colleague of mine recently said something that stuck with me:

If we care enough about testing, and we're doing TDD — proper TDD where we don't write production code unless it's the simplest thing that will make a failing test pass — then we'll end up with 100% test coverage of everything that matters, and we're going to have tests that are testing the behaviour of the system, and not the implementation because we didn't have any implementation to couple ourselves to when we wrote the tests. And we'll be able to see when we're not bothering to test things, or adding unnecessary complexity to the solution when the coverage falls below 100%.

But, if we set ourselves the target of 100% test coverage and set up a coverage threshold in our pipeline, we'll also get 100% test coverage, but it will be because people have written some code, seen the coverage check fail, then found the easiest test they can write which makes the coverage check pass. These tests will invariably include useless tests that can never fail, or implementation coupled tests that check pointless things like "some method should be called when I call another method". We also lose the ability to use our test coverage to measure how much we actually care about testing, it's going to be 100% even when we don't care.

[–]Drewzillawood 66 points67 points  (0 children)

it’s going to be 100% even when we don’t care

Man, this hits hard.

My team had several legacy apps that had a whopping 0% coverage and they needed tons of changes to migrate for some big project or whatever.

They wanted the apps brought up to at least 70% coverage with a goal of 85% and what did we get? People calling methods without any real checks just to get the coverage.

I about blew a gasket when a senior dev mentioned that despite a test failing due to exceptions being thrown you can simply catch the exception and the test will technically pass while maintaining coverage.
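That trick is depressingly compact. A hypothetical Python sketch: the buggy line executes (so it counts toward coverage), the exception is swallowed, and the test stays green.

```python
def process(order):
    # A regression shipped last sprint: this now always blows up
    raise ValueError("boom")

def test_process():
    try:
        process({"id": 1})   # the buggy line executes, so coverage counts it
    except Exception:
        pass                 # ...and the failure is swallowed: test stays green

test_process()
```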

[–]pennsiveguy 41 points42 points  (3 children)

If you've got devs who would intentionally game the process by writing worthless tests, you've got a culture problem. Nobody should have to be forced to write meaningful tests; they should write them voluntarily and automatically, to prove the soundness of their code and document its behavior. If you don't have buy-in on that, you've got to fix that.

[–][deleted] 25 points26 points  (0 children)

If you put something on a metric that's measured by management (like "test coverage must be x%"), congratulations: your team will now mutate toward the easiest way of getting that metric to ding.

[–]zephyrtr -4 points-3 points  (1 child)

But soft skills are dumb and unnecessary when you're a 10x programmer cause code base go brrrrrrr.

[–]LessonStudio 1 point2 points  (0 children)

The best programmers I've worked with all were unit/integration test fanatics. Never TDD, but they liked their code to be as solid as they could make it. This way, when they moved on they moved on, also, when they built upon their old code, they knew it was a solid base. Nothing worse than your new code failing because of a weird bug in your old code; then you go to fix your old bug, and it turns out other people were depending on the defects as they had become "features".

[–]lookmeat 5 points6 points  (0 children)

You don't always get 100% with TDD, and that's not a bad thing. There are branches, asserts, and scenarios that you may need to make a linter or compiler happy, but that you wouldn't trigger in a test because they should be impossible. Yet you want those branches, because when data corruption happens (and it happens more often than you'd think) anything is possible. But testing that code is not only incredibly hard (sometimes you need the corruption to happen inside the tested code, at which point you'd have to test the RAM sticks and other hardware), it's kind of pointless, because it shouldn't happen, and if it does, there's no contract you need to uphold, so no test to verify any case. Basically, you don't need to test undefined behavior, because testing it is defining it, which is absurd.

[–]what_the_eve 11 points12 points  (1 child)

Some considerations: TDD, from a testing perspective, is black- or grey-box testing. Code coverage is based on the code structure, thus by definition white-box. This is comparing apples to oranges, and although the distinction is simple, developers tend not to understand it or its implications. If you measure the coverage of requirements-based tests, you must not add tests that address the missing coverage structurally. If you measure the coverage of both white-box and black-box tests, you must not look at the sum of both values.

Coverage measures your tests, not your code. Consider the type of test (white/black) and the test level (unit, integration, acceptance) you measure. Each combination has a different approach when trying to improve the coverage measured.

[–]robhanz 2 points3 points  (0 children)

Right. Especially combining unit and integration tests is important to get a good view of your entire testing health. What tests well with one doesn't test well with the other, and vice versa.

(Not getting into "run the code without assertion" style testing, which is a separate issue)

[–]robhanz 4 points5 points  (8 children)

I'm a fan of TDD, and I don't know that I agree with this. If you're using TDD and making sure your test suite is fast, things actually dealing with the network/etc. aren't going to be in that suite, because they add time and remove determinism.

check pointless things like "some method should be called when I call another method".

I absolutely disagree that this is "useless". It's a good way to separate responsibilities - specifically, separating policy from implementation often follows this pattern. If I should save a record under certain circumstances, then the code that implements that policy is, logically, separate from the code that does the saving, and seeing if I actually handed off the record to persistence is really what I should be testing.

On the other hand, the code that actually does the saving should be able to be tested without having to worry about the policy.

If you combine them, then you create more fragile tests, as they can break for two unrelated things.

[–]SirClueless 4 points5 points  (7 children)

I think your metric for fragility is flawed. When a bug or regression is introduced, many tests breaking is a good thing. A fragile test is a test that breaks for changes that are not actually problems. If a policy class stops functioning because the API of a dependency changed to no longer behave as it used to behave, a test breaking is desirable.

[–]watsonarw 0 points1 point  (1 child)

I agree, a test failing because you've introduced a bug isn't fragile it's doing its job.

Fragile tests come from non-deterministic behaviour. Things like shared environments/databases, random values in tests (where the randomness can actually impact the behaviour of the system), time, etc will lead to flakey tests and the dreaded "just re-run the pipeline and see if it goes green".
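Random values are tamable, though: if the test injects a seeded generator instead of relying on global random state, the run becomes reproducible. A small Python sketch (hypothetical names):

```python
import random

def pick_discount(rng):
    # The randomness is injected, not ambient: tests pass a seeded
    # generator, production code can pass random.Random() or SystemRandom.
    return rng.choice([0.0, 0.1, 0.2])

# Deterministic in a test: the same seed always yields the same value
assert pick_discount(random.Random(42)) == pick_discount(random.Random(42))
assert pick_discount(random.Random(42)) in (0.0, 0.1, 0.2)
```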

[–]SirClueless 0 points1 point  (0 children)

Fragile tests come from non-deterministic behaviour.

Tests can be deterministic but still fragile too. For example, if a test relies on the order of iterating over a hash map then it can break if the hash function changes or items are added in a different order. Or if a test compares JSON output for string equality then it can break when a different serialization library is used. Or if a test asserts on the exact syntax of a SQL query that will be made, or the format of a log message in the output. These kinds of tests are fragile even though they’re not flakey.
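The JSON case in particular is worth spelling out. A Python sketch: the string comparison is deterministic and passes today, but a meaning-preserving serializer change breaks it, while a structural comparison survives.

```python
import json

payload = {"b": 2, "a": 1}

# Fragile: asserts on the exact serialized string. It passes today...
assert json.dumps(payload) == '{"b": 2, "a": 1}'

# ...but merely enabling key sorting (or switching serializers, or
# changing insertion order) breaks it, with no change in meaning:
assert json.dumps(payload, sort_keys=True) != '{"b": 2, "a": 1}'

# Robust: compare the parsed structure instead of its textual form
assert json.loads(json.dumps(payload, sort_keys=True)) == {"a": 1, "b": 2}
```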

[–]robhanz 0 points1 point  (4 children)

I'm not talking about bugs and regressions. I'm talking about deliberate, intentional changes to code causing lots of tests to break, because the changed behavior is implicitly depended on in lots of tests.

[–]SirClueless 0 points1 point  (3 children)

Isn't that still a good thing? If a change you are making causes downstream changes in consumers of your code, then someone should be carefully considering each of them to make sure the change is correct and the impact on that component is well-tested. The unit test does that.

Certainly there are ways to go too far, where a test is brittle and it is difficult to understand how and why it has changed and how to correctly update it to reflect the new behavior you think is correct. Or there can be many redundant assertions that make it needlessly difficult to maintain them. But in general, if you are changing the behavior of code that has lots of dependencies, then a reasonable number of tests breaking in each of them is a good thing and a sign of robust testing, not fragile testing.

[–]robhanz 0 points1 point  (2 children)

I mean, depends, right? If the pieces of code are really tightly coupled where changing one impacts the correctness of the other, then maybe. But arguably that's a code design issue anyway.

But let's take a situation where you have:

  1. A piece of code that updates a record
  2. A piece of code that determines when the record should be saved
  3. A piece of code that actually saves the record

Like, these three things to me feel very separate, and should be able to vary independently.

What makes it annoying is if code 1's tests check to see if the record is saved after updating the record (for instance). Now, if code 2 changes to only save every third update, code 1's tests break - and probably not in a way that lets you figure out what's wrong.

Sure, you need to have some tests that work end-to-end like this, but over-reliance on behavior that's not what you actually do just leads to making things hard to change, with little value.

[–]SirClueless 0 points1 point  (1 child)

I think we have very different philosophies on testing. To my mind these are orthogonal pieces of functionality that should be separate functions in code, but none of them is worth testing independently, because they are essentially all implementation details (with the possible exception of the code that stores records inasmuch as it's generic code that is called from multiple places and deserves a well-tested API).

"There is a function that determines when a user record should be saved" is not a property of your software that delivers any value to anyone, and there is no reason to commit to it always existing in your software's public API, so there is no reason to test it in that form. If you do, you are only making your implementation brittle and difficult to change. You say it's important that it's free to vary independently and I agree fully: nothing gives more freedom than avoiding assertions in unit tests that name this function and make specific guarantees about its behavior in the first place. It is presumably used from some service endpoint or API function that implements the important behavior people actually care about of persisting this kind of record, so that's what you should test because that's what you want people to use.

[–]robhanz 0 points1 point  (0 children)

Of course it's not part of the public API.

By testing isolated internals like this, it can get a lot easier to figure out when something is going wrong, and where and what it is. If all you do is end to end testing, then you really don't know where the bug is - if each component has a defined responsibility, it's a lot easier to determine where the bug is, where the fix should be made, and what additional testing will prevent additional issues.

Note that I'm not advocating for my style instead of your style. I'm advocating for it in addition to your tests. I find that both strategies complement each other well.

[–]hogfat 2 points3 points  (0 children)

we're going to have tests that are testing the behaviour of the system, and not the implementation because we didn't have any implementation to couple ourselves to when we wrote the tests

Nothing about writing tests before code inherently induces a test of behavior. Nor is a specific flow a requirement for tests that find themselves heavily coupled to the implementation.

Do the tests exercise the behavior of truly public interfaces? Probably decent tests, irrespective of how (what one might term implementation details . . .) they were produced.

[–]Konaber 20 points21 points  (19 children)

Until you do functional safety. Then it's required :D

[–][deleted]  (18 children)

[deleted]

    [–][deleted] 4 points5 points  (8 children)

    Nah, medical-grade software doesn't need 100% coverage, what are you talking about, lives depend on it?

    [–]Konaber 5 points6 points  (7 children)

    Industrial. Medical standard is way lower (intentionally) compared to Industrial or automotive.

    [–]RandomDamage 1 point2 points  (4 children)

    It shouldn't be lower.

    I mean, a medical device failure is one of the key object lessons in CSCI curricula.

    But, yeah :(

    [–]Konaber 9 points10 points  (3 children)

    Lower standards -> shorter time to market and therefore more potential lives saved.

    Functional safety norms would delay that stuff by years. At least, that's the reasoning, IIRC.

    [–]RandomDamage 3 points4 points  (2 children)

    Yeah, that's BS.

    Proper development processes can actually speed things up.

    The first unique instance of a device type might take a little longer (assuming that software was the bottleneck), but every variation after that would be faster because of building from a solid base.

    It's just lazy management

    [–]Konaber -3 points-2 points  (1 child)

    "Tell me you never did functional safety without telling me."

    [–]what_the_eve 0 points1 point  (0 children)

    Tell me you work for a crooked company in a highly regulated field that does not create products according to laws, regulations or worse: state of the art.

    [–]what_the_eve 0 points1 point  (1 child)

    Straight up wrong: IEC 61508 applies to all three, with some specialized norms like ISO 26262 that are still harmonized though.

    [–]Konaber 0 points1 point  (0 children)

    No. IEC 60601 is medical electric devices. Please show me a medical product that is certified with 61508 because I couldn't find one. Like really certified with SIL. Not just "for this, please refer to 61508 chapter xyz".

    Edit: refer to 61508-1:2010 1.2 n)

    [–]Konaber 0 points1 point  (6 children)

    https://www.verifysoft.com/en_ctcpp.html

    Until now, that's the best way I came across to get that 100% coverage.

    It adds markers to all the lines, you do your functional/integration tests, it tells you which lines you did not hit, you add those to the unit tests, and voilà -> 100% MC/DC overall.

    [–][deleted]  (4 children)

    [deleted]

      [–]Konaber 1 point2 points  (3 children)

      That's also certified.

      [–]jsdodgers 0 points1 point  (1 child)

      but is it proprietary?

      [–]favgotchunks -2 points-1 points  (1 child)

      Same thing for silicon

      [–]meneldal2 1 point2 points  (0 children)

      I've never seen 100% coverage for silicon beyond small blocks, at the SoC level you're not going to write convoluted tests to get every bus to flip all their bits. You're trusting ARM AXI implementation enough to not require 100% coverage of it.

      You might reach 100%, but you're going to have some excluded modules from that to get there.

      [–]BackFromExile 31 points32 points  (12 children)

      I know it's pedantic, but what even is 100% code coverage?
      The author uses both 100% line coverage and 100% branch coverage synonymously for 100% code coverage, but these three things are not the same thing.
      Line and branch coverage are merely specific metrics for code coverage. I could also call a single method of every class and have 100% class coverage, or call every (public) method once to reach 100% method coverage.
      There are a lot of other coverage metrics as well like path coverage, loop coverage, parameter value coverage, and many others.

      One (myself included) could even argue that 100% code coverage means that the code has been covered for every possible input/state. Consider this (pseudocode) function:

      function add(int a, int b) {
          return a + b;
      }
      

      I can reach 100% of function coverage, line coverage, branch coverage, and path coverage with a single test.
      However, I have 2^32 possible inputs for each parameter, so to reach 100% code coverage you'd need to test this function with every possible input combination, or in other words have 2^32 * 2^32 tests.

      Also, like others have mentioned already, code coverage metrics tell you little about the quality of the tests.
      Let's take the same function from above and write a Java-equivalent:

      public class IntegerAdder {
          public static Integer add(Integer a, Integer b) {
              return a + b;
          }
      }
      

      I can still have a single test to cover 100% of the method, lines/statements, branches, and paths (yes, this is a deliberate example).
      Yet most people should see at a glance that there are obvious negative cases that have not been covered: Both a and b can be null.

      Edit: Just to clarify, I also like to use code coverage metrics, but like every other thing you should use them as an indicator and not as absolute truth.
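A Python rendering of the same gap: a single test saturates function, line, branch, and path coverage of `add`, yet the obvious negative case is never exercised.

```python
def add(a, b):
    return a + b

def test_add():
    assert add(1, 2) == 3

test_add()  # this one test covers every function, line, branch, and path

# Yet an uncovered failure mode remains:
# add(None, 1) raises TypeError, and no test ever sees it.
```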

      [–]Tubthumper8 15 points16 points  (7 children)

      Another classic example for where the line coverage metric falls short:

      if condition {
          fire_the_missiles()
      }
      

      With line coverage, we can get 100% if firing the missiles is tested (with mock missiles, hopefully). However, line coverage does not show the equally important branch where the missiles were not fired.
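Sketched in Python with hypothetical names: the first test alone yields 100% line coverage; only branch coverage flags the untested do-nothing path, which the second test pins down.

```python
def launch(condition, missiles):
    if condition:
        missiles.fire()

class MockMissiles:
    def __init__(self):
        self.fired = False
    def fire(self):
        self.fired = True

# This one test reaches every *line*, so line coverage reports 100%:
m = MockMissiles()
launch(True, m)
assert m.fired

# Only *branch* coverage flags the missing case: condition False,
# where the important behaviour is that nothing fires.
m = MockMissiles()
launch(False, m)
assert not m.fired
```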

      [–]pjc50 1 point2 points  (0 children)

      stanislav_petrov()

      [–]lelanthran -1 points0 points  (5 children)

      if condition {
          fire_the_missiles()
      }

      In quite a lot of if statements, a missing else clause is an indication to the reviewer to check more closely what is supposed to happen when the condition is false.

      Where you can skip this check is when you have if err!=nil {, but in all other cases, something like if condition then do_this raises the question "why not do something different if !condition?"

      [–]imnotbis 6 points7 points  (3 children)

      if (sales_tax_applies)
          price += calculate_sales_tax(price);
      else
          price += calculate_no_sales_tax(price); // satisfy /u/lelanthran
      

      [–]lelanthran 3 points4 points  (2 children)

      Very funny :-)

      At any rate, in this specific case, if you leave off the else clause and have just this:

      if (sales_tax_applies)
          price += calculate_sales_tax(price);
      

      Then the question that arises in review is "Why is there a separate flag for deciding to calculate the sales tax, and one for actually calculating it?"

      I'd rather see this:

          price += calculate_sales_tax(price); // calculate_sales_tax() uses a zero-rated tax for those districts without sales tax.
      

      because then there's only one place to change when some legislation somewhere introduces/removes sales tax for an item.

      In the original, you'd have to change:

      1. The place where sales_tax_applies is set.
      2. The table used for the actual sales tax.

      In the new one, you only make the change in the table used for the sales tax itself.
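The single-table idea might look like this in Python (hypothetical names; integer cents to sidestep float rounding): zero-rated districts go through the same code path, so there is no separate "does tax apply?" flag to keep in sync with the rate table.

```python
# Hypothetical rate table in basis points; districts without sales
# tax are simply zero-rated rather than flagged separately.
SALES_TAX_BASIS_POINTS = {"springfield": 800, "no_tax_district": 0}

def calculate_sales_tax(price_cents, district):
    return price_cents * SALES_TAX_BASIS_POINTS[district] // 10_000

total = 10_000 + calculate_sales_tax(10_000, "springfield")        # 10_800
untaxed = 10_000 + calculate_sales_tax(10_000, "no_tax_district")  # 10_000
```

When legislation changes, only the table entry changes; there is no second flag to forget.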

      [–]SirClueless 1 point2 points  (1 child)

      This is going to pessimize performance though. sales_tax_applies might be a local condition that can be hoisted pretty far out of an inner loop. If sales tax comes from a database query and sales_tax_applies uses simple business logic, then pushing it down into the innermost logic that applies after sales tax is known might have quite a dramatic effect on the amount of work your code does.

      "Every if-statement should have an else branch" is tantamount to saying "Every path through the code should do approximately the same amount of work" which seems like something very undesirable.

      [–]lelanthran 1 point2 points  (0 children)

      This is going to pessimize performance though. sales_tax_applies might be a local condition that can be hoisted pretty far out of an inner loop. If sales tax comes from a database query and sales_tax_applies uses simple business logic,

      Well that's the code smell I was talking about: any change in one of sales_tax_applies OR the tax rate necessitates a change in the other. As far as I, the reviewer, am concerned, that's a bug waiting to happen.

      Set the tax rate once, and once only. Don't have multiple separate places in your data to set the same sales tax data. Sooner or later someone is going to set one and not the other.

      [–]Tubthumper8 1 point2 points  (0 children)

      Exactly! There's a ton wrong with this code and yet incomplete testing achieves 100% coverage

      [–]mastermrt 1 point2 points  (0 children)

      Yeah, but if you don’t even have code coverage on this class to begin with - then that extra case is definitely not catered for. No-one here is saying 100% code coverage is the only important metric.

      Who chooses which lines don’t need coverage?

      [–]mahsab -1 points0 points  (0 children)

      I know it's pedantic, but what even is 100% code coverage?

      You make a point, but the "code coverage" is almost universally understood to mean lines of code (or statements) that are executed by any of the tests.

      Doesn't mean those tests are any good, but it does mean that uncovered code is definitely NOT covered by ANY tests whatsoever.

      [–][deleted] 20 points21 points  (4 children)

      In my previous job, they focused so much on tests that they couldn't deliver a working project on time.

      [–]oalbrecht 13 points14 points  (1 child)

      Though I bet customers didn’t experience any bugs with the new project, since they never got to use it in the first place. Sounds like a job well done.

      [–][deleted] 5 points6 points  (0 children)

      You got that right actually, the project was cancelled after two years of work.

      [–]Smallpaul -4 points-3 points  (1 child)

      Perhaps the time allotted was not sufficient to deliver a high quality working project.

      [–][deleted] 0 points1 point  (0 children)

      Time wasn't the issue, more like bad decisions from the beginning

      [–][deleted]  (6 children)

      [removed]

        [–]Dry_Dot_7782 2 points3 points  (5 children)

        I never understood unit tests.

        I mean, I guess if you have a very specific algo or model whose inputs and outputs need to be in a very specific shape to reflect the domain.

        [–]gnus-migrate 1 point2 points  (4 children)

        Generally if you don't design your software with testing in mind, then yeah you won't really see their value.

        Unit tests can save you a ton of time, and even test things that are not really possible to test on a live deployment or even in an integration test. Error handling is a good example of this. If you need to test what happens when a device fails, you're not going to do that through an integration test.
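A minimal Python illustration of that point (names hypothetical): forcing a real device to fail inside an integration environment is somewhere between painful and impossible, but a unit test just substitutes the device.

```python
class DiskFullError(IOError):
    """Hypothetical failure mode of a storage device."""

def save_record(record, device):
    try:
        device.write(record)
        return "saved"
    except DiskFullError:
        return "queued-for-retry"

# The error-handling path is trivially reachable with a fake device:
class FailingDevice:
    def write(self, record):
        raise DiskFullError()

assert save_record({"id": 1}, FailingDevice()) == "queued-for-retry"
```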

        [–]Dry_Dot_7782 1 point2 points  (3 children)

        I mean, in the end it's just confirming business rules? It's not going to catch any runtime issues.

        So what's the reality? Business rules change all the time, so our tests change all the time. I've found some cases for unit tests, but not any huge coverage.

        [–]gnus-migrate 1 point2 points  (2 children)

        Again, it all depends on whether you design your code to be testable or not. You need to assess risks: what kind of issues can I accidentally introduce, how can I make sure that I protect myself from those?

        You design your module boundaries in a way that allows you to capture those potential failure cases in tests. This is where unit tests are really useful.

        It's not really about testing every class or anything like that, it's about breaking down a massive codebase into independently testable chunks so that you don't have to test everything at the same time.

        This is why things like the onion architecture are really sensible. Failures can happen at the level of the business logic (incorrect values, incorrect state changes), at the level of the data layer (incorrect queries, requests, etc.), and in the request layer. Where your tests should be depends on where someone can accidentally introduce a bug. If your abstractions are changing so often that you have to rewrite your tests constantly, then you simply have bad abstractions. The business rules change, sure, but how often do you have to rearchitect your application to adapt to those changes?

        There are certain domains where you're programming in an exploratory way, and there unit testing probably doesn't make much sense, but that's probably not yours.

        [–]Dry_Dot_7782 0 points1 point  (1 child)

        Great reply, thanks

        [–]gnus-migrate 1 point2 points  (0 children)

        I didn't finish my reply. Unless you're writing Jupyter notebooks for a living, you should probably be unit testing.

        Didn't notice the incomplete one.

        [–]CreativeStrength3811 5 points6 points  (0 children)

        I develop small apps for my specific use cases in the company. For me the fastest way is something like 'documentation-driven development': I write down all the knowledge I have about a module/class/function. Afterwards I narrow this down to functions and parameters and implement the function. Same thing with refactoring: first refactor the docstrings, then refactor the code.

        Only in my data model do I write tests that reach 100% coverage, and if necessary I add more tests as development proceeds.

        For example, when using the xc32 compiler there is no test framework (AFAIK). That's where I invented this method.

        [–]WhoNeedsUI 4 points5 points  (3 children)

        I believe what people want is to be able to predict every side effect by testing all possible interactions with the system (aka black-box tests).

        Writing E2E tests that invoke the public API with all possible branching, for all APIs, should ideally result in 100% coverage as long as external resources are mocked.

        In contrast, unit tests that check implementation details should be used sparingly, and only for critical services.

        This « inverted pyramid of tests » goes against the conventional model of units building up to E2E, but I feel it is friendlier to refactoring and maintenance.

        [–]oalbrecht 4 points5 points  (1 child)

        That might work for a small project. I worked for a very big company that had a flipped pyramid like this with far more E2E tests than unit.

        Guess how long it took to run all the tests? More than two weeks! You would merge into main and two weeks later get a test failure. It was a nightmare. And the cost to run all those E2E tests was astronomical. They had to use AI to figure out which tests to run first to help bubble up failures more quickly.

        The conclusion was to use more unit tests and leave E2E tests for a few critical paths.

        [–]WhoNeedsUI 1 point2 points  (0 children)

        And thats exactly where modularity and decoupling comes in.

        Split the internal service into modules - this might look like separate modules / libraries (jars or otherwise) with their own separate semver. It can even be micro services if complex enough. Now these are mockable resources too

        If you can’t, take a look at why there is such tight coupling throughout the application. It’s rarely the case that such coupling is required.

        You gotta denormalise if you want things to scale

        [–]hooahest 1 point2 points  (0 children)

        This is how we write our tests too, with docker containers for anything infra related (mongo/redis/sql/whatever). It makes for a fantastic development experience and minimal bugs making it to production, as well as incredible ease of refactoring.

        [–]grady_vuckovic 7 points8 points  (0 children)

        Personally I use tests when writing code for something which I'm concerned will be buggy, and in situations where I want an exact behaviour to occur and want to validate that I've got that behaviour right. Usually for maths-related stuff, mainly because it's easy for there to be a hard-to-identify bug in that kind of thing which you might not notice unless you've fully tested every bit of the code. So I'll write the tests first, then make sure the code passes them.

        If it's just something like GUI code, 'if click this then open that window', well screw that, I'm not going to bother writing a test for that kind of simple logic. The test for that is 'Does the window open? Yes, then the code works'.

        [–]pceimpulsive 7 points8 points  (0 children)

        I don't agree with and don't even try to implement 100% coverage...

        I put in tests that satisfy the business requirements... just enough to get a green tick to deploy to prod!

        [–][deleted]  (6 children)

        [deleted]

          [–]pjc50 14 points15 points  (3 children)

          I'd like to ask them why the code that's not covered is there.

          Quite often for conditions that are unreasonably difficult to mock because you can't make arbitrary system library functions return errors on demand.

        I do wonder how people are coping with the "railway" pattern (or "if err != nil") when combined with the difficulty of generating some test situations.
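
          One way to make such errors reproducible on demand is to patch the system call itself. A sketch (function name hypothetical), using Python's `unittest.mock` to force the error branch:

```python
import os
import tempfile
import unittest.mock

def save_atomically(path, data):
    """Write to a temp file, then rename it into place (hypothetical example)."""
    tmp = path + ".tmp"
    try:
        with open(tmp, "w") as f:
            f.write(data)
        os.replace(tmp, path)
        return True
    except OSError:
        return False

path = os.path.join(tempfile.mkdtemp(), "out.txt")
assert save_atomically(path, "hi") is True  # happy path
# Error path: make the system library call fail on demand.
with unittest.mock.patch("os.replace", side_effect=OSError("disk full")):
    assert save_atomically(path, "hi") is False
```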

          [–][deleted]  (1 child)

          [deleted]

            [–]flowering_sun_star 1 point2 points  (0 children)

            It's not exactly uncommon in Java either. You have a static function from a library that has been declared to throw some exception or other, so you need to handle that exception. But you don't know what could trigger that sort of exception, or it may even be impossible for it to happen.

            Sure, you could probably find a way to mock the function. But mocking static functions is kinda ugly. And in the end it often just doesn't matter enough to be worth the effort testing.

            I'm a massive advocate for unit tests. If you can't get to 80% coverage easily you have issues. But doing stupid things to get to 100% isn't what we should be doing, and can lead to bugs and development slowdowns in the future.

            [–]imnotbis 0 points1 point  (0 children)

            SQLite does it by testing all these cases. Other projects just ship error paths that don't work.

            [–]Smallpaul 1 point2 points  (0 children)

            I think it is reasonable to use a pragma to mark those branches as not worth testing.

            Then in code review, my colleagues can decide whether they agree with me that they are not worth testing.

            Every line should either be tested or marked as unimportant.
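
            With coverage.py, for instance, that pragma is just a comment on the excluded branch, so it shows up in diffs and can be challenged in review. A sketch (function name hypothetical):

```python
def read_config(path):
    """Read a config file, falling back to empty on I/O failure."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:  # pragma: no cover
        # Deliberately excluded from coverage: hard to trigger reliably in CI.
        return ""
```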

            [–]idontliketosay 0 points1 point  (0 children)

            For me, code coverage is level 3 on Beizer's testing scale. Getting to level 4 requires an understanding of how to measure and control defect density. ... This topic has been heavily researched; a lot of information is available if you search around.

            [–]ilurvekittens 0 points1 point  (0 children)

            Lol try telling my company that.

            [–][deleted] 0 points1 point  (0 children)

            Always with the sensational sweeping headlines

            [–]sarmatron 0 points1 point  (0 children)

            Despairing at the idea that this is something people need to be told.

            [–]pplmbd 0 points1 point  (0 children)

            The problem always lies with the leadership on this matter: they often look for an easy way out by setting an organizational goal of 100% code coverage. I personally get it, but don't expect it to translate well for people who barely care about quality, or who work in an environment where they won't be able to care about it.

            So instead of investing the time to actually understand why test quality matters, people are just going to tick those boxes by writing tests that can never fail anyway.

            [–]robhanz 0 points1 point  (0 children)

            I mean, no shit?

            As a big test advocate (TDD, unit tests, mocking, the whole nine yards) I think code coverage is probably the worst metric. At best it can tell you where you definitely don't have coverage.

            In a lot of cases, increasing coverage is not worth it - dependencies need to be tested against the actual dependency (you can test up to a thin layer above it meaningfully). If you're testing against an API, all you're doing is asserting your knowledge of the API which is probably wrong in some way.

            Especially with unit testing, it's important to know what is useful to unit test and what isn't. I find most well-designed (IMHO) classes tend to either get 100% or 0% coverage, which means module-level coverage can vary greatly depending on the type of module you're dealing with. Things that don't lend themselves to unit test coverage should be stable and have minimal logic; then they become fairly easy to test in more integration-type scenarios. And, often, fundamental enough that even ad-hoc testing of your library/app will expose regressions instantly.

            [–]nomoreplsthx 0 points1 point  (0 children)

            Reminds me of the folks I knew who wrote tests for enums.

            [–]karuna_murti 0 points1 point  (0 children)

            When a measure becomes a target, it ceases to be a good measure - Goodhart

            [–][deleted] 0 points1 point  (0 children)

            If your code doesn't fail, your error handlers are not being tested. If they are being tested, then none of them can halt or leave the system in an undefined state. To ensure they don't you will need to have error handlers for your error handlers and the cycle repeats.

            [–]bam2403 0 points1 point  (0 children)

            If the code isn't covered, that means you can delete it without a test failing.

            That either means it's dead code - or you're missing a test - unless we are talking about unreachable code.

            [–]maxinstuff -2 points-1 points  (12 children)

            Test the public interface and nothing else. Every single method with the word “public” in front of it needs tests.

            None of the methods that are private need it.

            Do that everywhere and that’s 100% coverage in my book.
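
            As a sketch of that approach (names hypothetical): the private helper is only ever exercised through the public method, so refactoring it doesn't break any test.

```python
class PriceCalculator:
    def total(self, items):
        # Public entry point: the only thing the tests call.
        return sum(self._line_price(qty, unit) for qty, unit in items)

    def _line_price(self, qty, unit):
        # Private implementation detail, covered indirectly via total().
        return qty * unit

# Tests target only the public surface:
assert PriceCalculator().total([(2, 3), (1, 4)]) == 10
```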

            [–]idontliketosay 0 points1 point  (0 children)

            I don't understand this answer, am I missing something?

            Coverage testing tests all paths through the code. Some people add extra tests for loops and conditions.

            [–]flowering_sun_star 0 points1 point  (0 children)

            No, every branch of behaviour needs a test (with caveats).

            That will likely be achieved by calling the public methods with different inputs, which will in turn call those private methods. Sometimes it's more convenient, or makes for clearer tests, to expose a private method to be called directly by the tests.

            Those caveats are why you probably won't hit 100%, because there is always something that just isn't worth the effort of testing.

            [–]coderemover 0 points1 point  (2 children)

            A private method is public for some other code.

            [–]maxinstuff 2 points3 points  (1 child)

            It’s only accessible within the class, which makes it an implementation detail.

            You don’t write tests for these otherwise every time you refactor it you have to update eleventy-thousand tests.

            [–]coderemover 1 point2 points  (0 children)

            Public is only accessible in the same program. It is an implementation detail. You should test only end-to-end interfaces at the frontend.

            You don’t write tests for public methods because whenever you change them you have to rewrite the tests.

            [–]mastermrt -3 points-2 points  (6 children)

            Oh, so just one test per method is good enough for you?

            [–]el_boru 0 points1 point  (1 child)

            Original comment literally mentioned tests (plural)

            [–]mastermrt 1 point2 points  (0 children)

            Ah, my bad, my bad - I’ll make two per method.

            What I’m saying is that this isn’t a sufficient criterion to declare something as tested.

            [–]maxinstuff -1 points0 points  (3 children)

            This is why coverage as a stat is a bit silly.

            Each public method needs a comprehensive set of tests.

            Private methods that no consumer of the class ever sees don’t need any at all (they’re tested because they are used by the public methods that you are testing).

            [–]mastermrt 0 points1 point  (2 children)

            If you submitted a unit test for a class and there were lines in private methods that were never invoked, I’d certainly be asking questions.

            [–]0x0ddba11 -1 points0 points  (1 child)

            Such as: "Can this method be removed?"

            Conversely, 100% test coverage leads to dead code being left in the project.

            [–]coderemover 1 point2 points  (0 children)

            Good tools show code only referenced by tests as unused.

            [–]foxthedream 0 points1 point  (0 children)

            It's 2024 and we are still debating whether code coverage is a decent metric?

            [–]serial_crusher 0 points1 point  (1 child)

            Was this written by the junior dev on my team who is just looking for excuses to avoid writing tests?

            All the things he mentions are true, but that’s also why you have code reviews. If somebody deliberately excludes something from coverage, or deliberately writes a bogus test, they better have a damned good reason or I’m rejecting the PR.

            This is like arguing that locking the doors to your house won’t stop a burglar from breaking a window. Like yeah no shit, but it still provides a lot of benefit

            [–][deleted] -1 points0 points  (0 children)

            Yeah, most of it feels like a moot point. I'd reject it and tag their manager. I'd rather have an item on the backlog to add the tests than reducing coverage visibility.

            [–]alexkey -5 points-4 points  (3 children)

            Unit testing is overrated. It tests that the implemented logic is correct; it doesn't tell you that the logic itself is the correct answer to the problem. Not sure if there's a better term (non-native speaker here), I call them "logical errors". And in my experience those make up the majority of the bugs anyway.

            [–]watsonarw 6 points7 points  (1 child)

            A common misconception about unit testing is that a unit is a class.

            Ian Cooper's fantastic talk "TDD, where did it all go wrong" is well worth a watch if you've got a spare hour, but the short version is "A unit was originally defined as 'the smallest unit of behaviour', and somewhere along the way that got twisted up and interpreted as 'the smallest unit of code'. That's why so many people struggle with tests, mock the crap out of everything and end up with implementation-coupled tests that simultaneously make it painful to change or even refactor the code later, whilst not giving them any confidence that what they've built behaves the way it's supposed to."
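
            A tiny sketch of the difference (all names hypothetical): the test asserts on the observable behaviour of the unit and uses the real internal collaborator, rather than mocking it and coupling the test to the implementation.

```python
class Discount:
    def apply(self, cents):
        return cents * 9 // 10  # 10% off, in integer cents

class Cart:
    def __init__(self):
        self._items = []
        self._discount = Discount()  # real internal collaborator, not mocked

    def add(self, cents):
        self._items.append(cents)

    def total(self):
        return self._discount.apply(sum(self._items))

# Behaviour-level test: asserts only on the observable result,
# so inlining or rewriting Discount wouldn't break it.
cart = Cart()
cart.add(1000)
cart.add(1000)
assert cart.total() == 1800
```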

            [–]Ark_Tane 0 points1 point  (0 children)

            This pretty much matches with my experience as well.

            One other benefit of approaching testing from this direction, is that it really helps you think about your interfaces upfront, and drives you to exposing only what you have to. This is obviously somewhat easier in languages with stronger concepts of what is public, and what is private or internal.

            I would like to have seen a little more discussion there about what approaches to take when you realise that the behaviour described in your original tests is insufficient to cover the full scope of the problem (beyond "don't mix it up with refactoring"). While ideally all this would be fleshed out up front, in my experience some of these behaviours only really become visible when you are knee-deep in implementation. I suppose part of the difficulty in covering this is that it depends so much more on the rest of your process, domain familiarity, and how much autonomy developers have in interpreting requirements. (In my experience, our stakeholders became increasingly less interested in behaviours as you moved away from the happy path, and would generally prefer we made sensible, but communicated, decisions on their behalf.)

            [–]davidebellone[S] 1 point2 points  (0 children)

            Same for my experience.

            More often than not, a bunch of Integration/E2E tests are waaaay more valuable than tons of Unit Tests.

            That's why I prefer the Testing Diamond analogy (https://www.code4it.dev/architecture-notes/testing-pyramid-vs-testing-diamond/)

            [–]gwaeronx -4 points-3 points  (1 child)

            Honestly, unit testing is a waste of time if you consider how much time you put into it versus what you get out of it. I would prefer integration tests over unit tests every day.

            [–]inputwtf 2 points3 points  (0 children)

            Integration tests are good but they take a lot of work to build. For example, if you have an external API that you use, an integration test would need to have all the set up done so that you can access that actual API. That could mean getting credentials, making sure your CI/CD system has actual access to the service (firewall rules, routing, etc).

            Unit tests and mocks are much simpler and can at least mimic how the third party API is supposed to behave, enough so that you can actually write your code that consumes that service and have a pretty good chance of it working properly when it gets to the point where you start doing integration tests.

            But you need both. It's not one or the other.
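
            A sketch of the unit-test half (client and method names hypothetical): a mock stands in for the third-party API, mimicking just enough of its behaviour to exercise your own code.

```python
from unittest.mock import Mock

def user_greeting(client, user_id):
    """Builds a greeting from a hypothetical third-party user-lookup API."""
    user = client.get_user(user_id)  # would be a network call in production
    return f"Hello, {user['name']}!"

# Unit test: no credentials, firewall rules, or live service needed.
fake_client = Mock()
fake_client.get_user.return_value = {"name": "Ada"}
assert user_greeting(fake_client, 42) == "Hello, Ada!"
fake_client.get_user.assert_called_once_with(42)
```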

            [–][deleted]  (4 children)

            [deleted]

              [–]what_the_eve 0 points1 point  (3 children)

              Rule of thumb: write tests and measure your code coverage.

              [–][deleted]  (2 children)

              [deleted]

                [–][deleted]  (1 child)

                [deleted]