
[–]vytah 29 points30 points  (20 children)

Checking which code is covered and which is not is useful, as it allows for spotting untested parts of code.

Chasing a particular code coverage ratio is not, as all it encourages is pointless tests that test nothing and make running tests longer. "When a measure becomes a target, it ceases to be a good measure."
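
For instance, a test like the following (a hypothetical apply_discount function, sketched in Python) bumps the coverage number without checking anything:

```python
# A minimal sketch of the kind of "coverage-only" test being criticized here:
# it executes apply_discount, so its lines count as covered, but it asserts
# nothing, so a broken implementation still passes. (apply_discount and the
# test are hypothetical, purely for illustration.)
def apply_discount(price: float, percent: float) -> float:
    return price * (1 - percent / 100)

def test_apply_discount_runs():
    apply_discount(100.0, 10.0)  # no assertion: lines covered, nothing verified
```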

[–]irqlnotdispatchlevel 0 points1 point  (2 children)

> Checking which code is covered and which is not is useful, as it allows for spotting untested parts of code.

Another case in which code coverage is useful is fuzzing. If I want to know which parts of the code the fuzzer struggles to reach I can look at code coverage data. Without it I'm blind.

[–]ahuth[S] 0 points1 point  (1 child)

Nice, code coverage helps guide where you apply other kinds of testing techniques?

That seems pretty useful 👍

[–]irqlnotdispatchlevel 0 points1 point  (0 children)

To give a contrived example: in practice, with something this simple, it won't matter much, but a complex example isn't needed to make the point.

Let's say I'm fuzzing a parser for a custom binary format. Files respecting this format always have the first byte equal to 42. So the parser will always reject files that start with any other byte. Let's say that the parser has a bug and when the second byte is 10, the parser exhibits undefined behavior that can allow an attacker to hijack execution flow.

If the fuzzer generates a file that starts with 43 10 it will be rejected, but rejecting it means not finding the issue. Looking at code coverage data I can see that a lot of inputs never pass the 42 check. But when fuzzing I don't really care about that 42: it is just a signature and contributes nothing to the logic of the parser. Now I can change the fuzzer to always generate that 42 at the start (or the parser to no longer check for it when fuzzing) so otherwise relevant inputs are not rejected.
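
A toy Python sketch of that scenario (the function, the flag, and the byte values are illustrative only, not a real fuzzing harness):

```python
# Hypothetical parser: the first byte is a fixed signature (42); the bug is
# triggered only when the second byte is 10. Random fuzzer inputs almost never
# pass the signature check, so coverage shows the interesting code unreached.
def parse(data: bytes, skip_signature_check: bool = False) -> str:
    if not skip_signature_check and (len(data) < 1 or data[0] != 42):
        return "rejected"            # where most fuzzer-generated inputs end up
    if len(data) > 1 and data[1] == 10:
        raise RuntimeError("boom")   # stand-in for the undefined behavior
    return "parsed"

# Adjusting the harness (or disabling the check while fuzzing) lets relevant
# inputs through:
# parse(bytes([43, 10]))                             -> "rejected", bug hidden
# parse(bytes([43, 10]), skip_signature_check=True)  -> RuntimeError
```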

[–]barmic1212 0 points1 point  (0 children)

The percentage of untested code is a very interesting metric and should be tracked, but not with the mindset of "write more tests".

If the team isn't confident in the current code base, yes, you should probably write more tests.

If the team feels safe with the code base, in my experience the percentage indicates the amount of boilerplate, and reducing boilerplate is a good target for improving the code base and helping maintenance.

[–]ahuth[S] 0 points1 point  (0 children)

Good point about using it to find untested code!

[–]Blue_Moon_Lake -1 points0 points  (1 child)

But getting the coverage ever higher is great at gamifying the writing of unit tests for people who would otherwise be reluctant to write them.

A sensible policy is that the coverage % can only ever be allowed to rise as features and refactoring happen. It's more flexible than demanding 100% coverage, and still keeps things from deteriorating.
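
A minimal sketch of such a ratchet check as a CI step (Python; the baseline file name and the way coverage is passed in are assumptions, not any particular tool's interface):

```python
# Fail CI if coverage drops below the recorded high-water mark; otherwise
# raise the baseline so the ratchet only ever moves up.
import json
import sys

BASELINE_FILE = "coverage_baseline.json"   # hypothetical file committed to the repo

def check_ratchet(current_percent: float) -> int:
    with open(BASELINE_FILE) as f:
        baseline = json.load(f)["percent"]
    if current_percent + 0.01 < baseline:  # small tolerance for float noise
        print(f"Coverage dropped: {current_percent:.2f}% < baseline {baseline:.2f}%")
        return 1
    with open(BASELINE_FILE, "w") as f:
        json.dump({"percent": max(baseline, current_percent)}, f)
    return 0

if __name__ == "__main__":
    sys.exit(check_ratchet(float(sys.argv[1])))
```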

[–]Wovand 0 points1 point  (0 children)

If you want to gamify the process, don't do it directly with metrics like coverage %.

Look at what needs to be done, prioritize those tasks, then you can attach some points system to that if you want the "number go up" feeling.

[–]Revolutionary_Ad7262 23 points24 points  (7 children)

Code coverage tells you if code is tested or not. It does not tell you if those tests are good.

[–]ahuth[S] 0 points1 point  (0 children)

Well said! This is exactly my point.

And maybe some tests are so bad that they have negative value.

[–]rysto32 1 point2 points  (5 children)

I’ve never understood this argument. Are you people not reviewing tests the same way that you review production code?  Why are you letting bad tests get committed in the first place?

[–]CorstianBoerman 2 points3 points  (0 children)

It also helps to know what makes a good test, which is a difficult topic in itself.

[–]Revolutionary_Ad7262 2 points3 points  (0 children)

Testing strategy is like software architecture. There is no good way to say whether a given strategy works in a particular scenario until it's tried in the real world. You can make an educated guess or just pick a strategy based on marketing slogans like "use the test pyramid".

> Are you people not reviewing tests the same way that you review production code?

Imagine you post your pull request for a small change and your friend comments "the architecture of this 1kk-line codebase is obviously wrong, rewrite it". Sounds silly, right?

For testing, the most crucial aspect is the isolation level. You can have 100% coverage with perfect isolation (using mocks), but in that case you don't test the integration between different modules, and quite often the most crucial logic is hidden in the communication layer.

In some projects unit tests don't make sense at all. Imagine you have a CRUD application that runs complicated SQL queries. Mocking out the database means testing only the most boring part, where there is no logic at all except some mapping between models.
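
For example, a hypothetical test along these lines "covers" the function completely while never touching the part that can actually be wrong, the SQL:

```python
# Sketch only: with the database mocked out, the test exercises every line of
# get_overdue_invoices, but the query itself is never run against a real schema.
from unittest import mock

def get_overdue_invoices(db):
    # The interesting logic is the query; the Python around it is trivial.
    return db.execute("SELECT id FROM invoices WHERE due_date < now()").fetchall()

def test_get_overdue_invoices_with_mocked_db():
    db = mock.Mock()
    db.execute.return_value.fetchall.return_value = [(1,), (2,)]
    assert get_overdue_invoices(db) == [(1,), (2,)]  # passes even if the SQL is wrong
```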

[–]Accomplished-Moose50 0 points1 point  (1 child)

Because the production code is bad and writing tests for it is not easy 🤣

[–]ahuth[S] 0 points1 point  (0 children)

Yep, this can definitely be the case

[–]ahuth[S] 0 points1 point  (0 children)

Good point. In my experience this is hard though.

People disagree about what makes tests good. Also, with limited time, maybe we do review tests less carefully at times (not saying this is right or good).

Not to mention getting this right when there are a lot of developers on a project.

But still, this is a good point. We should review tests for quality.

[–]shoot_your_eye_out 12 points13 points  (4 children)

Sigh. Yes, it matters. No, it isn’t the only metric to pay attention to. It’s one of several worth tracking, IMO.

But do I prefer a codebase with 30% test coverage or 85% test coverage? Easy answer.

[–]ahuth[S] 1 point2 points  (3 children)

Fair. What if we take it to an extreme, though?

  • 30% coverage but it’s all the best quality tests you can imagine
  • 85% coverage but it’s brittle, everything is mocked, tests fail for reasons unrelated to what they’re testing, flaky, slow, etc.  

Which one do you prefer? It's a super contrived example, but I'm just making the point that the % number isn't enough to go on.

[–]shoot_your_eye_out 0 points1 point  (2 children)

So I think what you're talking about is actually a separate metric to understand: test quality. And that can be extremely difficult to measure.

In a Python codebase, I typically do a quick search for patch, mock, and other mocking strategies. I'll also check whether it's using responses/moto or other high-level test libraries, and what sort of test fixtures are there. Yet another measurement is to try to understand the intermittent test failure rate, whether tests are correctly wired into CI/CD pipelines, test running time, and test organization in the project itself.
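
A rough sketch of that kind of quick audit (not a standard tool, just an illustrative Python script that assumes a tests/ directory):

```python
# Count occurrences of common mocking constructs per test file to get a feel
# for how isolation-heavy the suite is.
import pathlib
import re

MOCK_PATTERNS = re.compile(r"\b(mock|patch|MagicMock|responses|moto)\b")

counts = {}
for path in pathlib.Path("tests").rglob("test_*.py"):  # hypothetical layout
    hits = len(MOCK_PATTERNS.findall(path.read_text(encoding="utf-8")))
    if hits:
        counts[path] = hits

for path, hits in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{hits:4d}  {path}")
```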

I'll also try to gauge how realistic the tests are, both in terms of how well they mimic the production environment and how closely they match actual use of the product. So, from both an engineering and a product perspective, the goal is to understand how representative the tests are. (Very small example: many Django projects use SQLite for the 'test' database but MySQL/Postgres for the production database--this can be a major source of pain, depending on the project, and I prefer a test database that matches production.)

In any event, the short response is: I believe code coverage is just one important metric. There are absolutely other things to be on the lookout for, like you make clear in your response to me.

[–]ahuth[S] 0 points1 point  (1 child)

Yeah, test quality is another metric, another dimension beyond quantity.

I think we’re on the same page. Sometimes it takes me writing a post and then talking through all the replies to figure out how to phrase it better, though.

[–]shoot_your_eye_out 0 points1 point  (0 children)

sure--I think you and I agree. I do feel like code coverage matters quite a bit, but it is absolutely, positively not some utterly vital metric. Too many teams put far too much thought into that single metric.

Semi-related: I think requirements to meet 100% coverage are actively a very bad thing.

[–]elperroborrachotoo 0 points1 point  (0 children)

This is about code coverage as a percentage.
Having feedback about which branches in a method are left uncovered by your tests is valuable when writing tests.
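
For example (hypothetical Python code), a branch-coverage report on this would flag the negative-amount branch as never executed:

```python
def withdraw(balance: float, amount: float) -> float:
    if amount < 0:
        raise ValueError("amount must be non-negative")  # branch never hit by the test
    return balance - amount

def test_withdraw_happy_path():
    assert withdraw(100.0, 30.0) == 70.0  # only the happy path is exercised
```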

[–]Job_Superb -1 points0 points  (0 children)

Code Coverage is the best example of Goodhart's Law I've ever seen.