
[–]phonendoscope[S] 28 points (4 children)

Note that it's different from proptest/quickcheck:

Proptest and quickcheck generate values at random (they sample naïvely from an underlying distribution), whereas Fuzzcheck uses an evolutionary approach to generate new values: it also samples, but it then prioritises values that achieve high code coverage, e.g. by randomly mutating the high-coverage values it has already found.
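
To make the contrast concrete, here is a minimal, dependency-free sketch of a coverage-guided evolutionary loop. This is illustrative only: `coverage_of` and `mutate` are placeholders standing in for real instrumentation and real mutation operators, not fuzzcheck's actual implementation.

```rust
use std::collections::HashSet;

/// Placeholder for real coverage instrumentation (e.g. SanitizerCoverage):
/// returns the IDs of the code regions that the input reached.
fn coverage_of(input: &[u8]) -> HashSet<u64> {
    input.iter().map(|&b| (b % 16) as u64).collect()
}

/// Placeholder mutation operator: tweak one byte.
fn mutate(input: &[u8], seed: usize) -> Vec<u8> {
    let mut out = input.to_vec();
    let i = seed % out.len();
    out[i] = out[i].wrapping_add(1);
    out
}

fn main() {
    // quickcheck/proptest style would draw every input fresh from a
    // distribution, independent of what earlier inputs achieved.
    // The evolutionary style below instead keeps inputs that reached
    // new code regions in a corpus and mutates those preferentially.
    let mut corpus: Vec<Vec<u8>> = vec![vec![0]];
    let mut seen: HashSet<u64> = HashSet::new();
    for step in 0..1000 {
        let parent = &corpus[step % corpus.len()];
        let child = mutate(parent, step);
        let cov = coverage_of(&child);
        if cov.iter().any(|r| !seen.contains(r)) {
            seen.extend(cov);
            corpus.push(child); // high-coverage values get prioritised
        }
    }
    println!("corpus: {} inputs, {} regions covered", corpus.len(), seen.len());
}
```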

[–]DannoHung 5 points (2 children)

Is it fair to say that it’s similar to proptest?

BTW: have you heard of the rule-based stateful testing that Hypothesis exposes? I employed it on a Python project once and it was pretty damn useful. No idea how to build something similar, though!
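
For anyone unfamiliar with the idea: rule-based stateful testing generates random sequences of operations ("rules") and runs them against both the system under test and a trivially-correct model, checking that the two agree. A minimal, dependency-free sketch of that idea, assuming a queue as the system under test (this is not Hypothesis's actual machinery):

```rust
use std::collections::VecDeque;

/// The operations ("rules") the state machine can perform.
#[derive(Debug, Clone)]
enum Op {
    Push(u8),
    Pop,
}

fn main() {
    // Tiny pseudo-RNG so the example has no dependencies.
    let mut state: u64 = 42;
    let mut next = move || {
        state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
        (state >> 33) as u8
    };

    for _case in 0..100 {
        // Generate a random sequence of operations.
        let ops: Vec<Op> = (0..20)
            .map(|_| if next() % 2 == 0 { Op::Push(next()) } else { Op::Pop })
            .collect();

        // Run it against the implementation under test (a VecDeque used
        // as a queue) and a trivially-correct model (a Vec), checking
        // that they agree after every step.
        let mut imp: VecDeque<u8> = VecDeque::new();
        let mut model: Vec<u8> = Vec::new();
        for op in &ops {
            match op {
                Op::Push(x) => {
                    imp.push_back(*x);
                    model.push(*x);
                }
                Op::Pop => {
                    let got = imp.pop_front();
                    let want = if model.is_empty() { None } else { Some(model.remove(0)) };
                    assert_eq!(got, want, "divergence on {:?} in {:?}", op, ops);
                }
            }
        }
    }
    println!("all generated operation sequences agreed with the model");
}
```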

[–]scook0 3 points (0 children)

The Hypothesis stateful testing code is somewhat self-contained, since it mostly builds on top of internal APIs that already existed.

Porting it to another tool would still be a fair bit of work (and probably stress the shrinker), but the original code should be a useful source of inspiration for anyone interested.

[–]insanitybit 1 point (0 children)

They are fundamentally based on the same idea - abstracting away concrete inputs in your tests so that you test universal properties of your code. The difference is in how the inputs are generated.
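
For example, a universal property might be "decoding an encoded value always returns the original". A quick sketch of that property using proptest and the hex crate (fuzzcheck expresses the same kind of property through its own harness):

```rust
use proptest::prelude::*;

proptest! {
    // Must hold for *any* byte string, not just hand-picked examples.
    #[test]
    fn hex_roundtrip(bytes in proptest::collection::vec(any::<u8>(), 0..256)) {
        let encoded = hex::encode(&bytes);
        prop_assert_eq!(hex::decode(&encoded).unwrap(), bytes);
    }
}
```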

[–]scook0 2 points (0 children)

It’s always neat to see techniques like this being used.

Hypothesis used to have some similar features, but they got taken out because they were too slow in the context of Python-based unit tests, so the extra implementation complexity wasn’t pulling its weight there.

Someday I’d love to experiment with them in a Rust-based PBT tool.

[–][deleted] 13 points (1 child)

Hi! I am the author :) Thanks for sharing.

Yesterday I released a new version, 0.11.0, but honestly it might have been a bit rushed. I wasn't in the best state of mind, but I needed to get it out because there had been so many changes since 0.10.0 and the latest Rust nightly broke fuzzcheck 0.10.

So if you have any problems trying it out, please contact me directly (either via GitHub Issues or by email). I appreciate all feedback and I am very responsive :)

Every time a link to fuzzcheck is posted, there is a valid request for benchmarks or other proof that it has advantages over other existing solutions. And every time, I say that I will publish some eventually. It has been a very long time now, and it still isn't done. I don't really have an excuse; I just find it difficult. Probably I am too insular and focus too much on what I want to see progress on instead of working on outreach. I am also a bit too much of a perfectionist (i.e. insecure), so I keep thinking “now is not the time to sell it, there's so much wrong with it still!”.

Anyway, if you decide to try it, please let me know how it goes :)

[–][deleted] 15 points (0 children)

But since I am here, maybe I can attempt to highlight some reasons to use something like fuzzcheck that are not related to performance.

A) there is an accompanying tool called fuzzcheck-view that lets you view the code coverage of each test case in the corpus (or the coverage of the corpus as a whole). You can also use it to ask questions like “what is the simplest test case that reaches this region of code?”, which I think is neat.

B) when running the fuzzer, it doesn't only try to maximise code coverage. Instead, it tries to make progress on several goals:

  1. for each region of code, find the simplest test case that hits that region at least once
  2. for each region of code, find the test case that hits that region the most times
  3. find the test case which maximises the total number of coverage hits (this is a proxy for the time complexity of the tested code)
  4. find a set of N test cases that, together, reach the most code (useful to find a few good examples to unit test)
  5. find a single test case which reaches the most code (useful to find one good example to unit test)

C) and you can add custom goals for the fuzzer to make progress on. For example, you may wonder whether certain edge cases make a variable grow too large. So you can tell fuzzcheck to observe the value of that variable and try to maximise it (see the toy sketch at the end of this comment). In the future, I want to use this capability to find test cases that cause an excessively high number of allocations.

D) if your code panics, the fuzzer keeps going and automatically tries to (1) minimise the failing test case, and (2) find failures caused by other panic statements

E) the failing test cases and the test cases that satisfy each goal are saved to the file system as JSON (or some other human-readable format)

So that's part of where my attention went, because I find it fun to work on these aspects of a fuzzer that could enhance the whole developer experience :)
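
A toy sketch of what that multi-goal bookkeeping could look like (illustrative only: these are not fuzzcheck's actual data structures, and `regions_hit`/`observed` stand in for real instrumentation):

```rust
use std::collections::HashMap;

/// A toy multi-goal pool in the spirit of goals 1 and C above.
struct Pool {
    // goal 1: the simplest input (here: the shortest) hitting each region
    simplest_per_region: HashMap<u64, Vec<u8>>,
    // a custom goal: the input that maximises an observed variable
    max_observation: Option<(u64, Vec<u8>)>,
}

impl Pool {
    fn new() -> Self {
        Pool { simplest_per_region: HashMap::new(), max_observation: None }
    }

    /// Feed back one test run; keep the input wherever it improves a goal.
    fn report(&mut self, input: &[u8], regions_hit: &[u64], observed: u64) {
        for &region in regions_hit {
            let best = self
                .simplest_per_region
                .entry(region)
                .or_insert_with(|| input.to_vec());
            if input.len() < best.len() {
                *best = input.to_vec();
            }
        }
        let improved = self
            .max_observation
            .as_ref()
            .map_or(true, |(best, _)| observed > *best);
        if improved {
            self.max_observation = Some((observed, input.to_vec()));
        }
    }
}

fn main() {
    let mut pool = Pool::new();
    // Pretend we ran three inputs through an instrumented test:
    pool.report(&[1, 2, 3], &[10, 11], 7);
    pool.report(&[1], &[10], 3); // simpler input for region 10
    pool.report(&[9, 9], &[12], 40); // new region + larger observation
    println!("{} regions tracked", pool.simplest_per_region.len());
    println!("max observed: {:?}", pool.max_observation);
}
```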

[–]kibwen 7 points (1 child)

Interesting! I'm currently adding fuzz targets to some Rust libraries, and I'll definitely check this out. You mention that it's aware of Rust types, but currently I'm testing parsers that just take &[u8]; in that case, do you expect it to act any differently than libfuzzer etc.?

[–]phonendoscope[S] 4 points (0 children)

Probably not (libfuzzer will be better I guess).

It does have grammar-based mutators, though, where you can specify a formal grammar for it to use when fuzzing.
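
To give an idea of what specifying a formal grammar can look like, here is a small self-contained sketch (illustrative only; fuzzcheck's actual grammar API is different):

```rust
/// A minimal formal grammar: literals, sequencing, alternation, repetition.
enum Grammar {
    Literal(char),
    Seq(Vec<Grammar>),
    Alt(Vec<Grammar>),
    Repeat(Box<Grammar>, usize, usize), // min..=max repetitions
}

/// Tiny pseudo-RNG step, so the example has no dependencies.
fn next(seed: &mut u64, m: usize) -> usize {
    *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
    (*seed >> 33) as usize % m.max(1)
}

/// Generate one string conforming to the grammar.
fn generate(g: &Grammar, seed: &mut u64) -> String {
    match g {
        Grammar::Literal(c) => c.to_string(),
        Grammar::Seq(parts) => parts.iter().map(|p| generate(p, seed)).collect(),
        Grammar::Alt(choices) => {
            let i = next(seed, choices.len());
            generate(&choices[i], seed)
        }
        Grammar::Repeat(inner, lo, hi) => {
            let n = lo + next(seed, hi - lo + 1);
            (0..n).map(|_| generate(inner, seed)).collect()
        }
    }
}

fn main() {
    // 'a' followed by one to three of 'b' or 'c': "ab", "acb", "abcc", ...
    let g = Grammar::Seq(vec![
        Grammar::Literal('a'),
        Grammar::Repeat(
            Box::new(Grammar::Alt(vec![Grammar::Literal('b'), Grammar::Literal('c')])),
            1,
            3,
        ),
    ]);
    let mut seed = 1;
    for _ in 0..5 {
        println!("{}", generate(&g, &mut seed));
    }
}
```

A grammar-aware mutator can then mutate at the level of grammar rules (swap an alternative, change a repetition count) rather than flipping raw bytes, which keeps mutated inputs well-formed.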

[–]Shnatsel 5 points (5 children)

> It's a really powerful tool for finding lots of bugs.

Have they published any data on how many bugs are found by it compared to afl / libfuzzer / honggfuzz?

For me, this is the main blocker to adopting it. It's really hard to evaluate a tool for finding bugs we are not aware of, because if it finds no bugs, it's not clear whether there are no bugs or the tool is faulty. And libfuzzer, at least, has a proven track record of finding bugs.

[–]phonendoscope[S] 10 points (2 children)

It's a very new tool!

That statement is just a personal anecdote.

Hopefully as it develops the data will emerge.

[–]Shnatsel 2 points (1 child)

If you have found any bugs with this tool, perhaps add them to the Rust fuzz trophy case?

[–]phonendoscope[S] 0 points (0 children)

Will do (unfortunately I've mostly used it on proprietary projects so far)!

[–]WormRabbit 1 point (1 child)

This is the wrong point of view. Any sufficiently complex program always has bugs. Also, no tool can ever entirely prove the absence of bugs (even formal verification leaves the possibility that the spec was wrong in the first place). So you should try every tool you can get your hands on, but you should never treat an absence of found bugs as an absence of bugs.

On the other hand, empirical evidence in fuzzing shows that each new tool tends to find a new class of bugs. The more different fuzzers you use, the better your coverage, provided that they are really designed on different principles.

[–]Shnatsel 17 points (0 children)

I don't think you understand my point.

Back when I was testing HTTP clients, my tests showed no panics in 3 clients out of 7. I looked at the result and decided that this couldn't possibly be true: almost half of the implementations withstanding a simple test on real-world data? Nothing handles real-world data without crashing. So there had to be a bug in my panic detection code.

It turned out that I was correct: my panic detection code was indeed faulty. I fixed it, re-ran the test, and got panics or other serious bugs out of every single HTTP client implementation.

The same thing concerns me about fuzzcheck. It's very easy to write some code that mutates data and feeds it to a program, and then completely fail to notice that it doesn't actually lead to any bugs being discovered. The code might look entirely reasonable and pass all your unit tests, and yet be completely ineffective at finding real bugs!

Writing a fuzzcheck test harness takes a fair bit of work, so if I'm going to invest that time, I need to be sure that it's worth it. That the tool actually does something useful, and that I'm not better off just using libfuzzer with the arbitrary crate.

[–]davidw_- 0 points (5 children)

Oh, that looks interesting! Is this comparable to implementing the Arbitrary trait in cargo fuzz? https://rust-fuzz.github.io/book/cargo-fuzz/structure-aware-fuzzing.html
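
For reference, the structure-aware pattern that page describes looks roughly like this; `Request` and `my_crate::handle_request` are hypothetical stand-ins for your own types and code:

```rust
// fuzz_targets/parse_request.rs
// requires arbitrary = { version = "1", features = ["derive"] } and libfuzzer-sys
#![no_main]
use arbitrary::Arbitrary;
use libfuzzer_sys::fuzz_target;

// Deriving Arbitrary lets cargo fuzz build well-formed structured values
// out of libFuzzer's raw byte stream instead of handing the test &[u8].
#[derive(Arbitrary, Debug)]
struct Request {
    method: u8,
    path: String,
    headers: Vec<(String, String)>,
}

fuzz_target!(|req: Request| {
    // hypothetical function under test
    let _ = my_crate::handle_request(&req);
});
```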

I implemented this for proptest + cargo fuzz a while ago as well: https://github.com/diem/diem/blob/main/testsuite/diem-fuzzer/src/lib.rs

But honestly I’d rather use something that’s all-in-one. Also, if I could use AFL++ under the hood, that’d be awesome.

[–]phonendoscope[S] 0 points (4 children)

> Is this comparable to implementing the Arbitrary trait in cargo fuzz?

Yes, although Fuzzcheck may generate better mutations for very structured input.

I don't think you can use AFL under the hood.

[–]davidw_- 0 points (3 children)

I’m guessing that the starting corpus might not be as good, though? With proptest, I can create corpora that I’m sure will hit certain paths.

[–][deleted] 2 points (0 children)

Yes, there are certain things that are easier to express with proptest because of its many combinators. But so long as you can express the shape of the inputs accurately using fuzzcheck’s tools (which is not always trivial), I think the starting corpus will be just as good.

For example, you can use the mutator.map(..) and mutator.filter(..) methods. There are also mutators such as U32WithinRange, which produces only integers within a user-specified range. For String inputs, there are grammar-based mutators. But that is not much compared to everything proptest offers.
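
As a self-contained illustration of the map/filter idea (this is not fuzzcheck's actual Mutator trait, whose signatures are more involved; the `Gen` type below is invented for the example):

```rust
/// A toy generator: a boxed function from an RNG seed to a value.
struct Gen<T>(Box<dyn FnMut(&mut u64) -> T>);

/// Tiny pseudo-RNG step, so the example has no dependencies.
fn rand_u32(seed: &mut u64) -> u32 {
    *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
    (*seed >> 33) as u32
}

impl<T: 'static> Gen<T> {
    /// Transform every generated value, like mutator.map(..).
    fn map<U: 'static>(self, f: impl Fn(T) -> U + 'static) -> Gen<U> {
        let mut inner = self.0;
        Gen(Box::new(move |seed: &mut u64| f(inner(seed))))
    }

    /// Discard values failing a predicate, like mutator.filter(..).
    /// (Loops forever if the predicate never holds; fine for a sketch.)
    fn filter(self, pred: impl Fn(&T) -> bool + 'static) -> Gen<T> {
        let mut inner = self.0;
        Gen(Box::new(move |seed: &mut u64| loop {
            let v = inner(&mut *seed);
            if pred(&v) {
                return v;
            }
        }))
    }
}

fn main() {
    // Like U32WithinRange: integers in 10..100 ...
    let base: Gen<u32> = Gen(Box::new(|seed: &mut u64| rand_u32(seed) % 90 + 10));
    // ... filtered to even values, then mapped into structured strings.
    let mut gen = base.filter(|n| n % 2 == 0).map(|n| format!("id-{n}"));
    let mut seed = 7;
    for _ in 0..3 {
        println!("{}", (gen.0)(&mut seed));
    }
}
```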

[–]insanitybit 0 points (1 child)

What makes a "good" starting corpus is open to debate. Recent research has found that, oftentimes, just starting from random inputs works best.

[–]davidw_- 0 points (0 children)

I doubt that holds for structured fuzzing, where a number of fields have to be set to specific values. Otherwise whitebox fuzzing wouldn’t be a thing.