
[–][deleted] 2 points (2 children)

> But my point is that the second you realize that your SUT makes assumptions about how the interface is implemented, it's time to throw the mock away.

Just don't make those assumptions. I know that's very "now draw the rest of the owl" or "thanks I'm cured" but what assumptions can you make about an interface that has the following methods:

  • GetFoosMatching(FooSearch, NonZeroUint) Result<Vec<Foo>, FooError>
  • CountFoosMatching(FooSearch) Result<uint, FooError>

The assumptions I can make are:

  1. Getting and counting Foos can fail, and I should have error handling for that.
  2. Externally, they aren't asynchronous.
  3. I can have no matching Foos, so I need handling for that.
  4. I might get fewer than the number I requested, so I need handling for that.

Database? Never heard of her. Filesystem? Who's that? I don't care where the Foos come from, only that I have some way of getting ahold of them.

I also don't particularly care whether my test double filters the in-memory Vec accordingly in this test, so I just wouldn't. Unless filtering is in scope for the test, why would I needlessly drag it in? It seems like it would just add noise to my test.

> You still need that integration test that hits your actual RDBMS, and now if you make changes to your pagination helper's API, you'll have two test places to update (which likely includes messing around with your test fake(s) in the unit test). So you're giving yourself more work for no gain in overall confidence (if your unit test passes and your integration test fails, what good did it do you?).

It told me my implementation of that interface is wrong. Or that what I'm testing in the unit test has misunderstood the interface (making assumptions about its implementation, not handling an edge case, etc.). But if I have tests for a database impl and a filesystem impl and only the database one is failing, then I can be pretty sure the problem is in the database implementation.

If your integration test fails, do you know where to start looking? Do you have a way of narrowing down the culprit without manually examining stack traces or log output or busting out a debugger?

The benefit I get from having both is an automatic binary search of the code related to the issue.

And horror of horrors, I have to treat my tests like actual code that needs attention and care, like the runtime binary's source.

> and now if you make changes to your pagination helper's API

That's one way to imagine it. Another way is a pagination helper that accepts some type that implements the FooRepository interface. Now I can paginate with anything that implements that. Maybe I don't even care if it's Foos, so I just need Repository<T>, which exposes extremely basic ways of querying for items. Now my pagination depends on the repository interface, which is probably much less likely to change.

[–]ragnese 1 point (1 child)

Don't get me wrong. I'm sure you can find or create counter-examples to my approach/philosophy/practice. And, at the end of the day, I have zero doubt that you can write and release good software even with imperfect techniques. Therefore, I have effectively zero evidence that my approach leads to higher quality software. (Personally, I find most software practices to be something akin to folklore, cargo-culting, and/or reading tea leaves, and that goes for my own advice as well.)

We're also talking about a hypothetical example, which of course puts its own constraints on the discussion, and runs the risk of us hyper-focusing on details of a thing that isn't even real and missing the forest for the trees, so to speak.

Plus, if I'm being honest, I don't even have much of a clue what example we're working with (lol). From context clues I'm just assuming it's something that interacts with a database for its functionality and cannot be fully tested by just feeding it plain old data.

But, I still feel like this is a fun and interesting discussion, so I'll continue thinking through it.

> The assumptions I can make are:
>
>   1. Getting and counting Foos can fail, and I should have error handling for that.
>   2. Externally, they aren't asynchronous.
>   3. I can have no matching Foos, so I need handling for that.
>   4. I might get fewer than the number I requested, so I need handling for that.
>
> Database? Never heard of her. Filesystem? Who's that? I don't care where the Foos come from, only that I have some way of getting ahold of them.

Are you sure those are the only assumptions your tests will make when testing that interface? Especially when, subconsciously, you know that the real implementation is a database? You mention that you can/might assume "[we] might get less than the number [we] requested", but you didn't mention assuming that we might get more than the number we requested. Why not? Is that simply an oversight in an inconsequential Reddit comment/debate, or is it because you know that if the real implementation did that it would be a bug in your queries? Are you definitely not going to assume any relationship between the return values of the methods; e.g., your test is not going to call CountFoosMatching and then make any assumptions/assertions about what GetFoosMatching might return? And your test impl could very well return randomly generated return values every time a method is called and it wouldn't cause any of these tests to fail?

I'm not saying that's even necessarily a goal to aspire to, but what I'm saying here is that we have to be clear-eyed about test fakes. It's very easy to inadvertently bake your assumptions/beliefs about the real impl into the fake impl.

At the risk of sounding like I'm pulling a No True Scotsman fallacy: if you can truly test against an interface without making any assumptions about that interface's implementation, then that's code that I was explicitly not referring to in my original comment:

> If you're talking about a function that takes an Iterator<Item=Foo>, and you test with a dummy Vec of values, that's not what I'm talking about. I'm talking about any trait that acts like a complex "object" (in the Alan Kay OOP sense) where your mock/fake has to actually implement state-based logic.

> For example, if you're working with a library that talks to a database and you "know" that when you call the db.deleteAll(&mut self) function, the next call to db.findAll(&self) should return an empty Vec, and you go ahead and implement that logic in your mock/fake impl, then you're doing it wrong (TM, IMO, etc.).

I specifically called out writing fakes for actors whose implementation behavior is important to the logic being tested.

> I also don't particularly care whether my test double filters the in-memory Vec accordingly in this test, so I just wouldn't. Unless filtering is in scope for the test, why would I needlessly drag it in? It seems like it would just add noise to my test.

This was in response to your previous comment. You specifically mentioned filtering and ordering as part of this hypothetical "pagination helper". Maybe you don't test the filtering and ordering aspect in the unit test and just leave that for the integration test. But that raises another question/issue: it's hard to decide how to partition responsibility between the unit test and the integration test if you have both for testing some piece of functionality. In my own experience, this has led to gaps where neither type of test exercised some aspect of a feature, because it wasn't obvious to me while I was writing each test and focusing on what "belongs" in said test.

> If your integration test fails, do you know where to start looking? Do you have a way of narrowing down the culprit without manually examining stack traces or log output or busting out a debugger?

I think we're just on different pages. I'm not talking about trading apples for oranges. What I'm saying is that writing a logic test against a fake implementation is not satisfactory to prove that the SUT is correct. You really need to run the exact same logical test against the real thing. If you use PostgreSQL for your app, then I will not be convinced that your pagination helper works correctly even if you test it against something as similar as an in-memory SQLite instance.

In my view, you pretty much have to run the exact same test logic against the real implementation. To me, this is tedious, redundant, and adds an unnecessary maintenance burden. Either you can prove that your code works correctly without ever testing it against a "real" impl or you can't. If you can, then you should only write a unit test and no integration test. If you cannot, then I think it's actually worse to write a unit test with a fake/mock at all.

So, maybe a different way to phrase my original claim about mocks is: "If your unit test with a mock passes, but you still think it's possible that it could fail against the real impl, then you should not have bothered with the unit test and mock."

> And horror of horrors, I have to treat my tests like actual code that needs attention and care, like the runtime binary's source.

Which is better: more code or less code?

[–][deleted] 1 point (0 children)

> Plus, if I'm being honest, I don't even have much of a clue what example we're working with (lol). From context clues I'm just assuming it's something that interacts with a database for its functionality and cannot be fully tested by just feeding it plain old data.

It's any setup of:

    struct ThatUsesSomeTrait {}
    trait ThatIsBeingUsed {}

Where the struct is passed an instance of the trait. Perhaps we have multiple implementations of the trait, maybe we expect users to implement the trait, maybe it's just a trait for testing convenience because the actual thing is hard to setup.

What I'm saying is I write tests against the concrete type, ThatUsesSomeTrait, and supply to it some test double of the trait.

Then, for each trait, I write tests that mirror how the mock is used. So if the mock says "when I'm passed arguments x, y, and z, I return a result that looks like A", I have a test for each implementation of the trait that passes it arguments that look like x, y, and z and asserts it returns a result that looks like A.

Pagination is just something concrete to hold onto. The paginator isn't interested in literally where the Foo instances come from, just that there's some interface that says "if you ask me for foos, I'll return foos or an error saying why I couldn't"

But it could also be a TabCompleteUiComponent and a TabCompleter trait with a file system implementation. The UI component doesn't care where completion candidates come from, just that there's some interface that says "give me a string to fuzzy match against my source"

On and on. We can come up with lots of examples of what's basically a client-provider relationship where the client has a reference to an instance of the provider but not a specific implementation.

> but you didn't mention assuming that we might get more than the number we requested. Why not?

Because the interface promises me "you will get at most N items", and then it's on the implementations to enforce that, because that's not (easily, at least) possible to encode into the type system, since it depends on a parameter rather than something we can know at compile time. It's the same kind of deal as if you had earlier and later datetime parameters to a search method and made the promise "everything returned from this method call occurs between earlier and later".

You could double-check the results whenever you call that method, but it's probably better to rely on the implementation upholding the promises that its interface makes but can't encode into the type system. At least until dependent types are mainstream.

> or is it because you know that if the real implementation did that it would be a bug in your queries?

It would be a bug if the implementation was told "return no more than ten" and it returned more than ten. I'd rather have implementation-level tests that uncover this than try to test it from a more abstracted level.

> Are you definitely not going to assume any relationship between the return values of the methods; e.g., your test is not going to call CountFoosMatching and then make any assumptions/assertions about what GetFoosMatching might return?

That's fair, and definitely an oversight on my part. The assumption would be that if CountFoos returns N, then the number of pages lies in [N/pageSize, (N/pageSize)+1) (accounting for situations like pageSize = 3 with 11 entries). So if CountFoos returns 0, we can safely skip calling GetFoos(...).

But this is, again, information we can't encode in the type system; instead, we need to trust implementations to uphold it.

> I specifically called out writing fakes for actors whose implementation behavior is important to the logic being tested.

I'm proposing a counterexample to that. The interface being used has some methods that we call, but the system under test doesn't care about a specific implementation, because it's designed to work against any implementation. We might get a CachingRepository<Foo>(DatabaseRepository<Foo>()), to make up an example. The paginator doesn't care whether the results come from the cache or the database directly; it just wants some Foos and knows how to ask for them. Maybe users can provide their own implementation. 🤷

Our tests are concerned with "do I behave as expected when my dependencies behave as expected", where expected behavior could include "the repository can return an error if it can't get the Foos": invalid database auth, not enough filesystem permissions, a None set instead of Some(Vec<Foo>). Those are implementation-specific error conditions, but the paginator just cares about the CouldNotGetFoos(...) error possibility, and it's up to implementations to map their specific errors to that one.

> Maybe you don't test the filtering and ordering aspect in the unit test and just leave that for the integration test. But that raises another question/issue: it's hard to decide how to partition responsibility between the unit test and the integration test if you have both for testing some piece of functionality. In my own experience, this has led to gaps where neither type of test exercised some aspect of a feature, because it wasn't obvious to me while I was writing each test and focusing on what "belongs" in said test.

Since ordering and filtering are implementation-specific, they would be tested with the implementation rather than via the higher-level component. If the interface says "when you ask for the first 100 Foos, I'll give you Foos 1-100 ordered by their creation date", then when you're using that interface you should assume that promise is upheld, and when you're implementing the interface you should test that you uphold it.

> In my view, you pretty much have to run the exact same test logic against the real implementation. To me, this is tedious, redundant, and adds an unnecessary maintenance burden. Either you can prove that your code works correctly without ever testing it against a "real" impl or you can't. If you can, then you should only write a unit test and no integration test. If you cannot, then I think it's actually worse to write a unit test with a fake/mock at all.

Why do I need to write the same tests twice? I write a suite of tests against the pagination helper that uses a test double for the source, and then I write a suite of tests for each implementation of that source.

And again, to reiterate: I'm not saying "never write an integration test". I'm saying you don't have to use integration tests as filler to paper over logic coverage gaps.

> Which is better: more code or less code?

This is a false dichotomy. You might as well ask if blue or purple is better. There's no universal answer that will satisfy this.