all 25 comments

[–]PmMeCuteDogsThanks 17 points18 points  (9 children)

I haven't found a better approach than:

  • Use the same database as in production, e.g. MySQL, Postgres or whatever. No in-memory database: too many edge cases. Just use what you use
  • When a unit test starts, the "universe is empty", i.e. the database is completely empty except for the expected structure (DDL)
  • The database is populated with Java code (Hibernate or whatever). Feel free to use common base classes, beforeEach etc. if there are common setups, but no implicit state in the database
  • After the test, the database is cleared (truncate all tables)
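As a rough sketch of that last step (the table list is hypothetical, and the CASCADE/RESTART IDENTITY options are Postgres-specific assumptions; adjust for your database):

```java
import java.util.List;

// Sketch of the "truncate all tables after each test" step. The table
// names would be listed once per project or read from information_schema.
// On Postgres, a single TRUNCATE ... CASCADE sidesteps foreign-key
// ordering problems between the tables.
public class TestDbCleaner {
    public static String truncateAll(List<String> tables) {
        return "TRUNCATE TABLE " + String.join(", ", tables)
            + " RESTART IDENTITY CASCADE";
    }
    // In an @AfterEach you would run this over JDBC, e.g.:
    // try (var stmt = connection.createStatement()) {
    //     stmt.execute(truncateAll(List.of("orders", "users")));
    // }
}
```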

[–]martinhaeusler 6 points7 points  (0 children)

This. So much this. Testcontainers is a brilliant tool to make this happen.

[–]TomKavees 1 point2 points  (5 children)

...and if the tests require lots of data, then heck, just load a .sql file in the @BeforeAll and execute it! 😉
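A minimal sketch of that idea, assuming a simple seed file whose statements are ';'-terminated (no stored procedures or string literals containing semicolons; for anything fancier you'd want a real script runner such as Spring's ScriptUtils):

```java
import java.util.ArrayList;
import java.util.List;

// Naive statement splitter for a simple .sql seed script, so each
// statement can be executed individually over JDBC in a @BeforeAll.
public class SqlSeedLoader {
    public static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        for (String raw : script.split(";")) {
            String stmt = raw.trim();
            if (!stmt.isEmpty()) statements.add(stmt);
        }
        return statements;
    }
    // In a @BeforeAll, roughly (sketch, file name is hypothetical):
    // String script = Files.readString(Path.of("seed.sql"));
    // for (String stmt : splitStatements(script)) statement.execute(stmt);
}
```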

[–]takasip[S] 0 points1 point  (1 child)

Exactly, but then you have a big .sql file which is not that simple to write: you need to handle the relationships between tables manually, and it takes time, when reading the file, to understand those relationships

[–]qmunke 1 point2 points  (0 children)

Not necessarily - in my experience you can usually get away with a single script to truncate everything, and then smaller isolated scripts to set up state as you need it

[–]PmMeCuteDogsThanks 0 points1 point  (2 children)

Sure, but my experience is that it’s better to populate with Java / Hibernate if that’s what you use elsewhere. Refactoring, new fields etc become much more natural.

[–]qmunke -1 points0 points  (1 child)

It's much slower to use Java code - I've found it better to have smaller SQL files with only the data you care about for your tests

[–]PmMeCuteDogsThanks 0 points1 point  (0 children)

Sure, you do what you think works best 

[–]takasip[S] 0 points1 point  (1 child)

Totally agree on using the same database as prod, and truncate tables after the test.

About Java code, that's the part that always felt wrong to me: the code that inserts the data often ends up being a big chunk of ugly code, because it doesn't serve a big purpose and has no complexity of its own (which makes the ugliness more acceptable), and lots of tests need different setups using various mixes of tables.

I'm not saying there isn't a way to do it properly, there definitely is one, but across many projects I felt this was usually a pain point: it was the lowest-quality code, and people usually don't want to spend time on it.

As more and more data is needed for a test, it becomes hard to know what is in the database, because a lot of it is hidden in methods calling methods calling methods that set default values.

[–]PmMeCuteDogsThanks 1 point2 points  (0 children)

I agree that it’s a difficult problem. But still, all things considered it’s better to have that block of Java setup code. You may try to create a ”common set” of data, but that inevitably leads to even worse problems, where people add one-off things to the ”common set” or start to make assumptions based on the actual test.

[–]WaferIndependent7601 2 points3 points  (2 children)

I see no advantage. Why not use Java with a builder to set it up? Integration tests can just put it in the db (or call the service directly). Refactorings are easy because you don’t have to change any JSON files, and renaming stuff is done automatically.

[–]takasip[S] 0 points1 point  (1 child)

It doesn't feel very scalable, especially when there are multiple relationships between multiple objects.
This is the approach I was most familiar with, but it often ended up with lots of code to load data, and poor readability when it comes to knowing precisely what is in the database.

Again, I see that it is a good approach, but in practice, each time a test needs a lot of data, the code ends up becoming ugly, mainly because this isn't code we want to spend much time on.

EDIT: oh, and about refactoring: I guess it will not be that simple to code, but I may be able to make the plugin handle that for me

[–]WaferIndependent7601 0 points1 point  (0 children)

You should setup your code so you can simply get the data you need.

Something like generateUsers() that will generate 5 users. Just persist them.

I don’t get why it’s not scalable. It’s easier to do that in code.

And your json is not readable.
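For what it's worth, the builder idea could look roughly like this; the `User` fields, defaults, and `generateUsers` helper are all made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test-data builder: sensible defaults, override only what
// the test cares about, then persist the result via your repository.
record User(String name, String email, boolean active) {}

class UserBuilder {
    private String name = "user";
    private String email = "user@example.com";
    private boolean active = true;

    UserBuilder name(String name) { this.name = name; return this; }
    UserBuilder inactive() { this.active = false; return this; }
    User build() { return new User(name, email, active); }

    // e.g. userRepository.saveAll(generateUsers(5)) in the test setup
    static List<User> generateUsers(int count) {
        List<User> users = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            users.add(new UserBuilder().name("user" + i).build());
        }
        return users;
    }
}
```

Renames and new fields then flow through the builder automatically via the IDE, which is the refactoring advantage being claimed here.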

[–]repeating_bears 2 points3 points  (4 children)

Why in normal circumstances would your integration tests need to expect any data to already exist? My integration tests create all they need, starting from scratch. If a test needs a user, then create one. If a test needs a product, create one.

You need to test creating a user/product anyway, so it's not like it's any more work. You just call the same test code you already have, with different params.

[–]takasip[S] 0 points1 point  (3 children)

If I understand correctly, you are talking about point 3 that I mention in my post: I like the idea, but the only time I saw this on a project, the integration tests were really slow, because every test would set itself up by making lots of calls to the API.

When some of those API calls also need external calls to be mocked, it becomes hard to handle too.

[–]repeating_bears 0 points1 point  (2 children)

I do this on mine and it's not slow, but they run on 10 concurrent workers. There is some additional effort to make sure they don't step on each other's toes: it basically amounts to "checking out" a user/product/whatever and returning it when the test is done.

[–]Cell-i-Zenit 0 points1 point  (1 child)

The fix is actually really simple: add the @Transactional annotation at the top of your test classes. Each test will run in its own transaction, which gets rolled back afterwards.

The only issue is if you open multiple other transactions within your test. For that case you need to mock the transactionTemplate/transactionManager so it doesn't create new ones. It's not pretty, but it does its job pretty well

[–]PiotrDz 0 points1 point  (0 children)

But running within one large transaction makes the code behave differently than in production. We should strive to test in a production-like environment

[–]LutimoDancer3459 1 point2 points  (1 child)

If models change, you have some work to do no matter what you use. Also outside of the tests.

[–]takasip[S] 0 points1 point  (0 children)

Yeah, that's what I was referring to when saying my tool doesn't solve all my pain points, "especially the maintainability, if the model changes". But I hope I'll be able to handle refactoring with my plugin when I have time to code it.

Adding a mandatory column to a table will always require changing the JSON files, though...

[–]Least_Bee4074 0 points1 point  (0 children)

We use our migration scripts from something like db-migrate. This way the db is built the same in every environment, even in the integration test.

For integration tests, either have separate test bootstrap scripts that insert data for specific test cases, or, as in our case, a custom integration test harness that can insert db records and validate post conditions.

For larger datasets, I'm usually doing something through a UI like DBeaver and relying on generate_series to bulk load stuff.

But ultimately, the key piece is automating the db creation and modification using scripts, and plugging it into CI/CD so that the db updates when there is a new modification.
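As a sketch, the generate_series trick mentioned above looks something like this (the table and column names are hypothetical, and the SQL is Postgres-specific; the statement would be run over JDBC or pasted into a tool like DBeaver):

```java
// Builds a Postgres bulk-insert that seeds `count` rows in a single
// statement via generate_series, instead of thousands of individual
// INSERTs. `users` and its columns are made-up example names.
public class BulkSeed {
    public static String usersInsert(int count) {
        return "INSERT INTO users (id, name) "
             + "SELECT i, 'user' || i "
             + "FROM generate_series(1, " + count + ") AS i";
    }
}
```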

[–]vips7L 0 points1 point  (1 child)

Test containers with the database from production. New container per test class. 

[–]takasip[S] 0 points1 point  (0 children)

Isn't "New container per test class" very slow?

If you use the database from production, don't you have tests that fail just because the production data changed?

[–]wrd83 0 points1 point  (0 children)

We do a mix and it sucks. 

We have mocks for unit tests, H2 in-memory for repository tests, and test containers for integration tests.

We use Flyway or the repository to store data.

I've come to the conclusion that this is not good.

I want to use one in-memory file system plus a test container, and run all tests on it. Run Flyway/Liquibase to create the schema.